User:Tryagain/Map replication

From Navit's Wiki
Revision as of 11:52, 28 August 2016 by Tryagain (talk | contribs) (Current state)

Current state

Navit uses the binfile format as its primary storage for map data. In short, a binfile is a zip archive containing preprocessed OpenStreetMap data. The preprocessing is done with maptool, which is part of the Navit project. We provide users with map data through a set of map servers that sit behind a central load balancer.

The current planet.bin is over 20 GB.

Data is replicated between map servers by proprietary code that was never released under a public license. That code transfers only the changed data between servers, which avoids uploading the full data set. It uses MD5 hashes to verify data integrity.
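The replication code is not public, so its actual algorithm is not known here. Purely as an illustration of how hashes can identify the changed parts of a file, the sketch below compares two versions of a file block by block using MD5. The fixed block size and the function names are assumptions made for this example, not a description of the proprietary code.

```python
import hashlib


def block_digests(data, block_size):
    """Return the MD5 hex digest of each fixed-size block of data."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4096):
    """Return indexes of blocks in `new` that are absent from or differ in `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or old_digests[i] != digest
    ]
```

Only the blocks listed by changed_blocks() would need to be sent to the other server. A fixed block grid like this one stops matching as soon as data is inserted or removed in the middle of the file, which is one reason a per-tile comparison is more natural for zip-based map files.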

Proprietary code is also used to deliver map data to end users in the portions they require.

We have quite a few problems with the way map data is delivered to end users:

  • Production map servers are unable to split data into pieces, so it is impossible, for example, to put a large map extract on a FAT32-formatted SD card (FAT32 limits a single file to just under 4 GiB).
  • Navit does not behave well when more than one binfile is active, so splitting the needed region geographically and downloading it as a set of binfiles is not advised.
  • A partial map download still contains the full zip file directory, which lists the name of every tile and is over 100 MB in size.
  • To update a map file, the user has to download it anew, even though most of its tiles are unchanged.
  • There is currently no way to check the integrity of a partial download (anything other than the whole planet.bin) apart from the CRC-32 checksums included in the zip file and the zip signature bytes of important records. There is also no way to verify the authority of the party that performed the map data conversion.
  • The map server has to run code to extract specific regions from the map file; it cannot simply serve a .bin file.
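To make the integrity point concrete: the CRC-32 values stored in a zip file can be checked with Python's standard zipfile module, as in the sketch below. The helper name is hypothetical. A check like this only catches accidental corruption; anyone who modifies a tile can also recompute its CRC-32, so it says nothing about who produced the data.

```python
import io
import zipfile


def first_corrupt_member(zip_bytes):
    """Return the name of the first member whose CRC-32 check fails, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # testzip() reads every member and compares it with the stored CRC-32.
        return archive.testzip()
```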

Navit itself supports a binfile provided as a multivolume zip file. This provides:

  • a natural way to split a single map file into chunks that fit the requirements of the target storage. In fact, we have an experimental update to the proprietary code that allows map data to be downloaded in multiple volumes.
  • a way to do a differential update of a map file by adding new volumes to it. We just need a way to find the changed tiles on the map server and download them.

The Navit binfile map driver has (currently broken) support for map download, which works by requesting chunks of the binfile on the server with "Range:" HTTP request headers and caching the results on disk. I think this is the right way to download, as it allows map data to be served from dumb HTTP/FTP servers, or maybe even distributed with the BitTorrent protocol.
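For readers unfamiliar with range requests, here is a minimal sketch of the client side using only Python's standard library. It is not the code in binfile.c; the function names are hypothetical and the URL would be whatever the map server exposes. A server that honours the header replies with status 206 (Partial Content), while one that ignores it replies with 200 and the whole file.

```python
import urllib.request


def range_request(url, offset, length):
    """Build a request for `length` bytes starting at byte `offset`."""
    last = offset + length - 1  # the byte range in the header is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


def fetch_range(url, offset, length):
    """Download one chunk over HTTP; fail if the server ignored the Range header."""
    with urllib.request.urlopen(range_request(url, offset, length)) as response:
        if response.status != 206:
            raise RuntimeError("server did not return partial content")
        return response.read()
```

Because any static file server that supports range requests can answer these, no map-specific code has to run on the server side.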

Suggested solutions

I think we should fix and improve the map download functionality in binfile.c, or implement similar functionality anew.

To solve the map replication and update problems on both the map server and the end user side, I suggest including tile version information in the binfile format as the timestamps of the zip file members:

  • Timestamps would be set before the map file is distributed, by comparing the current tiles with their counterparts in the previous map file.
  • If the content of a tile matches its older counterpart, it receives that counterpart's timestamp. Non-matching tiles keep the timestamp of the moment the current map processing run started.
  • To update the map, the user downloads the list of tiles for the relevant geographical region and compares their timestamps with those already on disk. The missing or changed tiles are then downloaded into a new zip file volume.

The steps above would also solve the "multiple binfile maps" problem, as the user would be able to extend a single map file to cover any geographical areas needed.
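The timestamp scheme can be sketched as follows. This is only an illustration under simplifying assumptions: tiles are held in plain dictionaries instead of zip members, timestamps are integers, and both function names are made up for the example.

```python
def carry_over_timestamps(old_tiles, new_tiles, build_time):
    """Assign a timestamp to every tile of the new map file.

    old_tiles maps tile name -> (content, timestamp) from the previous file.
    new_tiles maps tile name -> content from the freshly processed file.
    """
    stamps = {}
    for name, content in new_tiles.items():
        previous = old_tiles.get(name)
        if previous is not None and previous[0] == content:
            stamps[name] = previous[1]  # unchanged tile keeps its old timestamp
        else:
            stamps[name] = build_time   # new or changed tile
    return stamps


def tiles_to_download(server_stamps, local_stamps):
    """Return the names of tiles that are missing or outdated on the client."""
    return sorted(
        name for name, stamp in server_stamps.items()
        if local_stamps.get(name) != stamp
    )
```

One detail to keep in mind when mapping this onto real zip members: the classic zip timestamp field uses the MS-DOS format, which has a two-second resolution and no time zone.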

To avoid downloading the 100 MB zip file directory, I suggest placing the tile names, their offsets relative to the start of the zip file, and their timestamps in a separate tileset index file. The tile names would be stored there in a compact binary form instead of the alphanumeric form currently used in the zip directory. The binary form would require 5 bits for the name length and 36 bits for the name code (41 bits, which fits in 6 bytes) per tile, instead of the 18-byte ASCII string used now.
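One possible packing is sketched below. It assumes that tile names are strings of at most 18 characters drawn from the four letters a to d, which is what 36 bits at 2 bits per character would allow; the exact bit layout, the byte order and the function names are choices made for this example and not a defined format.

```python
ALPHABET = "abcd"   # assumed tile name alphabet, 2 bits per character
MAX_LEN = 18        # 18 characters * 2 bits = 36 bits of name code


def encode_tile_name(name):
    """Pack a tile name into 6 bytes: 5 bits of length plus 36 bits of code."""
    if len(name) > MAX_LEN:
        raise ValueError("tile name too long")
    code = 0
    for ch in name:
        code = (code << 2) | ALPHABET.index(ch)  # ValueError on other letters
    value = (len(name) << 36) | code             # 41 significant bits
    return value.to_bytes(6, "big")


def decode_tile_name(packed):
    """Inverse of encode_tile_name()."""
    value = int.from_bytes(packed, "big")
    length = value >> 36
    code = value & ((1 << 36) - 1)
    chars = []
    for i in range(length):
        shift = 2 * (length - 1 - i)
        chars.append(ALPHABET[(code >> shift) & 3])
    return "".join(chars)
```

Storing the length explicitly is what distinguishes, for example, "a" from "aa", since both have a name code of zero.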

Some thoughts on the integrity and authority problems:

  • signing content on the fly is not a good option in my view, as it requires running our code on the server side and so prevents us from using simple file-serving providers.
  • the authority problem could be solved by switching to HTTPS, but this does not help with the integrity problem and does not scale if we switch to BitTorrent delivery.
  • we could sign each tile and the tileset index file with a digital signature. The user could then verify at any moment whether the map data held for a given region is current and authoritative.

I have started to implement map file comparison by moving the zip file read functions from navit/map/binfile/binfile.c to a new file, navit/zipfile.c, so that we can reuse them in maptool. I plan to publish my work in its current state in a separate branch shortly.