Maps server setup tasks

From Meta, a Wikimedia project coordination wiki


Architecture and issues

Database server

All geospatial data is stored in a PostgreSQL database running the PostGIS extension.

Problems to solve:

  1. Replication from OSM
  2. Do we need two PostgreSQL instances, or will a single one do (for WMF and Toolserver)?
  3. How are we going to handle massive data imports for people who need them?
  4. How are we going to handle additional metadata that Wikimedia community might want to put into "Wikimedia" OSM? Separate database? Modified OSM schema? Something else?
  5. How scalable will a single server be?

Rendering machinery

Rendering is usually a batch background task, although some architectures render on demand: a tile that is not found is rendered when it is requested.
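The on-demand approach can be sketched roughly as follows. This is only an illustration, not how mod_tile and renderd are implemented: the cache layout, the function names and the `fake_render` placeholder are all assumptions made for the example.

```python
import os
import tempfile


def tile_path(cache_dir, z, x, y):
    """Return the cache file path for tile z/x/y (hypothetical layout)."""
    return os.path.join(cache_dir, str(z), str(x), f"{y}.png")


def get_tile(cache_dir, z, x, y, render):
    """Serve a tile from the cache, rendering it on demand if it is missing.

    `render` is a caller-supplied function (z, x, y) -> bytes; it stands in
    for a real renderer such as Mapnik.
    """
    path = tile_path(cache_dir, z, x, y)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # cache hit: no rendering needed
    data = render(z, x, y)  # cache miss: render now
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data


def fake_render(z, x, y):
    """Placeholder renderer that only records what it was asked to draw."""
    return f"tile {z}/{x}/{y}".encode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache:
        first = get_tile(cache, 3, 4, 2, fake_render)   # rendered on demand
        second = get_tile(cache, 3, 4, 2, fake_render)  # served from cache
        print(first == second)
```

A batch setup would call the same renderer ahead of time for a list of tiles, so that requests only ever hit the cache.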

There are different renderers available:

Mapnik rendering currently works through a combination of the Apache module mod_tile and renderd. There may be scalability issues, as mod_tile can talk to only a single renderd instance (one machine). Web crawlers and massive database imports are also a problem, since they generate load spikes in the rendering infrastructure.

As far as I can tell, Mapnik is by far the most scalable of the options. A single server has so far handled the full load of OpenStreetMap fairly well, whereas Osmarender needs hundreds of clients to achieve the same. Mapnik also scales well because it renders tiles on the fly and so never has to render tiles that would become outdated before ever being served. If that is still not enough, there are patches that allow renderd to be distributed across a LAN, further increasing scalability ( http://trac.openstreetmap.org/ticket/2005 ). They have not yet been merged into renderd, as it turned out there was no need for them on OSM, but if Wikipedia needs them, I can try to get them merged. --Apmon

Mapnik has its advantages, though once the "Mapnik" PostGIS database is set up, it is equally easy to render maps with other software such as GeoServer.

Tile software options:

Problems to solve:

  1. Which renderer do we support? Or do we go for all?
  2. How do we schedule rendering jobs?
  3. How do we control and contain them?
  4. How do we collect statistics and measure improvement?
    1. What statistics do you need? mod_tile and renderd come with several ways of measuring performance. E.g. http://munin.openstreetmap.org/openstreetmap/tile.openstreetmap.html#Renderd shows the rendering throughput of the OSM tile server. There are more statistics in mod_tile that haven't been deployed yet, and it might not be too hard to add more. --Apmon

Tile serving

Tiles should be served as fast as possible. We will probably need to measure a lot here.

Problems to solve:

  1. On-demand generation? Pre-generate all?
  2. How to spread the load? How many machines?
  3. Go for a simple web server like thttpd and/or use some cache like Varnish or Squid? Some other solution? (The guys running the NL Tile Server are using Cherokee and appear to have measured a lot.)
    1. mod_tile works quite nicely together with Squid: it supports HTCP cache expiry for newly rendered tiles and uses some heuristics to improve expiry times and cacheability
  4. How do we collect statistics and measure improvement?
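To put the "pre-generate all?" question in numbers, here is a small back-of-the-envelope calculation. It assumes the usual slippy-map scheme, in which zoom level z covers the world with 2^z by 2^z tiles; the maximum zoom of 18 is an assumption for the example, not a decision recorded on this page.

```python
def tiles_at_zoom(z):
    """Number of tiles at zoom level z: a 2^z by 2^z grid."""
    return 4 ** z


def total_tiles(max_zoom):
    """Number of tiles in all zoom levels from 0 to max_zoom inclusive."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


if __name__ == "__main__":
    max_zoom = 18  # assumed maximum zoom level
    for z in (0, 10, max_zoom):
        print(f"zoom {z}: {tiles_at_zoom(z):,} tiles")
    print(f"zoom 0-{max_zoom} in total: {total_tiles(max_zoom):,} tiles")
```

Under these assumptions the total is about 91.6 billion tiles per style, which is why most of the high zoom levels are candidates for on-demand generation.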

Stylesheet management

A stylesheet gives instructions to the renderer. Advanced users will probably want to play with new stylesheets for the maps.

Problems to solve:

  1. Internationalization - how?
  2. Should somebody need to regenerate a whole planet to test a stylesheet? Should test rendering be handled differently from production rendering?
  3. What will be the process of putting a new stylesheet in production?

Presentation to the user

  1. We need statistics on the current usage of Geohack and WMA tiles
    1. Webstats for Geohack and WMA
  2. Static embedding (priority?)
  3. JavaScript "slippy map" implementation - needs very scalable tile serving
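For reference, a slippy map requests tiles by zoom level and x/y index. The sketch below shows the common Web Mercator conversion from a coordinate to a tile index, assuming the z/x/y.png URL layout used by OSM tile servers; the base URL is a placeholder, not a real server.

```python
import math


def deg2num(lat_deg, lon_deg, zoom):
    """Convert latitude/longitude in degrees to slippy-map tile indices."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(base_url, lat_deg, lon_deg, zoom):
    """Build a z/x/y.png tile URL under base_url (placeholder server)."""
    x, y = deg2num(lat_deg, lon_deg, zoom)
    return f"{base_url}/{zoom}/{x}/{y}.png"


if __name__ == "__main__":
    # Hypothetical tile server; replace with the real host once it exists.
    print(tile_url("https://tiles.example.org", 52.52, 13.40, 12))
```

Every pan or zoom in the browser turns into a batch of such requests, which is where the scalability requirement comes from.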
Figure: OpenStreetMap architecture (source: Openstreetmap:Develop)

Ptolemy: production OSM database server

  • master PostgreSQL instance

Server setup

  • Partition the server
    • set up a separate partition for the PostgreSQL database logs
    • set up a separate partition for the database

Main OSM mirror database

  • mirror the production OSM main database
  • procedure (scripts) to regularly update our OSM database with new OSM changesets

Questions:

  1. what will be mirrored? (see [1])
    • the current-tables
    • the history-tables
    • the raw-tables
  2. how could this be mirrored?
    • only current can be imported from a planet.osm
  3. how often should this be updated?
  4. is access needed from Ortelius (tile server) or just from Cassini (toolserver)?
  5. will there be access from Cassini?

Mapnik database

  • mapnik rendering database (with PostGIS support), done using osm2pgsql
  • add and maintain multiple database views, for multilingual rendering
  • procedure to update rendering database at regular interval, with new OSM changesets (with osmosis --read-change-interval)
  • procedure for regular complete re-imports to solve inconsistencies introduced by the diff-import
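The periodic update step could look roughly like the sketch below, which only builds the two command lines and does not run them. The working directory, change file and database name are made-up placeholders, and the exact options should be checked against the installed osmosis and osm2pgsql versions.

```python
import shlex


def build_update_commands(work_dir, change_file, database):
    """Return the command lines for one diff update (nothing is executed).

    Step 1 asks osmosis for the changes since the last run; step 2 applies
    them to the rendering database, which must have been imported in slim
    mode for --append to work.
    """
    fetch = [
        "osmosis",
        "--read-change-interval", f"workingDirectory={work_dir}",
        "--write-xml-change", f"file={change_file}",
    ]
    apply_diff = [
        "osm2pgsql", "--append", "--slim",
        "-d", database,
        change_file,
    ]
    return [fetch, apply_diff]


if __name__ == "__main__":
    # All three arguments are hypothetical example values.
    for cmd in build_update_commands("/srv/osm/state", "/srv/osm/changes.osc", "gis"):
        print(shlex.join(cmd))
```

A cron job or similar scheduler would run the two commands in order and stop if the first one fails, so that a partial change file is never applied.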

XAPI instance

Ortelius: production OSM tile server

  • Partition the server
  • The default.style would work, but it would be best to come up with a modified Wikipedia style.
    • our style also needs to incorporate the multiple database views, which support rendering tiles for each language.
    • other styles need a different *.style file to allow rendering other features
  • to do
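One possible shape for the per-language views is sketched below. It is a guess at the approach, not an agreed design: it assumes the default osm2pgsql table names, and it assumes that the .style file has been extended so that each table carries a "name:<lang>" column next to "name". The view naming scheme is made up for the example.

```python
import re

# Default osm2pgsql table names (assumes the standard "planet_osm" prefix).
TABLES = ["planet_osm_point", "planet_osm_line", "planet_osm_polygon", "planet_osm_roads"]


def localized_view_sql(table, lang):
    """Return SQL for a view that prefers the localized name of each feature.

    Falls back to the plain "name" column when no localized name exists.
    Only a few columns are selected to keep the sketch short.
    """
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    if not re.fullmatch(r"[a-z]{2,3}", lang):
        raise ValueError(f"unexpected language code: {lang}")
    return (
        f"CREATE OR REPLACE VIEW {table}_{lang} AS "
        f'SELECT osm_id, way, COALESCE("name:{lang}", name) AS name '
        f"FROM {table};"
    )


if __name__ == "__main__":
    for table in TABLES:
        print(localized_view_sql(table, "de"))
```

The style for a given language would then read from the matching views instead of the base tables.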

Cassini (toolserver)

See also: https://wiki.toolserver.org/view/OpenStreetMap_server/Setup_notes
  • PHP, Perl and Python, with apache2 and on the CLI
  • access to MySQL and PostgreSQL
    see JIRA for a list of packages needed for this
  • a way for tools that use the OSM databases to tell their users the date/time of the last update and of the next planned update (similar to the globalsitenotice currently discussed on toolserver-l)
  • samples on the wiki showing how to use Cassini and the databases in various languages
  • list of project ideas on the wiki
  • will Cassini have its own PostGIS / OSM database, or share the one on Ptolemy?
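The "last update" notice for tools could be derived from the replication state, along the lines of the sketch below. It assumes a state file in the Java-properties style that osmosis writes (a "timestamp=" line with escaped colons) and a fixed update interval; both are assumptions, and the function names are invented for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_state_timestamp(state_text):
    """Extract the UTC timestamp from the text of a state file."""
    for line in state_text.splitlines():
        if line.startswith("timestamp="):
            # Properties files escape colons as "\:"; undo that first.
            value = line.split("=", 1)[1].replace("\\:", ":")
            parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("no timestamp line found")


def update_notice(state_text, interval):
    """Build a notice with the last update and the next planned update."""
    last = parse_state_timestamp(state_text)
    planned = last + interval
    return (
        f"Last update: {last:%Y-%m-%d %H:%M} UTC; "
        f"next planned update: {planned:%Y-%m-%d %H:%M} UTC"
    )


if __name__ == "__main__":
    # Made-up example content, not taken from a real server.
    sample = "sequenceNumber=1\ntimestamp=2010-03-01T12\\:00\\:00Z\n"
    print(update_notice(sample, timedelta(hours=1)))
```

Tools written in PHP or Perl could read the same file, so the notice would not depend on any one language.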

Background info

Server info: OpenStreetMap#Servers


  • Required bits for our purposes:
    • database
    • mapnik rendering
    • slippymap
    • api (maybe? for toolserver usage)