Map generator

From Meta, a Wikimedia project coordination wiki

The maps generator works in roughly 2 phases:

  1. generation of the map - BlueMarble, polygons, rivers, borders, etc.
  2. placement of labels

Placement of city dots falls somewhere between.

Phase 1 is algorithmically quite easy, but we have no luck with the libs.

Phase 2 is algorithmically difficult, and currently used code doesn't work well. But the situation with libs is a bit better.

Details[edit]

Let's say we want to generate maps like that one. We must specify:

  • how big we want it to be
  • what projection to use
  • what region should map show
  • what to show on the map

Technically we could specify either:

  • both dimensions of result picture, and projection without standard latitude
  • only one dimension, and add standard latitude to projection information

The former is much easier to understand for average person and it's easier to implement (in fact it's the mode my scripts currently operate in). The latter is harder to make mistakes with and may be more "geographically correct".

Then, there are layers of data to be displayed, the easiest way is just to assume that whatever is given first goes to the bottom.

<map>
600x600
equalarea 0E-18E 54N-36N
landborders
capitals
cities_over_1M
</map>

Assuming landborders, capitals, cities_over_1M is understood by the system, and we want "default" style to be used, this seems like a nice way of specifying a map to be rendered.

Now what should be implemented is not just a static image, but an image with lot of links. So when I click on Berlin, it moves me to article about Berlin, when I click on Italy, to article about Italy etc. One of the problems is that there are usually many articles that could be reasonably linked from each place. Should clicking near "Palermo" move to "Palermo", "Sicily", "Italy", "Europe" or what ? One possibility is to make some sane defaults for non-Javascript browsers and use JavaScript to ask where does user want to go.

Automatic zoom in/out, move up/down/left/right buttons should be quite easy too.
Won't that load server too much ?
Maybe, if the program is really heavy. But I think that, after a few people have played with the buttons, the dozen or so maps generated this way would be all in the server cache.

Technically generating maps "my way" doesn't take lot of CPU, but uses enormous amount of memory. Not much of it is used for generating actual map, most is wasted because of horrible locality in format we use - kernel reads whole 4kB to memory everywhere where we just want a couple of bytes. Changing data structures could greatly improve our results. One easy improvement would be to have both high and low resolution data available, and to make generator automatically choose the right one. With the algorithm the current generator uses, it would in fact significantly improve quality of images of big parts of globe.

I expect maps to be generated not too often: a cache mechanism similar to the one used for images and TeX math will be surely needed, so after a map has been generated no overhead will be present. Even if the map generation program grabs 500MB to make one map, it does not seem to me a major problem. At18 11:37, 22 Aug 2003 (UTC)

Algorithms[edit]

Placing labels[edit]

Cities, countries and other features have to be labeled. Placing these labels without intersections is a very difficult problem for large numbers of cities. Some papers covering this topic:

Basic procedure:

  • Place the labels somewhere
  • Repeat for some time:
    • Move intersecting labels around
    • If solution is better or not that much worse, keep it

Possible modifications

  • Split problem into smaller independent problems to speed things up
  • If problem size is small enough, use brute force. Brute force searching for a placement of labels has possible solutions when placing the label either centered or at the corners.

See also: en:Automatic label placement

C script[edit]

This script cuts a fragment of BlueMarble and projects it, scaling at the same time. So far it supports:

  • raw data, aka. unscaled equirectangular projection
  • scaled equidistant cylindrical (closest-neighbour and bilinear scaling)
  • equal-area cylindrical (this one is more or less useful)
  • Mercator

It uses mmap()-per-line interface and is very fast. Unfortunately due to limits of 32 bit processors it was impossible to mmap() everything (data for 2 hemispheres is 2.7GB, Linux on IA32 allows to mmap() about 2GB, architecture limits are 4GB).

Code for projections[edit]

This is C code (unfortunately).

Algorithm used is not that bad for zooming in, but doesn't look very good for zooming out.

Drawing on that map is handled by Perl using PerlMagick.

#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <errno.h>
#include <math.h>
#define cfatal(cond,msg) do { if (cond) {fprintf (stderr,(msg)); exit(1);} } while (0)

const int   hemisphere_file_size = 1399680000;
const char *hemisphere_file_name_east = "data/east";
const char *hemisphere_file_name_west = "data/west";

typedef unsigned char byte;

struct line_buffer {
    byte  * buf;
    int     system_size;
    int     line_nr;
};

struct hemisphere {
    int fd;
    byte **line_ptrs;
    struct line_buffer buffer[16];
    int    cur;
} west, east;

/* AUX */

void init_hemisphere(struct hemisphere *h, const char *fn)
{
    int i;

    h->fd = open (fn, O_RDONLY);
    cfatal(h->fd == -1, "Error opening file\n");
    h->line_ptrs = malloc (21600 * sizeof(byte*));
    for (i=0; i<21600; i++)
        h->line_ptrs[i] = 0;
    for (i=0; i<16; i++)
    {
        h->buffer[i].buf = 0;
        h->buffer[i].system_size = 0;
        h->buffer[i].line_nr = -1;
    }
    h->cur = 0;
}

void init()
{
    init_hemisphere(&east, hemisphere_file_name_east);
    init_hemisphere(&west, hemisphere_file_name_west);
}

void load_line(struct hemisphere *h, int nr)
{
    byte *buf;
    int ofs, sz;
    int i;

    if (h->buffer[h->cur].buf)
    {
        munmap (h->buffer[h->cur].buf, h->buffer[h->cur].system_size);
        h->line_ptrs[h->buffer[h->cur].line_nr] = (byte*)0;
    }

    ofs = 3 * 21600 * nr;
    sz = 3 * 21600;
    i = ofs % 4096;
    ofs -= i;
    sz  += i;

    buf = mmap(0, sz, PROT_READ, MAP_PRIVATE, h->fd, ofs);
    cfatal (!buf, "Can't mmap() a line");

    h->buffer[h->cur].buf = buf;
    h->buffer[h->cur].system_size = sz;

    h->buffer[h->cur].line_nr = nr;
    h->line_ptrs[nr] = &h->buffer[h->cur].buf[i];

    h->cur ++;
    h->cur %= 16;
}

void print_ppm_headers(FILE *file, int size_x, int size_y)
{
    fprintf(file, "P6\n%d %d\n255\n", size_x, size_y);
    fflush(file);
}

void force_write(int fd, char *buf, int size)
{
    int i = 0;
    int res;

    while (i < size)
    {
        res = write (fd, buf + i, size - i);
        if (res == -1)
            cfatal (errno != EINTR, "Error outputting data\n");
        else
            i += res;
    }
}

inline byte *get_value(int x, int y)
{
    static byte zero [] = {0, 0, 0};
    if (y < 0 || y >= 21600)
        return zero;
/* C is truly braindead. The only correct answer for -2 % 3 is 1
 * but C somehow thinks it's -2
 */
    x = (x + 21600 * 2) % (21600 * 2);
    if (x >= 21600) {
        if (!west.line_ptrs[y])
            load_line(&west, y);
        return &west.line_ptrs[y][3 * (x - 21600)];
    } else {
        if (!east.line_ptrs[y])
            load_line(&east, y);
        return &east.line_ptrs[y][3 * x];
    }
}

inline void bilinear_put(byte * buf, int x, int tx, int ty, float px, float py)
{
    byte * ul = get_value(tx, ty);
    byte * ur = get_value(tx, ty + 1);
    byte * ll = get_value(tx + 1, ty);
    byte * lr = get_value(tx + 1, ty + 1);
    int i;

    for (i=0; i<3; i++)
    buf[x*3 + i] =  (byte)
                    ((py * lr[i] + (1-py) * ur[i]) * px +
                     (py * ll[i] + (1-py) * ul[i]) * (1-px));
}

inline void bilinear_put2(byte * buf, int x, float inx, float iny)
{
    int tx, ty;
    float px, py;

    px = inx - floor(inx);
    tx = (int)inx;
    py = iny - floor(iny);
    ty = (int)iny;

    bilinear_put(buf, x, tx, ty, px, py);
}

#define geo2pixel_x(geox) ((geox) * 21599 / M_PI)
#define geo2pixel_y(geoy) ((M_PI_2 - (geoy)) * 21599 / M_PI)
#define d2r(d) ((d) * (M_PI / 180.0))

/* PROJECTIONS */

void cyl_equidistant(int sizex, int sizey, float north, float west, float south, float east)
{
    byte buf[sizex * 3];
    int x, y;
    float tx, ty;

    print_ppm_headers(stdout, sizex, sizey);

    for (y=0; y<sizey; y++)
    {
        for (x=0; x<sizex; x++)
        {
            tx = geo2pixel_x(west  + (east  - west ) * x / (sizex - 1));
            ty = geo2pixel_y(north + (south - north) * y / (sizey - 1));
            bilinear_put2(buf, x, tx, ty);
        }
        force_write(1, buf, sizex * 3);
    }
}


void cyl_equalarea(int sizex, int sizey, float north, float west, float south, float east)
{
    byte buf[sizex * 3];
    int x, y;
    float tx, ty;

    print_ppm_headers(stdout, sizex, sizey);

    for (y=0; y<sizey; y++)
    {
        for (x=0; x<sizex; x++)
        {
            tx = geo2pixel_x(west  + (east  - west) * x / (sizex - 1));
            ty = geo2pixel_y(asin(sin(north) + (sin(south) - sin(north)) * y / (sizey - 1)));
            bilinear_put2(buf, x, tx, ty);
        }
        force_write(1, buf, sizex * 3);
    }
}

void cyl_mercator(int sizex, int sizey, float north, float west, float south, float east)
{
    byte buf[sizex * 3];
    int x, y;
    float tx, ty;

    print_ppm_headers(stdout, sizex, sizey);
#define forw(x) log(tan(x) + 1/cos(x))
#define back(x) (2 * atan (exp(x)) - M_PI_2)

    for (y=0; y<sizey; y++)
    {
        for (x=0; x<sizex; x++)
        {
            tx = geo2pixel_x(west  + (east  - west) * x / (sizex - 1));
            ty = geo2pixel_y(back(forw(north) + (forw(south) - forw(north)) * y / (sizey - 1)));
            bilinear_put2(buf, x, tx, ty);
        }
        force_write(1, buf, sizex * 3);
    }
}

int main(int argc, char **argv)
{
    int xsize;
    int ysize;
    float north;
    float south;
    float west;
    float east;

    cfatal (argc != 8, "Usage: project <projection> <xsize> <ysize> <north> <south> <west> <east>\n");

    xsize = atoi(argv[2]);
    ysize = atoi(argv[3]);
    north = d2r(atof(argv[4]));
    south = d2r(atof(argv[5]));
    west  = d2r(atof(argv[6]));
    east  = d2r(atof(argv[7]));

    init ();

    if (strcmp(argv[1],"a") == 0)
        cyl_equalarea(xsize,ysize,north,west,south,east);
    else if (strcmp(argv[1],"m") == 0)
        cyl_mercator(xsize,ysize,north,west,south,east);
    else if (strcmp(argv[1],"d") == 0)
        cyl_equidistant(xsize,ysize,north,west,south,east);
    else
        cfatal(1, "Unknown projection type\n");
    return 0;
}

Code for positioning labels[edit]

Here's some code for positioning non-overlapping labels, using genetic algorithms: the code is in Python, but the idea can be adapted easily to other languages: the O(n^2) parts of the code can easily be made to run faster by x-y sorting the rectangles before comparison. Note that the key to its rapid convergence is the directed mutation, which occurs after crossover, and only where labels still overlap.

This appears to be able to deal with quite heavy-duty problems, for example, with the parameters below, the labels cover 45% of the entire chart area, yet 100% placement can be achieved in about 30 seconds of CPU time on a modern 2 GHz PC. With the O(n^2) code optimised to use a more complex but more efficient algorithm, this would probably complete in a few seconds.

# Placing labels so that they do not interfere

import math, random

label_width = 40
label_height = 15
n_labels = 20
image_width = 200
image_height = 200
n_startgenes = 2000     # size of starting gene pool 
n_bestgenes = 50        # genes selected for cross-breeding
eps = 1.0e-10

# Return a displaced rectangle in l,b,r,t format
def rect_displace(rect, xdisp, ydisp):
    l, b, r, t = rect
    return (l+xdisp, b+ydisp, r+xdisp, t+ydisp)

# There are eight possible alignment codes, corresponding to the 
# corners and side mid-points of the rectangle
# Codes are 0..7
# Code 6 is the most preferred
def gen_rect(rect, code):
    # Take a rectangle in l,b,r,t format
    l, b, r, t = rect
    width = r - l
    height = t - b
    xdisp = [-1, -1, -1,  0,  0,  1,  1,  1][code]*width/2.0
    ydisp = [-1,  0,  1, -1,  1, -1,  0,  1][code]*height/2.0       
    return rect_displace(rect, xdisp, ydisp)

# Finds intersection area of two rectangles
def rect_intersect(rect1, rect2):
    l1, b1, r1, t1 = rect1    
    l2, b2, r2, t2 = rect2
    w = min(r1, r2) - max(l1, l2)   
    h = min(t1, t2) - max(b1, b2)
    if w <= 0: return 0    
    if h <= 0: return 0
    return w*h

def nudge_rectlist(rectlist):
    # Nudge the labels slightly if they overlap:
    # this makes things hugely easier for the optimiser
    for i in range(len(rectlist)):
        for j in range(i):
            if rect_intersect(rectlist[i], rectlist[j]):
                l1, b1, r1, t1 = rectlist[i]
                l2, b2, r2, t2 = rectlist[j]
                xdisp = (l1 + r1)/2.0 - (l2 + r2)/2.0 
                ydisp = (b1 + t1)/2.0 - (b2 + t2)/2.0
                nudge = 5.0
                pyth = math.sqrt(xdisp**2 + ydisp**2)/nudge
                if pyth > eps:
                    rectlist[i] = rect_displace(rectlist[i],
                                                -xdisp/pyth, -ydisp/pyth)
                    rectlist[j] = rect_displace(rectlist[j],
                                                xdisp/pyth, ydisp/pyth)
    return rectlist

# Objective function: O(n^2) time
def objective(rects, gene):
    rectlist = [gen_rect(rect, code) for rect, code in zip(rects, gene)]
    # Allow for "bending" the labels a bit
    rectlist = nudge_rectlist(rectlist)
    area = 0
    for i in range(len(rectlist)):
	for j in range(i):
	    area += rect_intersect(rectlist[i], rectlist[j])
    image_rect = [0, 0, image_width, image_height]
    # Penalize labels which go outside the image area
    for i in range(len(rectlist)):
        l, b, r, t = rectlist[i]
        if abs(rect_intersect(rectlist[i], image_rect) - (r - l)*(t - b)) < eps:
            area += image_width * image_height
    return area

# Mutation function: O(n^2) time
def mutate(rects, gene, prob):
    newgene = gene
    rectlist = [gen_rect(rect, code) for rect, code in zip(rects, gene)]
    # Directed mutation where two rectangles intersect
    for i in range(len(rectlist)):
	for j in range(i):
            if rect_intersect(rectlist[i], rectlist[j]):
                newgene[i] = random.randrange(8)
    # And a bit of random mutation, too
    for i in range(len(gene)):
        if random.random() <= prob:
            newgene[i] = random.randrange(8)
    return newgene

# Crossbreed a base pair
def xbreed(p1, p2):
    # Selection
    if random.randint(0,1):
        return p1
    else:
        return p2

# Crossbreed two genes, then mutate at "hot spots" where intersections remain
def crossbreed(g1, g2):
    return [xbreed(x, y) for x, y in zip(g1, g2)]

random.seed()

# Make a list of label rectangles in their reference positions,
# centered over the map feature; the real labels are displaced
# from these positions so as not to overlap
# Note that some labels are bigger than others
rects = []
for i in range(n_labels):
    x = random.uniform(image_width*0.01, image_width*0.99)
    y = random.uniform(image_height*0.01, image_height*0.99)
    size = random.choice([0.7, 1, 1, 1.5])
    width = label_width * size
    height = label_height * size
    rects.append((x-width/2.0, y-height/2.0, x+width/2.0, y+height/2.0))

print "cover =", sum([rect_intersect(r,r) for r in rects])/(image_width*image_height)

# Make some starting genes
genes = [
    [random.randrange(8) for i in range(n_labels)]
    for j in range(n_startgenes)]

prob = 0.2

for i in range(10):
    rankings = [(objective(rects, gene), gene) for gene in genes]
    rankings.sort()
    genes = [r[1] for r in rankings]
    bestgenes = genes[:n_bestgenes]
    bestscore = rankings[0][0]
    if bestscore == 0:
        print "Done"
        break

    # At each stage, we breed the best genes with one another
    genes = bestgenes + [mutate(rects, crossbreed(g1, g2), prob) for g1 in bestgenes for g2 in bestgenes]
    print bestscore, prob

Dynamap Mediawiki Extension[edit]

Dynamap example, showing the Suez Canal

I'm currently developing a Mediawiki-Extension which implements a simple markup language for maps. The key feature is easy editability. Coordinates will not show up in the map definition, except to select the shown area. All plotted features will either come from external datafiles (rivers, coast lines, borders), or will simply be references to geocoded wikipedia articles. Check the example on the right, it shows a map of the Sinai peninsula in egypt generated using the following markup:

<dynamap>
area: 31.68457,31,27.531738,35.5
image: 450

start

level: 1
stroke: 1
color: black
layer: africa-cil

# Suez Canal, also clickable
color: #F07568
stroke: 6
poly: Suez Canal

# major cities in the area
radius: 8
color: #B00000
point: Cairo
radius: 4
point: Sharm el-Sheikh
point: Port Said
point: Suez
</dynamap>

The extension parses the wikipedia articles referenced using the poly: and point: commands and extrats coordinate information. The point: commans looks for coor templates. For the poly: command I propose a new template called geopoly. Output is an SVG image with clickable links to the referenced articles as well as a png version for browsers without SVG support.

This extension should facilitate the quick generation of simple maps, which will be easily translatable to different languages.

The new version of the map features filled polygons with a colour scheme from en:Wikipedia_talk:WikiProject_Maps, a new super fast clipping algorithm to minimize svg output size, and a new dataset (GSHHS). --Dschwen 22:20, 11 August 2005 (UTC)[reply]

Using WMS-Services - QuickWMS-Extension[edit]

The QuickkWMS-Extension for MediaWiki is using QuickWMS. It uses WMS-Servers for displying maps.

Beispiel / Sample

see http://www.giswiki.org/index.php/QuickWMS-Extension

Screenshot

Using a georeferenzed image for point display - Point-Mapping Extension[edit]

Description[edit]

The Point-Mapping Extension (MapPoint) combines a georeferenced image (map) and point-coordinates. The result is a map with a points where they should be.

Features[edit]

  • using georeferenz for a map (an image)
  • representation of one point of situation by means of geographical coordinates
  • representation as many points you want
  • linkage of the point with any URL
  • reference display window when driving over the point with the mouse

Samples and Code[edit]

see http://www.giswiki.org/wiki/Point-Mapping_Extension

Screenshot[edit]


Potentially useful software[edit]

  • Xrmap an open source GIS using free data, covering the whole earth by country, and adding several additional informations. All free software/data.
  • MapServer - Uses Open Source components like Shapelib, FreeType, Proj.4, libTIFF, Perl

http://remotesensing.org/proj/

http://www.aquarius.geomar.de/omc_intro.html - they generate maps using GMT. They're very ugly but the data would be quite useful. They have rivers and basic political data. Ah, it seems it's just GMT data.

GMT can produce some rather pretty maps, though. See [1], especially the cookbook section. Also, the maps can be antialiased if you generate postscript and feed it through ghostscript, eg en:Image:Skegness - Lincolnshire dot.png Lupin 03:07, 16 Apr 2005 (UTC)

  • Coordinate Converter extension for MediaWiki (used by WikiWikiWalks). Reads Ordnance Survey coordinates and translates them to longitude/latidude. The Ordnance Survey is the official UK mapping body. The extension is functional, but ugly - it's still in the early stages of development.

Geo::ShapeFile and Geo::E00 on CPAN:

  • is there no online tool to create blank maps, just with water bodies and topography? aquarius.geomar.de would be perfect, but their maps are just too small and ugly. There used to be the Xerox mapserver, but that's been offline for ages, and nothing has really replaced it.


Useful Links[edit]