OpenStreetMap Mapping Priority

OpenStreetMap Mapping Priority Map
OpenStreetMap Mapping Priority

Working in French Guiana, i came to know the Mapazonia project which aims to improve OpenStreetMap data of the Amazon region. Starting to contribute, i learnt to use the OSM Tasking Manager, a tool that divides Areas Of Interest (AOI) into task tiles. Interested in the Guiana Shield, i wondered if i could use a similar way to map this area.

Once task tiles created, thousands of them for such a huge AOI, a question arises: Where to map first?

To answer this, we could think of several approaches:

  • we should find unmapped geographic features where the map is blank;
  • there should be less justified empty tiles in remote areas, where there is few people living;
  • where source data, satellite imagery, is of coarse resolution, geographic features covering large areas can be digitized (i.e. small scale mapping);
  • at heart of AOI first ;

Let’s build a task tile priority map from these ideas!

Existing data

We’ll begin by retrieving existing data to get a map similar to Martin Raifer’s work on node density. OSM extracts are available from Geofabrik and we can set up a PostGIS database following Regina Obe’s tutorial. Task tiles are level 12 tiles of the web mapping reference system. Here is how we create them:

  x, y
  generate_series(0, pow(2, 12)::INT - 1) AS x,
  generate_series(0, pow(2, 12)::INT - 1) AS y ;
COMMENT ON TABLE tile IS 'Map tiles' ;
COMMENT ON COLUMN tile.x IS 'Tile coordinate' ;
COMMENT ON COLUMN tile.y IS 'Tile coordinate' ;

and optionally build their geometries (we are indeed at the conceptual frontier between vector and raster data models):

CREATE FUNCTION ST_GeomFromTileXY(z int, x int, y int, OUT tile_geom geometry(Polygon, 4326)) AS $$
    side real;
    xmin real; ymin real; xmax real; ymax real;
    minlon real; maxlon real; maxlat real; minlat real;
    side := pow(2, z);

    xmin := x::real/side - .5;
    ymin := .5 - y::real/side;
    xmax := (x + 1)::real/side - .5;
    ymax := .5 - (y + 1)::real/side;

    minlon := 360.*xmin;
    maxlon := 360.*xmax;
    maxlat := 90. - 360.*atan(exp(-ymin*2.*pi()))/pi();
    minlat := 90. - 360.*atan(exp(-ymax*2.*pi()))/pi();

    tile_geom := ST_MakeEnvelope(minlon, minlat, maxlon, maxlat, 4326);
END; $$ LANGUAGE plpgsql;

Reference system to tile coordinates conversion can be achieved as follows:

CREATE FUNCTION TileCoord(z int, point geometry(Point, 4326),
  OUT tile_x int, OUT tile_y int) AS $$
    tile_n real;
    sinphi real; x real; y real;
    tile_n := power(2, z)::real;
    x := (ST_X(point) + 180.)/360.;
    sinphi = sin(ST_Y(point)*pi()/180.);
    y := .5 - ln((1. + sinphi)/(1. - sinphi))/(4.*pi());

    tile_x := CAST(trunc(tile_n*x + .5) AS int);
    tile_y := CAST(trunc(tile_n*y + .5) AS int);
END; $$ LANGUAGE plpgsql;

To compute node density, we first find the tile each point falls in:

CREATE TEMP TABLE planet_osm_point_dump AS
SELECT id, type, (coord).tile_x, (coord).tile_y, geom
      row_number() OVER (ORDER BY type, id, (dump).path) AS id,
      TileCoord(12, (dump).geom) AS coord,
      (dump).geom AS geom
        (SELECT 0 AS type, osm_id AS id, ST_DumpPoints(way) AS dump
        FROM planet_osm_point
        SELECT 1 AS type, osm_id AS id, ST_DumpPoints(way) AS dump
        FROM planet_osm_line
        SELECT 2 AS type, osm_id AS id, ST_DumpPoints(way) AS dump
        FROM planet_osm_polygon
        ) AS point
    ) AS point_tiled;

and count points by tile:

UPDATE tile t SET node_count = count.num
  SELECT x, y, count(*) AS num
  FROM planet_osm_point_dump
  GROUP BY x, y ORDER BY x, y) AS count
  t.x = count.x AND t.y = count.y;

Remote areas

The population seems to be concentrated on coasts and along main rivers, decreasing upstream. We’ll model remoteness with hydrological distance, the one traveled by surface runoff. It is computed with GRASS GIS module. Hydrological modeling typically uses a flow direction map derived from a Elevation Model (DEM):

Hydrological flow direction
Hydrological flow direction

The USGS provides the GTOPO30 HYDRO1k dataset we can download from the  Earth Explorer portal (ID GT30H1KSA for South America). Its projection is the Lambert Azimuthal Equal Area that is described by the PROJ.4 parameters:

+proj=laea +ellps=sphere +lon_0=-60 +lat_0=-15 +x_0=0.0 +y_0=0.0 +units=m

From these parameters, create a location with g.proj and a mapset:

g.proj -c proj4="+proj=laea +ellps=sphere +lon_0=-60 +lat_0=-15 +x_0=0.0 +y_0=0.0 +units=m" location=south_america
g.mapset -c mapset=hydro location=south_america

The dataset contains several layers (see the README file) but we’re mainly interested in importing Elevation and Flow Directions:    in=gt30h1ksa/sa_dem.bil out=elev -e in=gt30h1ksa/sa_fd.bil  out=dir_arc

Transform direction codes to GRASS GIS format using a lookup table file arc2grass.lut:

255 = 0
128 = 1
64  = 2
32  = 3
16  = 4
8   = 5
4   = 6
2   = 7
1   = 8
r.reclass in=dir_arc out=dir rules=arc2grass.lut -o dir=dir elev=elev method=up dist=dir_dist
Distance to outlet
Distance to outlet

To assign distance to tiles, we have to reproject the grid from LAEA to Web Mercator. It is at zoom level 15 a tile side is of about 1223 m, the closest to our distance grid resolution.
We first create a location with this projection and move to this location:

g.proj -c epsg=3857 location=mercator
g.mapset -c mapset=priority location=mercator

Then create the level 15 region and reproject:

g.region w=-20037508.342789244 s=-20037508.342789244 e=20037508.342789244 n=20037508.342789244 rows=32768 cols=32768
r.proj in=dir_dist loc=south_america mapset=hydro out=dir_dist_1k meth=near

We now set the region to tile resolution and resample the grid:

g.region w=-20037508.3427892 s=-20037508.3430388 e=20037508.3427892 n=20037508.3430388 rows=4096 cols=4096
r.resamp.stats in=dir_dist_1k out=dir_dist meth=median


We finally lighten the data, as we dont need submetric precision, and either export it to a raster file:

r.mapcalc "dir_dist = if(isnull(dir_dist), -1, int(dir_dist))"
r.out.gdal -t  in=dir_dist out=hydro_dist.tif type=UInt32 nodata=4294967295

or to text, prior to a PostGIS import:

r.mapcalc "y = row() - 1"
r.mapcalc "x = col() - 1" in=y        sep=space in=x        sep=space in=dir_dist sep=space

paste | awk '!/-1$/ {print $3, $6, $9}' > hydro_dist.ssv

Note that NULL or masked cells won’t be exported (c.f. documentation) making paste produce unexpected results.

value REAL);
COPY tile_attr (x, y, value) FROM 'hydro_dist.ssv' CSV DELIMITER ' ' ;
UPDATE tile t SET hydro_dist = a.value
FROM tile_attr a
WHERE t.level = a.z AND t.x = a.x AND t.y = a.y ;

Distance to AOI

We here build a weight grid to increase priority for tiles located at the heart of our area of interest. The weight is the as-the-crow-flies distance to its boundary.

In the Web Mercator location with the level 12 zoom region, import (using  v.proj if necessary) and rasterize your AOI: in=aoi type=line,area out=aoi use=val value=1

Compute the distance to the AOI for outer cells:

r.grow.distance in=aoi     dist=dist_outer_aoi

Proceed the same way for inner cells:

r.mapcalc "aoi_inv = if(isnull(aoi), 1, null())"
r.grow.distance in=aoi_inv dist=dist_inner_aoi

Combine distance grids in inverting values for inner tiles:

r.mapcalc "dist_aoi = if(isnull(aoi), dist_outer_aoi, -dist_inner_aoi)"

With several AOIs, build a mask, shift and scale values to [0, 1]:

r.mapcalc "roi = aoi_1 ||| aoi_2"
r.grow in=roi out=roi old=1 new=1
r.mask roi
eval $(r.univar -g dist_aoi | grep "\(min\|max\)")
r.mapcalc "dist_aoi = (dist_aoi - $min)/($max - $min)" --o

and compute a weighted average of each distance.

r.mapcalc "aois_dist = (dist_aoi_1 + dist_aoi_2)/2."
Weighted distance to heart of AOI
Weighted distance to heart of area of interest

Imagery resolution

Imagery resolution can be retrieved the same way the Bing coverage analyser tool does (see the Coverage article on OSM Wiki). Here is an example Python code:

PATTERN = "http://t{srv}.domain.tld/tile/{key}.jpg"
HEADER = "Content-Length"; server = randint(0, 7)
url = PATTERN.format(srv=server, key=quadkey)
meta = urlopen(url).info()
size = meta.getheaders(HEADER)[0]

and the PostgreSQL function computing the quadkey from zoom level and tile coordinates:

CREATE FUNCTION QuadKey(z int, x int, y int, OUT quadkey character varying(16)) AS $$
DECLARE digit int; mask int; BEGIN
    quadkey := '';

        digit := 0;
        mask  := 1 << (i - 1);
        IF (x & mask) != 0 THEN digit := digit + 1; END IF;
        IF (y & mask) != 0 THEN digit := digit + 2; END IF;
        quadkey := quadkey || digit;
END; $$ LANGUAGE plpgsql;

In coarse resolution areas, Bing tiles are basically null. Let’s assess imagery quality by counting non null level 14 tiles:

  substring(quadkey for 12) AS key,
  count(quadkey)/16 AS resolution_index
FROM tile_bing WHERE z = 14 AND size > 0
GROUP BY substring(quadkey for 12) ORDER BY key ;
Bing imagery resolution
Bing imagery resolution

Cooking a priority index

With all these measurements by tile, we can now build a priority index according to our wishes. For example:

priority = ((1 - density) + dist_hydro + (1 - dist_aoi) + (1 - resolution))/4

sets a high value for tiles containing no nodes, located far from the coast but close to our area of interest and covered by low resolution imagery.

To use the priority map within QGIS, add a status column to the tile table, and create a dynamic TODO Tile layer with the DB Manager plugin and the following query:

  row_number() OVER (ORDER BY priority DESC) AS id,
  priority, x, y, ST_ExteriorRing(geom) AS geom
FROM tile WHERE status <> 'DONE'
ORDER BY priority DESC
LIMIT 32 ;

This new layer shows the 32 next task tiles to map:

TODO Tiles
Task tiles of highest priority


Before mapping with JOSM, either save a selected tile to GPX (check FORCE_GPX_TRACK and note the use of ST_ExteriorRing), or call GDAL from a layer action:

ogr2ogr -f GPX -sql "SELECT x || ' ' || y AS name, ST_ExteriorRing(geom) FROM tile WHERE x = '[% "x" %]' AND y = '[% "y" %]'" [% "x" %]_[% "y" %].gpx PG:dbname=osm -lco FORCE_GPX_TRACK=YES -nlt LINESTRING

Once the tile mapped, update its status attribute and refresh QGIS’ display (F5).


Using this priority map, i noticed sometimes Mapbox imagery is of higher resolution than Bing’s. It would be useful to find if they provide an imagery resolution map. In addition, two tiles of consequential rank can be spatially very distant, finding a way to use them by clusters could be interesting. Finally, shouldn’t priority be directly included within OSMTM?

Happy mapping!

Spreadsheet to GIS

Sooner or later, getting the spatial data you asked for, you’ll be provided with a… .xls spreadsheet file, which might look like this:

longitude | latitude | name
-52.31018 | 4.19783  | Saut Mapaou
-53.05015 | 5.09345  | Crique Grégoire

Let’s call it poi.xls.

Assuming coordinates are in a homogeneous decimal format, and that the coordinate reference system is known, here is how to use this file with the GIS you like, thanks to GDAL:

Write a VRT file poi.vrt describing:

  • the desired layer name ;
  • the spreadsheet file name ;
  • the geometry type ;
  • the reference system EPSG code ; and
  • the column names the geometry will be built from.
  <OGRVRTLayer name="Points">
    <GeometryField encoding="PointFromColumns"
     x="longitude" y="latitude" />

You can now open poi.vrt with QGIS or convert it to any GDAL supported vector format. For example, reproject and import it to PostGIS:

 ogr2ogr -t_srs EPSG:2972 -f PostgreSQL PG:dbname=survey poi.vrt -lco GEOMETRY_NAME=geom -nln poi_xls

In case GDAL complains about not finding the spreadsheet file, provide the absolute path as the SrcDataSource.

This procedure should also work with .xlsx files, although not tested personally.

RGB to HLS raster conversion with R

Hello digital World!

As my first blog post, here is a note about how to convert a RGB raster to HLS color space with R. I remember having searched the web quite a long time before finding an efficient way.

We use the raster and colorspace packages:


First, read the RGB raster stack:

bands <- stack("image.tif")

and create output:

hls <- setValues(stack(bands[[1:3]]), NA)

Convert RGB to HLS:

values(hls) <- colorspace::coords(
    ), "HLS"

finally, rename output bands:

names(hls) <- c("H", "L", "S")