# OpenStreetMap Mapping Priority

Working in French Guiana, i came to know the Mapazonia project which aims to improve OpenStreetMap data of the Amazon region. Starting to contribute, i learnt to use the OSM Tasking Manager, a tool that divides Areas Of Interest (AOI) into task tiles. Interested in the Guiana Shield, i wondered if i could use a similar way to map this area.

Once task tiles created, thousands of them for such a huge AOI, a question arises: Where to map first?

To answer this, we could think of several approaches:

• we should find unmapped geographic features where the map is blank;
• there should be less justified empty tiles in remote areas, where there is few people living;
• where source data, satellite imagery, is of coarse resolution, geographic features covering large areas can be digitized (i.e. small scale mapping);
• at heart of AOI first ;

Let’s build a task tile priority map from these ideas!

## Existing data

We’ll begin by retrieving existing data to get a map similar to Martin Raifer’s work on node density. OSM extracts are available from Geofabrik and we can set up a PostGIS database following Regina Obe’s tutorial. Task tiles are level 12 tiles of the web mapping reference system. Here is how we create them:

```CREATE TABLE tile AS SELECT
x, y
FROM
generate_series(0, pow(2, 12)::INT - 1) AS x,
generate_series(0, pow(2, 12)::INT - 1) AS y ;
COMMENT ON TABLE tile IS 'Map tiles' ;
COMMENT ON COLUMN tile.x IS 'Tile coordinate' ;
COMMENT ON COLUMN tile.y IS 'Tile coordinate' ;
```

and optionally build their geometries (we are indeed at the conceptual frontier between vector and raster data models):

```CREATE FUNCTION ST_GeomFromTileXY(z int, x int, y int, OUT tile_geom geometry(Polygon, 4326)) AS \$\$
DECLARE
side real;
xmin real; ymin real; xmax real; ymax real;
minlon real; maxlon real; maxlat real; minlat real;
BEGIN
side := pow(2, z);

xmin := x::real/side - .5;
ymin := .5 - y::real/side;
xmax := (x + 1)::real/side - .5;
ymax := .5 - (y + 1)::real/side;

minlon := 360.*xmin;
maxlon := 360.*xmax;
maxlat := 90. - 360.*atan(exp(-ymin*2.*pi()))/pi();
minlat := 90. - 360.*atan(exp(-ymax*2.*pi()))/pi();

tile_geom := ST_MakeEnvelope(minlon, minlat, maxlon, maxlat, 4326);
END; \$\$ LANGUAGE plpgsql;```

Reference system to tile coordinates conversion can be achieved as follows:

```CREATE FUNCTION TileCoord(z int, point geometry(Point, 4326),
OUT tile_x int, OUT tile_y int) AS \$\$
DECLARE
tile_n real;
sinphi real; x real; y real;
BEGIN
tile_n := power(2, z)::real;

x := (ST_X(point) + 180.)/360.;
sinphi = sin(ST_Y(point)*pi()/180.);
y := .5 - ln((1. + sinphi)/(1. - sinphi))/(4.*pi());

tile_x := CAST(trunc(tile_n*x + .5) AS int);
tile_y := CAST(trunc(tile_n*y + .5) AS int);
END; \$\$ LANGUAGE plpgsql;```

To compute node density, we first find the tile each point falls in:

```CREATE TEMP TABLE planet_osm_point_dump AS
SELECT id, type, (coord).tile_x, (coord).tile_y, geom
FROM
(SELECT
row_number() OVER (ORDER BY type, id, (dump).path) AS id,
type,
TileCoord(12, (dump).geom) AS coord,
(dump).geom AS geom
FROM
(SELECT 0 AS type, osm_id AS id, ST_DumpPoints(way) AS dump
FROM planet_osm_point
UNION ALL
SELECT 1 AS type, osm_id AS id, ST_DumpPoints(way) AS dump
FROM planet_osm_line
UNION ALL
SELECT 2 AS type, osm_id AS id, ST_DumpPoints(way) AS dump
FROM planet_osm_polygon
) AS point
) AS point_tiled;
```

and count points by tile:

`ALTER TABLE tile ADD COLUMN node_count INT DEFAULT 0;`
```UPDATE tile t SET node_count = count.num
FROM (
SELECT x, y, count(*) AS num
FROM planet_osm_point_dump
GROUP BY x, y ORDER BY x, y) AS count
WHERE
t.x = count.x AND t.y = count.y;```

## Remote areas

The population seems to be concentrated on coasts and along main rivers, decreasing upstream. We’ll model remoteness with hydrological distance, the one traveled by surface runoff. It is computed with GRASS GIS `r.stream.distance` module. Hydrological modeling typically uses a flow direction map derived from a Elevation Model (DEM):

The USGS provides the GTOPO30 HYDRO1k dataset we can download from the  Earth Explorer portal (ID `GT30H1KSA` for South America). Its projection is the Lambert Azimuthal Equal Area that is described by the PROJ.4 parameters:

`+proj=laea +ellps=sphere +lon_0=-60 +lat_0=-15 +x_0=0.0 +y_0=0.0 +units=m`

From these parameters, create a location with `g.proj` and a mapset:

```g.proj -c proj4="+proj=laea +ellps=sphere +lon_0=-60 +lat_0=-15 +x_0=0.0 +y_0=0.0 +units=m" location=south_america
g.mapset -c mapset=hydro location=south_america```

The dataset contains several layers (see the README file) but we’re mainly interested in importing Elevation and Flow Directions:

```r.in.gdal    in=gt30h1ksa/sa_dem.bil out=elev
r.in.gdal -e in=gt30h1ksa/sa_fd.bil  out=dir_arc```

Transform direction codes to GRASS GIS format using a lookup table file `arc2grass.lut`:

```255 = 0
128 = 1
64  = 2
32  = 3
16  = 4
8   = 5
4   = 6
2   = 7
1   = 8```
`r.reclass in=dir_arc out=dir rules=arc2grass.lut`
`r.stream.distance -o dir=dir elev=elev method=up dist=dir_dist`

To assign distance to tiles, we have to reproject the grid from LAEA to Web Mercator. It is at zoom level 15 a tile side is of about 1223 m, the closest to our distance grid resolution.
We first create a location with this projection and move to this location:

```g.proj -c epsg=3857 location=mercator
g.mapset -c mapset=priority location=mercator```

Then create the level 15 region and reproject:

```g.region w=-20037508.342789244 s=-20037508.342789244 e=20037508.342789244 n=20037508.342789244 rows=32768 cols=32768
r.proj in=dir_dist loc=south_america mapset=hydro out=dir_dist_1k meth=near```

We now set the region to tile resolution and resample the grid:

```g.region w=-20037508.3427892 s=-20037508.3430388 e=20037508.3427892 n=20037508.3430388 rows=4096 cols=4096
r.resamp.stats in=dir_dist_1k out=dir_dist meth=median```

### Export

We finally lighten the data, as we dont need submetric precision, and either export it to a raster file:

```r.mapcalc "dir_dist = if(isnull(dir_dist), -1, int(dir_dist))"
r.out.gdal -t  in=dir_dist out=hydro_dist.tif type=UInt32 nodata=4294967295```

or to text, prior to a PostGIS import:

```r.mapcalc "y = row() - 1"
r.mapcalc "x = col() - 1"
r.out.xyz in=y        sep=space out=hydro_dist-y.xyz
r.out.xyz in=x        sep=space out=hydro_dist-x.xyz
r.out.xyz in=dir_dist sep=space out=hydro_dist.xyz

paste hydro_dist-x.xyz hydro_dist-y.xyz hydro_dist.xyz | awk '!/-1\$/ {print \$3, \$6, \$9}' > hydro_dist.ssv```

Note that NULL or masked cells won’t be exported (c.f. `r.out.xyz` documentation) making `paste` produce unexpected results.

```CREATE TEMP TABLE tile_attr (
z INTEGER DEFAULT 12, x INTEGER, y INTEGER,
value REAL);```
```COPY tile_attr (x, y, value) FROM 'hydro_dist.ssv' CSV DELIMITER ' ' ;
UPDATE tile t SET hydro_dist = a.value
FROM tile_attr a
WHERE t.level = a.z AND t.x = a.x AND t.y = a.y ;```

## Distance to AOI

We here build a weight grid to increase priority for tiles located at the heart of our area of interest. The weight is the as-the-crow-flies distance to its boundary.

In the Web Mercator location with the level 12 zoom region, import (using  `v.proj` if necessary) and rasterize your AOI:

`v.to.rast in=aoi type=line,area out=aoi use=val value=1`

Compute the distance to the AOI for outer cells:

`r.grow.distance in=aoi     dist=dist_outer_aoi`

Proceed the same way for inner cells:

```r.mapcalc "aoi_inv = if(isnull(aoi), 1, null())"
r.grow.distance in=aoi_inv dist=dist_inner_aoi```

Combine distance grids in inverting values for inner tiles:

`r.mapcalc "dist_aoi = if(isnull(aoi), dist_outer_aoi, -dist_inner_aoi)"`

With several AOIs, build a mask, shift and scale values to [0, 1]:

```r.mapcalc "roi = aoi_1 ||| aoi_2"
r.grow in=roi out=roi old=1 new=1
eval \$(r.univar -g dist_aoi | grep "\(min\|max\)")
r.mapcalc "dist_aoi = (dist_aoi - \$min)/(\$max - \$min)" --o```

and compute a weighted average of each distance.

`r.mapcalc "aois_dist = (dist_aoi_1 + dist_aoi_2)/2."`

## Imagery resolution

Imagery resolution can be retrieved the same way the Bing coverage analyser tool does (see the Coverage article on OSM Wiki). Here is an example Python code:

```PATTERN = "http://t{srv}.domain.tld/tile/{key}.jpg"
HEADER = "Content-Length"; server = randint(0, 7)
meta = urlopen(url).info()
```

and the PostgreSQL function computing the quadkey from zoom level and tile coordinates:

```CREATE FUNCTION QuadKey(z int, x int, y int, OUT quadkey character varying(16)) AS \$\$
DECLARE digit int; mask int; BEGIN

FOR i IN REVERSE z..1 LOOP
digit := 0;
mask  := 1 << (i - 1);

IF (x & mask) != 0 THEN digit := digit + 1; END IF;
IF (y & mask) != 0 THEN digit := digit + 2; END IF;

END LOOP;
END; \$\$ LANGUAGE plpgsql;```

In coarse resolution areas, Bing tiles are basically null. Let’s assess imagery quality by counting non null level 14 tiles:

```SELECT
FROM tile_bing WHERE z = 14 AND size > 0
GROUP BY substring(quadkey for 12) ORDER BY key ;```

## Cooking a priority index

With all these measurements by tile, we can now build a priority index according to our wishes. For example:

`priority = ((1 - density) + dist_hydro + (1 - dist_aoi) + (1 - resolution))/4`

sets a high value for tiles containing no nodes, located far from the coast but close to our area of interest and covered by low resolution imagery.

To use the priority map within QGIS, add a `status` column to the `tile` table, and create a dynamic TODO Tile layer with the DB Manager plugin and the following query:

```SELECT
row_number() OVER (ORDER BY priority DESC) AS id,
priority, x, y, ST_ExteriorRing(geom) AS geom
FROM tile WHERE status <> 'DONE'
ORDER BY priority DESC
LIMIT 32 ;```

This new layer shows the 32 next task tiles to map:

Before mapping with JOSM, either save a selected tile to GPX (check `FORCE_GPX_TRACK` and note the use of `ST_ExteriorRing`), or call GDAL from a layer action:

`ogr2ogr -f GPX -sql "SELECT x || ' ' || y AS name, ST_ExteriorRing(geom) FROM tile WHERE x = '[% "x" %]' AND y = '[% "y" %]'" [% "x" %]_[% "y" %].gpx PG:dbname=osm -lco FORCE_GPX_TRACK=YES -nlt LINESTRING`

Once the tile mapped, update its `status` attribute and refresh QGIS’ display (`F5`).

Using this priority map, i noticed sometimes Mapbox imagery is of higher resolution than Bing’s. It would be useful to find if they provide an imagery resolution map. In addition, two tiles of consequential rank can be spatially very distant, finding a way to use them by clusters could be interesting. Finally, shouldn’t priority be directly included within OSMTM?

Happy mapping!

## 3 réflexions au sujet de « OpenStreetMap Mapping Priority »

1. Matthieu Noucher dit :

Bonjour,

Chercheur au CNRS je travaille sur les nouveaux modes de fabriques cartographique et m’intéresse en particulier à la Guyane où je serai en mission à partir du 15 mars. J’aurai souhaité y rencontrer des contributeurs d’OSM et des participants au projet Mapazonia. Etes-vous toujours sur place et pourrions nous nous rencontrer ? Par ailleurs, auriez-vous d’autres contacts à me suggérer ?

Merci d’avance.

MN

2. sikis izle dit :

Bonjour, ton blog est très réussi ! Je te dis bravo ! C’est du beau boulot !:)

Ce site utilise Akismet pour réduire les indésirables. En savoir plus sur comment les données de vos commentaires sont utilisées.