Member

@jake-low jake-low commented Sep 8, 2025

This PR adds a "Places" layer containing OSM features tagged place=*, which represent countries, states, provinces, cities, towns, villages, neighborhoods, etc. Features in this layer are all Point geometries: they represent the approximate center of the feature.

I added this layer partly because it's useful to me, but also because it's simple and therefore a good testbed for two new schema ideas:

  • The population column is an integer. This is the first foray into parsing OSM's string-valued tags into other data types.
  • The name column contains the value of the name tag in OSM, which (usually) represents the name in the primary local language. This layer also has a names column, a Map<str, str> holding every OSM tag whose key starts with name:, e.g. name:left, name:fr, and even name:etymology:wikidata. Keys in the map are the tag key with the name: prefix dropped, and values are the OSM tag value. So for example tags.names.es contains the value of name:es (the Spanish-language name), if available. I also added alt_names and official_names maps to complement the alt_name and official_name columns.
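
A minimal sketch of both ideas, assuming the tags arrive as a plain dict of strings. `parse_population` and `names_map` are hypothetical names used for illustration, not functions from this PR, the parsing rule (plain digit strings only) is an assumption, and the tag values are made up:

```python
def parse_population(value):
    """Parse an OSM population tag into an int, or None if it is not a plain integer."""
    if value is None:
        return None
    text = value.strip()
    # Assumption: accept only plain ASCII digit strings; anything else
    # (e.g. "about 5000" or "1,200") is treated as missing.
    return int(text) if text.isascii() and text.isdigit() else None


def names_map(tags):
    """Collect name:* tags into a dict keyed by the part after the 'name:' prefix."""
    prefix = "name:"
    return {k[len(prefix):]: v for k, v in tags.items() if k.startswith(prefix)}


# Made-up example feature; "Q0" is a placeholder, not a real Wikidata ID.
tags = {
    "place": "town",
    "name": "Exampleville",
    "name:es": "Villa Ejemplo",
    "name:etymology:wikidata": "Q0",
    "population": "12345",
}

population = parse_population(tags.get("population"))
names = names_map(tags)
```

Here the plain `name` tag is left out of the map because it does not start with `name:`, and a population value that is not a plain integer yields None rather than raising.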

src/places.py Outdated
Comment on lines 14 to 16
Most place features are mapped as nodes, but some are mapped as areas
(typically neighborhoods, islands, etc). In these cases we include the
area's centroid in the output dataset.
Member

There’s a potential for duplicate features if the same place is mapped as both a place point and a boundary relation and both are tagged with place=*. One approach is to limit place=hamlet/village/town/city to points only and avoid mapping a place point for territories that have no logical center, such as place=state. This approach is favored in some regions like the U.S. but not yet fully implemented.

/ref shortbread-tiles/shortbread-docs#86

Member Author

Thanks, that thread and the links therein are useful references.

I'm currently including place areas only if they are not tagged boundary=*, to avoid creating duplicate rows in the output. Ideally I'd do something more sophisticated (checking if the boundary relation has a label member, for example) but this is beyond what I can easily do with pyosmium.
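
A simplified sketch of that kind of check, assuming tags are available as a plain dict. `include_place_area` is a hypothetical name, and this is not the PR's actual pyosmium handler code:

```python
def include_place_area(tags):
    """Return True if an area tagged place=* should contribute a row to the output."""
    # Areas that also carry boundary=* are skipped: the same place may also be
    # mapped as a place node, so including both would risk duplicate rows.
    return "place" in tags and "boundary" not in tags
```

With this rule, an area tagged only place=neighbourhood is kept, while one tagged place=city plus boundary=administrative is dropped.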

It seems like a reasonable suggestion to just omit certain place=* elements. The main use case I have in mind for this layer is as a dataset of populated human settlements, so omitting place=country/state/province which aren't settlements in the anthropological sense would be fine. I wonder if anyone would miss place=archipelago/island/islet if they were also omitted from this layer. Personally I would not, and having at most one row in the output for each human settlement seems like it's important enough to warrant some trade-offs.

Member

place=archipelago/island/islet sound like good candidates for a different layer about landforms. Similarly, place=ocean would be a good candidate for a water layer, and place=square for a layer about pedestrian infrastructure or perhaps public spaces.

@jake-low jake-low marked this pull request as draft September 8, 2025 22:09
Member Author

jake-low commented Sep 9, 2025

I renamed the layer to settlements, and changed it to only include place=* values that represent human settlements (city, town, etc) or parts of settlements (borough, neighbourhood, etc). It now also only includes places mapped as nodes (rather than creating centroid points for places mapped as areas).
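
As a rough sketch of the new filter, assuming each element is described by its type and a dict of tags. The function name and the exact set of place=* values are illustrative assumptions and may not match what the layer accepts:

```python
# Illustrative set only: settlements plus parts of settlements.
SETTLEMENT_PLACES = {
    "city", "town", "village", "hamlet",
    "borough", "suburb", "neighbourhood",
}


def is_settlement_node(element_type, tags):
    """Keep only nodes whose place=* value names a settlement or part of one."""
    return element_type == "node" and tags.get("place") in SETTLEMENT_PLACES
```

Under this rule a place=town node is kept, while a place=island node or a place=town area is not.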

Contributor

@ianthetechie ianthetechie left a comment

Just a few minor nits. I like the way you're factoring out the map-style tags, the approach to alternate names, and the new name. Place is such an overloaded term; this captures it better I think!

Comment on lines 78 to 84
def tags_with_prefix(prefix, tags):
    """
    Returns a dict of all tags with the given prefix string; keys in the
    dict will have the prefix dropped.
    """
    prefix_len = len(prefix)
    return {k[prefix_len:]: v for (k, v) in tags if k.startswith(prefix)}
Contributor

I like this helper function! Since we'll probably need some variation of it elsewhere, maybe we should put it in another module (helpers? sounds cliche but I'm not very creative at the moment :P) so others can call it as a top-level function?

Member Author

Good call, thanks! I wrote this thinking it'd be used in other layers too but forgot to move it to a helpers module.

jake-low and others added 2 commits September 17, 2025 22:48
@jake-low jake-low marked this pull request as ready for review September 18, 2025 05:56
Member Author

Thank you both for the valuable feedback! I think this is ready to merge now. Definitely open to iterating on the schema further (and that goes for all of the other layers too), but by merging this we can get it built and available for people to try out, and also the helpers.py module will be available for other layers like Ian's boundaries PR.

@jake-low jake-low merged commit bd52999 into main Sep 18, 2025
4 participants