How to invent new tags in 2021

March 16, 2021 by chris

Back in 2010 Jochen wrote a practical guide, How to invent new tags, providing some useful advice to mappers on what to consider when inventing new tags. Back then, however, the world of OpenStreetMap was very different. There were far fewer established tags than today, OpenStreetMap was still very young, and most people the guidance was written for were new to OSM and to the whole concept of cartographic data collection.

Today in 2021 OpenStreetMap and the typical situation of people inventing a new tag look very different. There are so many widely used tags that you can hardly invent a tag from scratch any more without taking into consideration existing tags in the semantic vicinity of the tag you consider inventing. And most people who get into the position of inventing a new tag have either already been involved in OSM for quite some time and have developed a firm opinion on how things should be mapped and tagged (not always matching how things are actually mapped and tagged) or come from outside of OpenStreetMap and have a firm opinion, shaped outside of OpenStreetMap, on what cartographic data collection should look like.

Therefore i decided to formulate new guidance focusing on things that i consider practically relevant when inventing new tags nowadays. This is mostly based on years of observing and studying tags and tag use as an OSM-Carto maintainer, on my observations of what decides whether a tag becomes meaningful, adopted and consistently used by mappers, and on observing countless tagging proposals over the years. It is not meant to replace Jochen’s guidance, which focuses a lot on formal and technical aspects that i will not cover here, but to supplement it with updated guidance on semantic, social, cultural and geographic considerations.

Design tags to be practically verifiable

Verifiability is the core of mapping and tagging in OpenStreetMap. While we have a number of poorly verifiable legacy tags in OpenStreetMap (like tracktype), it is a fairly universal rule that newly invented tags that are not practically verifiable fail to become successful tags. They either fail to be adopted by mappers at all (like view) or worse: they fail to develop a well defined meaning in practical use despite being widely used, and therefore lead to mappers wasting time tagging information that is practically useless because it is not consistent.

Practical verifiability means that a mapper needs to be realistically able to determine locally on the ground and independent of other mappers or external data sources if a tag is correct or not. In some cases tagging ideas are inherently subjective (like the usability of a road with a certain class of vehicle). Sometimes non-verifiable tags however also result from being invented as a subjective aggregate of several individually verifiable aspects which could individually be tagged in a verifiable fashion (like to some extent natural=desert).

Another distinct class of non-verifiable tags is data that is defined not through something locally observable but through authoritative information from outside of OSM. This for example applies to ratings of restaurants or other establishments. Such data simply does not belong in OpenStreetMap.

One important aspect of verifiability is the practical ability to localize features (and to delineate them when mapping with linear ways or polygons) in a verifiable fashion. Something might verifiably exist but could still not be practically mappable in OpenStreetMap because it is not practically localizable.

Do not invent tags for storing redundant or computable information

Sometimes tags are invented where mappers are supposed to tag information derived from other data in OSM or outside of OSM or that duplicates existing data in an aggregated fashion. Examples are the tagging of the length of linear features or of the prominence of peaks.

Such tags – as convenient as they might seem to data users – are a bad idea. They take up valuable mapper time to compute and aggregate information that could also be computed automatically, and there is nothing to ensure that the information is kept up-to-date. It will therefore never be reliable, and data users who want this kind of information in good quality will always try to compute it themselves.
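To illustrate how little is needed to derive such information, here is a minimal sketch of computing the length of a linear feature from its node coordinates. It assumes the way is available as a list of (latitude, longitude) pairs in degrees and uses the haversine formula on a spherical Earth, which is an approximation. The function names and the radius constant are chosen for this example and are not part of any OSM tool.

```python
import math

# Mean Earth radius in metres (spherical approximation, an assumption of this sketch).
EARTH_RADIUS_M = 6371008.8

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def way_length_m(nodes):
    """Sum the segment lengths of a way given as a list of (lat, lon) pairs."""
    return sum(haversine_m(*a, *b) for a, b in zip(nodes, nodes[1:]))

# Hypothetical way: three nodes along the equator, one degree of longitude apart.
example_way = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
length = way_length_m(example_way)
```

A data user can run this kind of computation on current geometry at any time, so a manually tagged length adds nothing but a value that can go stale.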

Be mindful of existing tags when designing new ones

This is most important when inventing new tags these days because, as mentioned above, there are already a lot of tags with significant use, so you will hardly be able to invent a new tag without other pre-existing tags being close-by. So how do you proceed here? First of all: Don’t rely too much on the OSM wiki for that. Not all tags with significant use are documented on the wiki and the documentation often does not accurately reflect the actual meaning of tags. Taginfo is usually very helpful to explore what tags exist and how they are used. Using overpass turbo (easiest through the links on taginfo) can be very useful to get more specific insight into how certain tags are used. Be aware that bare use numbers are often misleading because imports and organized edits sometimes strongly distort the view and create the impression that a certain tag is widely used while in fact it has only been used by a handful of mappers in an organized or automated fashion. The regional taginfo instances operated by Geofabrik and others can be useful here as well.
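The point about misleading use numbers can be made concrete with a small sketch. It assumes you have already extracted, by whatever means, a list of (tag, user) pairs with one entry per tagged object. That input format, the function name and the sample data are hypothetical and do not correspond to any taginfo or Overpass API.

```python
from collections import defaultdict

def usage_summary(records):
    """Return {tag: (object_count, distinct_mapper_count)} for (tag, user) pairs."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for tag, user in records:
        counts[tag] += 1
        users[tag].add(user)
    return {tag: (counts[tag], len(users[tag])) for tag in counts}

# Hypothetical data: one tag added 1000 times by a single import account,
# another added 50 times by 50 different mappers.
records = [("example:imported=yes", "import_bot")] * 1000
records += [("example:surveyed=yes", f"mapper{i}") for i in range(50)]

summary = usage_summary(records)
```

In this made-up data set the first tag has twenty times the use count of the second, yet only the second shows any sign of adoption by the mapping community.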

As Jochen already pointed out more than 10 years ago, it is not a good idea to invent a new tag that overlaps with or redefines an existing tag with widespread use. This almost always leads to confusion and a reduction of data quality for both tags. So what should you do if the idea for a new tag you have in mind overlaps with widely used existing tags?

If your tagging idea essentially forms a subclass of an existing tag – like you want to create a tag for vegan restaurants – you should turn that into a subtag for the broader established class. A subtag or secondary tag is a tag that is to be used in combination with another primary tag and develops meaning only in this combination. In this case such a tag already exists of course, it is diet:vegan=yes|only. It can be used in combination with amenity=restaurant but also with other primary tags like amenity=cafe, amenity=fast_food or various types of shops.
If your tagging idea overlaps with an existing tag but is also meant to cover things that are not included in said tag you have three options:
- extend the definition of the existing tag to include what you want to map. This needs to be done with care to avoid devaluing the existing data this tag is used for. If existing data with that tag consistently does not include the kind of thing you want to map and what you want to map is well distinguishable from the existing class of features it might be better to
- split what you want to map into two separate classes of features – one you map with a subtag to the pre-existing tag and the other with a new primary tag.
- You can also try to introduce a new tag with the intention to replace existing tagging. This however rarely works if existing tagging is already actively used by a lot of mappers. Most attempts at doing this result in less consistent mapping because they introduce several competing tagging ideas with none of them universally favored over the others – a development nicely illustrated here. So you need to be very careful with choosing that route. See in particular also the last point below.
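The subtag approach from the vegan restaurant example can be sketched in code. This is a minimal illustration, assuming features are available as plain dictionaries of tags; the function name offers_vegan and the sample features are hypothetical.

```python
# Values of diet:vegan mentioned in the text that indicate vegan offerings.
VEGAN_VALUES = {"yes", "only"}

def offers_vegan(tags):
    """Return True if the secondary tag diet:vegan marks vegan offerings."""
    return tags.get("diet:vegan") in VEGAN_VALUES

# Hypothetical sample features, each a dict of OSM tags.
features = [
    {"amenity": "restaurant", "diet:vegan": "only"},
    {"amenity": "cafe", "diet:vegan": "yes"},
    {"amenity": "restaurant"},
    {"shop": "bakery", "diet:vegan": "no"},
]

# The same check works regardless of which primary tag the feature carries.
vegan_places = [f for f in features if offers_vegan(f)]
```

Because the subtag is evaluated independently of the primary tag, existing uses of amenity=restaurant stay valid and data users who ignore the subtag lose nothing.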

Make sure your tag has a positive definition

The success of a new tag also depends on it having a clear positive definition. Suppose, for example, you invent a new tag shop=non_food. It lacks a positive definition because it is defined by what it does not sell, namely food. Such tags often turn out to be very broadly and inconsistently used, and at the same time they tend to discourage the invention of more specific, positively defined tags. For example, someone might tag a museum shop=non_food because it sells tickets to the museum and that is definitely non-food.

Make sure to use a fitting and non-misleading name for the tag

Try to make sure that the key and value you choose for your tag do not have either a more specific or a broader meaning in a native English speaking area. Well known examples where this turned out to be a problem are leisure=park and landuse=forest where the meaning of the tags does not encompass everything that is sometimes referred to in English with the terms park and forest.

This is hard and sometimes poses problems that are not ultimately solvable and you might have to accept an imperfect solution here. But you should try to avoid major problems by evaluating the options for choice of key and value carefully.

Formulate a language independent definition

That might sound like a contradiction because the suggestion to formulate implies use of language and at the same time it should be language independent. How is that meant to work?

The key here is to have a definition that describes the meaning of a tag without referring to culture and language specific terms for that definition. For example, you should not define the meaning of natural=beach simply as a beach. A decent definition would be: A landform at the edge of the sea or of a lake consisting of unvegetated loose material in grain size between sand and large boulders that is shaped by water waves. Instead of referring to the poorly defined English language term “beach”, which many people will associate very different pictures with, that definition uses generic terms less open to subjective interpretation (like unvegetated or water waves) to define the use of the tag. Translating the word beach into different languages will result in differences in definition due to semantic differences in the translation of the term, whereas translating the more abstract definition will be more accurate and less likely to result in a different interpretation. Being elaborate and possibly even redundant in the definition, delineating the meaning in different terms, can further help with that.

Do not define your tag by referring to Wikipedia

Some mappers try to address the previous point by referring to Wikipedia or copying text from Wikipedia. That is not a good idea. Wikipedia has a completely different perspective on documenting knowledge about the world than OpenStreetMap. Wikipedia documents the commonly accepted meaning of terms based on outside sources. That meaning is (a) bound to change over time and (b) often self-contradictory because different parts of society have different ideas of the meanings of certain terms. We in OpenStreetMap however depend on having a single, globally consistent and locally verifiable definition of tags.

Avoid aggregating unrelated and semantically very different things into a common tag

Tags are meant to semantically structure the geographic reality as we document it in OpenStreetMap. Such a structure is most intuitive if the individual tags are semantically compact, or in other words: if in the eyes of the local mapper two features with the same primary tag in the vast majority of cases have more in common than two features with different primary tags. That is not always universally possible. Tagging means classification and classification inevitably requires drawing lines between classes, separating things that are semantically similar. A good example is the pair of tags natural=scrub and natural=heath, which are distinguished by different heights of the shrubs. A tall natural=heath and a not very tall natural=scrub in similar ecological settings might be semantically closer than two very different areas of natural=scrub in different parts of the world. Still, natural=scrub is narrow enough in its definition that mappers will intuitively see the semantic similarity of two areas correctly tagged as such, no matter how different they are.

I can also formulate this advice in a different way: A newly invented tag should not need to be delineated from too many other tags. Take natural=scrub again. It needs to be, and is, delineated from natural=heath and natural=wood regarding the height of the woody plants. It is also delineated from landuse=orchard for shrubs cultivated for agricultural purposes.

Respect OpenStreetMap’s mapper-centric approach

Tagging in OpenStreetMap is intentionally designed for the ease and clarity from the side of the mapper and not for the usefulness for certain applications of data users. It is therefore not advisable to try designing new tags if you are not using these tags yourself as a mapper or if your main motivation is that you would like mappers to use these tags because of your needs as a data user. Ultimately mapper-centric tagging conventions are in the long term of benefit for both mappers and data users because they ensure high quality data – even if in the short term they sometimes mean additional work for data users to process the data into a form that they need for their application.

Be mindful about geographic diversity

OpenStreetMap, in contrast to most other efforts of collecting cartographic data, like for example by public authorities, tries to produce a map of the whole planet. That means dealing with the challenge of an immense geographic diversity around the world. Obviously not every tag is applicable everywhere on the planet, so we do have, and need to have, tags limited to certain settings in application. But it is important to not do so without necessity. Definitions of tags should be practically suitable for all geographic settings. Think for example about the distinction between landuse=farmland and landuse=orchard. That distinction should ideally not be based on a fixed list of products grown (which will inevitably always be incomplete) but on an abstract definition that allows making the distinction for any plant grown on the planet.

Be prepared for failure

This might seem like kind of a pessimistic note to end this advice on – but it should not be read as such. Everyone – no matter how experienced and knowledgeable – should be humble when inventing new tags and be ready to accept that what they suggest might fail. OpenStreetMap’s open tagging system is intentionally designed to deal with failure – with tags being introduced and being used but over time being abandoned in favor of other tagging systems or (more frequently) tags being introduced to replace existing tagging schemes to fix some issues with those but never being broadly adopted because mappers stick to existing tagging. That this happens is not a fault in OpenStreetMap, it is intended.

What tagging in OpenStreetMap looks like is a bit like what the geography we map looks like – organically grown, full of flaws, but still largely functional. Yes, you could design a city from scratch on a drawing board to be perfect by some ideal of the perfect city. But it would be really hard to build it and it would be even harder to persuade people to live in such a sterile environment. And even if you solve that problem you would still over time realize that this perfect city is not as robust and adaptable as the organically grown ones. It would be over-optimized to the specific conditions taken for granted by the designer and if those conditions change (for example environmental or cultural factors or demographics) the whole design would be in peril while the organically grown city with all its flaws could organically adjust to changing conditions much better.

Imagico.de

blog