Imagico.de


Verifiability and the Wikipediarization of OpenStreetMap


I have mentioned several times already that I wanted to write a blog post on verifiability in OpenStreetMap. From my perspective, the need for it has grown over the last one to two years, as it has become increasingly common in discussions in the OSM community for people either to flatly reject verifiability as a principle or to try to weasel around it with flimsy arguments.

OpenStreetMap was founded, and became successful, on the idea of collecting local geographic knowledge of the world through local people who participate in OpenStreetMap and share their local geographic knowledge in a common database. The fairly anarchic way in which this happens, with mappers having a lot of freedom in how they document their local geography and very few firm rules, has turned out to be very successful both at attracting participants and at representing the geography of the whole world in its diversity.

The one key rule holding all of this together is the verifiability principle. Verifiability is the most important inner rule of OpenStreetMap (as opposed to the outer rules, like the legality of the sources used and the information entered), but it is also the most frequently misunderstood one. Verifiability is how OSM ensures the cohesion of the OSM community, both now and in the future. Only through the verifiability principle can we ensure that new mappers coming to OSM will have a usable starting point for mapping their local environment, no matter where they are, what they want to map and what their personal and cultural background is.

Verifiability is not to be confused with accuracy of mapping. It is not the ideal endpoint of a scale ranging from very inaccurate to very precise. Verifiability is a criterion for the nature of the statements we record in the database. Only verifiable statements can objectively be characterized as being accurate to a certain degree.

Verifiability is also not about on-the-ground mapping versus armchair mapping. It does not require all data entered to actually be verified locally on the ground; it just requires that doing so be practically possible. Assessing whether certain information is verifiable is much more difficult when armchair mapping than when mapping on the ground, so armchair mapping is the harder task and places greater demands on the abilities and competence of the mapper. In principle, though, you can map a lot of verifiable information from images, sometimes better than you could on the ground.

Verifiability is also not to be confused with verificationism, which rejects non-verifiable statements as meaningless. OpenStreetMap does not pass judgement on the value of non-verifiable data by excluding it from its scope. It just says that the project cannot include this kind of data in its database because it cannot be maintained under the project’s paradigm. The viability of OpenStreetMap as a project depends on limiting its scope to verifiable statements. And in practical application (i.e. when resolving conflicts) it is often better to read verifiability as meaning falsifiability.

In a nutshell, verifiability means that for any statement in the OpenStreetMap database, a different mapper needs to be able to objectively determine and demonstrate whether the statement is true or false, even without relying on the sources used by the original mapper. Those who, based on a philosophy of universal relativism, argue that in most cases statements are neither clearly true nor false have not understood the fundamental idea behind verifiability: they confuse it with precision of mapping while assuming a priori that everything is verifiable. Verifiability is about the fundamental possibility of objectively assessing the truthfulness of a statement based on the observable geographic reality.

This contrasts with the Wikipedia project, which takes a very different approach to recording information. Wikipedia has its own verifiability principle, but its meaning is completely different from that of OSM. Verifiability in Wikipedia means statements need to be socially accepted as true. This is determined based on a fairly traditional view of the reputation of sources. Such a system of reputation is obviously very culture specific, so Wikipedia tries to ensure social cohesion in its community by allowing different, contradicting statements and beliefs to be recorded (in particular, of course, in the different language editions, but to some extent also within a single language). Still, conflicts between different beliefs and viewpoints, struggles for dominance between different political or social groups, and contradicting statements in the different language versions representing different culture-specific views of the world are a common occurrence and a defining element of the project.

On a very fundamental level, this difference between OpenStreetMap and Wikipedia somewhat mirrors the difference between the natural sciences and the social sciences. This does not, however, mean that OpenStreetMap can only record physical geography features. A huge number of cultural geography elements are empirically verifiable.

Still, to many people with a social sciences or Wikipedia background the verifiability principle is very inconvenient. Many people want to record statements in OpenStreetMap that are part of their perception of the geography, even if these differ fundamentally from the perception of others and are not practically verifiable.

One of the ideas I often hear in this context is that verifiability is an old-fashioned conservative relic that prevents progress. This is somewhat ironic, because the idea of verifiability stems directly from the values of the Enlightenment. Fittingly, some of the specific non-verifiable mapping ideas put forward seem to rest on an underlying Counter-Enlightenment or Romantic philosophy.

In addition, pressure to include non-verifiable data in the OpenStreetMap database also comes from people who see OpenStreetMap less as a collection of local knowledge and more as a collection of useful, suitably preprocessed cartographic data, ignoring the fact that the success of OpenStreetMap is largely due to specifically not taking this approach. The desire to include data perceived as useful, regardless of its verifiability and origin, is also fairly widespread in the OSM community. Such desires are usually rather short-sighted and self-absorbed. Usefulness of information is by definition subjective (something is useful for a specific person in a specific situation) and relative (a bow and arrow might be very useful as weapons but will likely become much less useful once you have access to a gun). An OpenStreetMap that replaced verifiability with usefulness would soon become obsolete because usefulness, in contrast to verifiability, is not a stable characteristic.

And what all the opponents of verifiability seem to ignore is that giving in to their desires would create huge problems for the social cohesion of the OpenStreetMap project and for its ability to keep working towards its goal of creating a crowdsourced database of local knowledge of the world’s geography in all its diversity. Having the objectively observable geographic reality as the basis of all data in OpenStreetMap is the fundamental approach through which the project connects very different people from all over the world, many of whom could hardly communicate with each other outside of OpenStreetMap, and enables them to cooperate and share their local geographic knowledge. Without this connecting principle OpenStreetMap would not function, and trying to adopt verifiability à la Wikipedia instead would not only import all of Wikipedia’s problems, in particular the constant struggle for opinion leadership, but would ultimately also be unsuitable for the kind of information recorded in OpenStreetMap and the way mappers work in the project.

As already hinted above, in practice we already have a lot of non-verifiable data in OpenStreetMap. So far this mostly takes the form of an inner fork: there are mappers who actively map and maintain such data, but the vast majority of the mapper community practically ignores it. There are, however, also places where non-verifiable statements interfere with normal mapping in OSM, in particular where people try to reshape existing verifiable tags with additional non-verifiable meanings.

Non-verifiable data can broadly be split into two categories: non-verifiable tags and non-verifiable geometries. The most widespread type of non-verifiable geometry is the abstract polygon drawing. The traditional approach in OSM for mapping two-dimensional features that verifiably exist but have no verifiable extent is to map them with a node. The node location for a feature with some degree of localizability will usually converge to a verifiable location, even if individual placements of such a node can vary widely. But, arguing from practical usefulness or from a dogmatic belief that every two-dimensional entity should be mapped as a polygon in OSM, quite a few mappers prefer to sketch a polygon in such cases without any verifiable basis for its geometry.

Non-verifiable bay geometry used for label drawing

Among non-verifiable tags, the most widespread are non-verifiable classifications, something like “I consider feature X to be of class A, but I can’t really say what A means in a general, abstract form that would allow others to verify my classification.” One of the most widespread tags of this type is the tracktype tag, which has been used since the very early days of OSM. The psychological background of this kind of tagging is usually that people want to develop a simple one-dimensional classification system for a complex multi-dimensional reality but are either unable or unwilling to actually think this through into a consistent and practically verifiable definition.

The importance tag on railway lines as an example of non-verifiable tagging

The other type of non-verifiable tag, which has become quite popular more recently in particular, is computable information. This means statements that can be derived either from other data in the OSM database or from outside data, but that mappers cannot practically verify without performing the computation in question. Initiatives for adding such data are always based on the usefulness argument. And even though it is quite evident that adding such data to the OpenStreetMap database does not make much sense, both because of the verifiability principle and because of the problem of data maintenance, the practical desire to have certain computable information in the database can be very strong.
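As an illustration of what “computable information” means in practice (this example is not from the post itself), here is a minimal Python sketch. It assumes a way is represented as a list of (lat, lon) node coordinates in degrees and uses the haversine formula on a spherical Earth with a mean radius of 6,371 km; `haversine_m` and `way_length_m` are hypothetical helper names. A length value like this can always be recomputed from the existing geometry, so a mapper could not check such a value stored as a tag without redoing the calculation, and it would silently go stale as soon as a node is moved.

```python
import math

# Assumption: spherical Earth with mean radius 6,371 km (not an ellipsoid).
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def way_length_m(nodes):
    """Sum of segment lengths along a way given as [(lat, lon), ...].

    Hypothetical helper: the value is derived entirely from node positions,
    so it changes whenever the geometry is edited.
    """
    return sum(haversine_m(*a, *b) for a, b in zip(nodes, nodes[1:]))


if __name__ == "__main__":
    way = [(0.0, 0.0), (0.0, 1.0)]  # one degree of longitude along the equator
    print(round(way_length_m(way)))  # → 111195
```

Any data consumer can derive such a value on demand, which is why storing it in the database mainly adds a maintenance burden rather than verifiable information.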

What would help reduce this conflict in OpenStreetMap between those who value the verifiability principle and those who see it as an inconvenient obstacle to adding useful data would be to start a separate database project to record such non-verifiable add-on data for OpenStreetMap. But although this is technically quite feasible, the need to build a separate volunteer community for it creates a significant hurdle. After all, one of the motives of people pushing for non-verifiable data in OSM is to get the existing mapping community to create and maintain this data.

The ultimate question, of course, is whether verifiability will prevail in the future of OpenStreetMap in light of all of this. I don’t know. It depends on whether the mapper community stands behind this principle or not. What I do know, and what I tried to explain above, is that OpenStreetMap has no long-term future without the verifiability principle as a practically relevant rule (i.e. one that is not only there on paper but one the community actually adheres to in mapping). So it is essential for OpenStreetMap’s future to communicate clearly to every new mapper that being welcome in the project is contingent on accepting and appreciating the verifiability principle as one of the project’s core values. I think this has been neglected in recent years, and this needs to be corrected to ensure the future viability of the project.

13 Comments

  1. The ground truth rule was always explained as “what we see on the ground”, followed by “with some exceptions, like A, B, C…”.

    For those things we cannot actually verify on the ground, some OPEN documents could serve as “ground truth”: the name of a company at address X on its open official company website, municipal documents for proposed roads, official national park documents for national park boundaries, an official lake registry for lake names, government documents for administrative divisions and country borders, etc.

    So to summarize:
    * physical objects – visual observation
    * non-physical objects – official documents/sources

    This is how it has been in OpenStreetMap for more than 10 years if not from the very beginning.

    Recent attempts to use visual observation to judge non-physical objects are what cause misrepresentation.

    • Emphatically no.

      What you describe (which is your opinion on how things should be, not how OSM actually works) would just be a piecemeal attempt to move to verifiability à la Wikipedia. This is one of the most common weaseling strategies used by people inconvenienced by the verifiability principle to try to get around it. What you call official is just another word for what Wikipedia calls a reliable source. Subjective measures of the reliability or officiality of secondary sources of geographic information have never had any meaning in OSM, no matter how much you want them to.

      • How do you suggest verifying:
        * proposed roads
        * administrative boundaries (all levels, including land borders like the one between Holland and Germany, where there is nothing visible to represent the border)
        * national park boundaries

        Also, regarding quality assurance: if other official sources are not used, what is the chance that somebody from OpenStreetMap would notice slight changes in, say, national park boundaries (changes which do not move entrance signs)? Or a change in city limits?

        • I deliberately did not make any statements, beyond a few obvious examples, about what data currently in the OSM database is verifiable. There are cases where it clearly is, and this obviously includes the vast majority of the data by volume. There are cases where it clearly isn’t (like free-form label drawings), and there are cases where you have to look more closely and where verifiability might also differ between geographic settings. This is not the place to discuss specific cases, and I don’t want to distract from the important matter of the verifiability principle in general with the less important question of whether some specific class of features in OSM is verifiable or not. We have a {{Verifiability}} template in the OSM wiki that allows indicating doubts about the verifiability of a tag and discussing them on the talk page. When they are recognized early, soon after a tag is invented, verifiability issues of tags can often be solved with a more precise definition.

          A rule of thumb that might help in some cases: if you want to map something where you think some external authority is more significant in defining the nature of this thing (in either tags or geometry) than what can be verifiably observed on the ground, either directly or indirectly, then this is most likely something that does not belong in OSM.

          • So you’re proposing to remove all non-material things from OpenStreetMap? Geometry-wise: national parks, boundaries, proposed roads.
            Attribute-wise: names of natural objects (forests, lakes, mountains), a lot of addresses and house names.
            After all, by definition a non-material object cannot be observed with the physical senses.
            Is this correct?

          • No – I have made it very clear in my post that physical geography elements are not the only verifiable ones.

            As I said before, I don’t want arguments about the verifiability of specific data in OSM to bury this important subject, so I won’t get into a discussion on the verifiability of specific tags. But for most of the feature types you mentioned I can think of many concrete examples where they are verifiable. In the case of addresses I think this is universally true, since an address that is not verifiable is by definition not an address.

            It seems to me that your arguing against the verifiability of various things is ultimately bound to lead to the usefulness argument, i.e. that usefulness of data should trump verifiability. This is a widespread opinion in the OSM community, but I explained in quite some detail why I think this is not a usable guiding principle. If that is not the case and you are genuinely concerned about the verifiability of certain feature types, you should take that to the appropriate channels (the tagging mailing list or the wiki).

  2. I like your posts, ideas and work you do. I’m trying to get new ideas and learn from it. And definitely have no intention to make you angry or prove some “my” point (this is your blog, not mine after all).

    I understand your position on material (observable) objects with regard to imports and blind copying from open sources without actually checking what is on the ground. No question there.

    But I do not get the part about immaterial objects.

    (I’m not interested in any specific items, I am just giving examples; an abstract answer is what I am ultimately seeking, so I can skip the examples.)

    I’m struggling to understand how immaterial objects (their geometry and attributes) are to be physically verified. They can only leave physical TRACES of their existence (like the famous metaphor of shadows on the wall of a cave), but those traces could be nonexistent, scarce or even wrong (outdated, traces of other similar immaterial objects, incorrect interpretation of traces, etc.). (This does not necessarily apply only to OpenStreetMap objects.)

    • Sure – I already mentioned addresses, where the very purpose of an address is to be independently verified by those using it to find the location in question. Opening hours of a shop would be a different example: you can verify them by checking at what times the shop is open. A bus stop, even one without a sign or shelter, can be verified by observing that a bus regularly stops at the location. There is nothing in the concept of independent verifiability that limits its application to physical objects.

      Ultimately most verifiable cultural geography features are related to human activities and can be verified by either observing these human activities themselves or physical effects of these activities.

  3. Thank you for clarification.

    You make a statement:

    “The one key rule that is holding all of this together is the verifiability principle. Verifiability is the most important inner rule of OpenStreetMap (as opposed to the outer rules like the legality of sources used and information entered) but it is also the most frequently misunderstood one.”

    And later you clarify that in your opinion “verifiability” is a physical verifiability as opposed to reliable source verifiability.

    I wonder what is a BASE of this statement, namely that:
    1. it IS an accepted practice (and understanding of verifiability) NOW and that
    2. it HAS BEEN so for a long time (or even from the very beginning of the project)?

    Verifiability wiki page does not say anything about immaterial objects.

    • Physical verifiability is your formulation; I find it unclear and ambiguous since it kind of implies we are talking about physical geography elements only. The wiki page on verifiability talks about observable characteristics, and observable here means nothing more than that something can be empirically determined to be true or false based on observations in the real world. These observations, however, do not necessarily have to be direct perceptions of the feature itself, as I explained with my previous examples.

      I don’t know the exact origins of verifiability as a codified principle. The first version of the wiki page, from 2009, indicates that even back then it was not considered a newly created rule but simply a written record of a community consensus representing common sense. This is probably still the case today, although a much larger fraction of mappers now openly reject and despise verifiability. The main objective indicators that the verifiability principle currently applies to OSM are probably:

      • that the vast majority of the data is verifiable.
      • that in discussions the verifiability principle is hardly ever seriously challenged beyond the level of expression of dislike.
      • that mapping in OSM actually still works without being a total chaos.
      • that verifiability is still the only viable way of holding OSM together and working towards the goal of jointly mapping the world in a truly global and culturally diverse community based on local knowledge.
      • For me “physical verifiability” and “observations in the real world” are the same.

        Verifiability as defined in wiki page (as I see it) talks about not entering *subjective* information like: good, bad, wide, interesting, boring, smooth etc. So yes, such principle was always here and was never disputed – it is logical.

        But your post implies that *any* type of information should be verifiable *on the ground* and that *that* is an old internal OSM principle. I would like to know where this comes from, because there are swaths of data that do not correspond to such a principle, and that is old data, not from the last 2-3 years.

        • The nature of the verifiability principle has not changed in substance since it was first written down on the wiki in 2009. Its scope was always universal to everything in the OSM database. I am not proposing anything beyond this principle here. Subjective means the same as not independently verifiable – that is the very definition of subjective.

  4. I agree with you that a parallel database for this sort of info would be a good idea. As one example, Amazon Logistics keeps making edits that are designed to break their routing software (=”useful” to them), but they routinely misrepresent the ground reality when they do. To your point about the maintenance problem, Amazon surely has the resources to put this in their own database (preferably open, so their driver feedback can be responded to – it is sometimes useful).
