Wikidata/Notes/URI scheme

Wikidata will offer data for numerous items. Each item is identified by a URI. This note describes the current thinking on the URI scheme to be used in Wikidata.

Background

Wikipedia today

Let us look first at the URL scheme of Wikipedia:

  1. http://en.wikipedia.org/wiki/Germany is the URL for the article on Germany in the language en, English
  2. http://de.wikipedia.org/wiki/Deutschland is the URL for the article on Germany in the language de, German
  3. http://www.wikidata.org/wiki/Q183 (the Wikidata item "Germany") has entries for both articles. The articles use those entries to provide links to the other language versions.
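
The Wikipedia URLs above carry two pieces of information: a language code in the host name and a page title in the path. As a minimal sketch (the function name `parse_wikipedia_url` is made up for illustration and is not part of any Wikimedia library), the two parts can be separated like this:

```python
from urllib.parse import urlparse, unquote


def parse_wikipedia_url(url):
    """Split a Wikipedia article URL into (language code, page title)."""
    parts = urlparse(url)
    host = parts.netloc.split(".")
    # Assumption: hosts have the form <lang>.wikipedia.org
    if len(host) != 3 or host[1:] != ["wikipedia", "org"]:
        raise ValueError(f"not a Wikipedia URL: {url}")
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not an article URL: {url}")
    # The title is the rest of the path, with percent-encoding undone
    title = unquote(parts.path[len(prefix):])
    return host[0], title


print(parse_wikipedia_url("http://en.wikipedia.org/wiki/Germany"))
print(parse_wikipedia_url("http://de.wikipedia.org/wiki/Deutschland"))
```

Both URLs describe the same topic, yet nothing in the parsed result says so. That connection only exists through the shared Wikidata item.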

Note that the URLs for pages in Wikipedia are not persistent identifiers. They can change meaning: if a Kevin Smith should become US president some day, he will most certainly displace the author Kevin Smith as the main topic of the Kevin Smith article. Changes in a name, e.g. when a person marries or becomes pope, also lead to changes of the URL. In most cases there will be adequate redirects and disambiguation pages, but although these are easy for humans to follow, this is not necessarily true for machines.

How DBpedia does it

For reference, here is how DBpedia solves the issue:

  1. http://dbpedia.org/resource/Germany is the URI for the item Germany
  2. http://dbpedia.org/page/Germany is the HTML representation for the description about Germany
  3. http://dbpedia.org/data/Germany.json offers the machine-readable data about Germany. The data is available in a number of formats, such as RDF, JSON, and Turtle. The format is specified by the suffix, i.e. .json, .rdf, .ttl, .ntriples, etc.
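
The three DBpedia URL forms above differ only in a path segment and an optional format suffix. The following sketch builds them from an item name. The helper `dbpedia_urls` and its dictionary keys are hypothetical, and the suffix list contains only the suffixes named above:

```python
DBPEDIA = "http://dbpedia.org"

# Suffixes taken from the list above; DBpedia may offer others.
KNOWN_SUFFIXES = {"json", "rdf", "ttl", "ntriples"}


def dbpedia_urls(name, fmt="json"):
    """Return the item URI, HTML page URL, and data URL for a DBpedia name."""
    if fmt not in KNOWN_SUFFIXES:
        raise ValueError(f"unknown format suffix: {fmt}")
    return {
        "item": f"{DBPEDIA}/resource/{name}",    # identifies the item itself
        "html": f"{DBPEDIA}/page/{name}",        # human-readable description
        "data": f"{DBPEDIA}/data/{name}.{fmt}",  # machine-readable data
    }


for kind, url in dbpedia_urls("Germany", "ttl").items():
    print(kind, url)
```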

The name of the item is the same as the name of the article in the English Wikipedia. An effort to provide internationalized URIs in DBpedia is underway; it essentially creates a multitude of language-specific URIs. There is a dedicated note with more details on the relationship between DBpedia and Wikidata.

Issues for consideration

  1. URIs should uniquely identify an item
  2. URIs should be persistent
  3. URIs should be canonical within Wikimedia projects
  4. URIs in Wikimedia projects should not be based on any one language, e.g. English
  5. URIs should not break caching
  6. URIs should be usable with the interwiki link system
  7. URIs should be easy to use

The list is sorted by importance.

A solution that is based solely on a label, and possibly an English label at that, is highly problematic. Why should the canonical URI for Rome be http://www.wikidata.org/entity/Rome ? Why not http://www.wikidata.org/entity/Roma ? And if it is the latter, what would the URI be for the Roma people? Disambiguation would be needed, but how do you disambiguate without relying on a language again? And what happens if a label changes meaning?

All these problems disappear once you use a unique but inherently meaningless identifier. The disadvantage of such identifiers lies in their usability: they are not easily written, understood, or remembered. Tools could help with these problems, and it is hoped that libraries addressing these issues will be offered to developers who want to integrate Wikidata into their applications.
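
To make the ambiguity concrete, here is a small sketch of a label lookup. The table, the function `candidates`, and the identifiers "X1" and "X2" are all made up for illustration; they are not real Wikidata IDs or part of any existing library.

```python
# Hypothetical label table: (language, label) -> opaque item identifiers.
# "X1" and "X2" are made-up placeholders, not real Wikidata IDs.
LABELS = {
    ("en", "Rome"): {"X1"},  # the city, English label
    ("it", "Roma"): {"X1"},  # the same city, Italian label
    ("en", "Roma"): {"X2"},  # the Roma people, English label
}


def candidates(label, language=None):
    """Return the sorted identifiers whose label matches, optionally per language."""
    found = set()
    for (lang, text), ids in LABELS.items():
        if text == label and (language is None or lang == language):
            found |= ids
    return sorted(found)


print(candidates("Roma"))                 # ambiguous without a language
print(candidates("Roma", language="it"))  # unambiguous once a language is given
print(candidates("Rome"))
```

The label "Roma" alone matches two items, and choosing between them requires a language. An opaque identifier avoids that choice, at the cost of needing a lookup step such as this one.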

Full proposal for Wikidata (planned)

The following gives a proposal for the URL scheme for Wikidata. This is not implemented yet.

  1. The actual wiki is hosted at the (pretty) URL http://www.wikidata.org/wiki/
  2. Two Wikipedia-like URL forms resolve to the Wikidata item that the linked Wikipedia article is about:

Additional convenience forms are described below.

Planned implementation

is to be rewritten to

which loads

Machine-readable access (planned)

Additionally, the following URIs provide more machine-oriented access:

  1. http://www.wikidata.org/entity/Q{id} is the persistent URI of the item identified as Q{id}
  2. http://www.wikidata.org/wiki/Special:EntityData/Q{id} provides the data about the item
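
As an illustration of the two URI forms above, the following sketch derives both from an item identifier. The function names are hypothetical, and the identifier check assumes that an ID is the letter Q followed by a positive integer, as in Q183; the note itself does not spell out the ID syntax.

```python
import re

WIKIDATA = "http://www.wikidata.org"

# Assumption: an item ID is "Q" followed by a positive integer, e.g. Q183.
_ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def _check(item_id):
    """Reject anything that does not look like an item ID."""
    if not _ITEM_ID.fullmatch(item_id):
        raise ValueError(f"not an item ID: {item_id!r}")
    return item_id


def entity_uri(item_id):
    """Persistent URI that identifies the item itself."""
    return f"{WIKIDATA}/entity/{_check(item_id)}"


def entity_data_url(item_id):
    """URL that provides the data about the item."""
    return f"{WIKIDATA}/wiki/Special:EntityData/{_check(item_id)}"


print(entity_uri("Q183"))
print(entity_data_url("Q183"))
```

Keeping the two functions separate mirrors the distinction in the list: one URI names the item, the other locates data about it.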

Current implementation

Here is what the URLs on Wikidata would look like: