Wikidata/Notes/URI scheme

From Meta, a Wikimedia project coordination wiki

Wikidata will offer data for numerous items. Each item is identified by a URI. This note describes the current thinking (mostly circa 2012–2013) on the URI scheme to be used in Wikidata.


Wikipedia today

Let us look first at the URL scheme of Wikipedia:

  1. http://en.wikipedia.org/wiki/Germany is the URL for the article on Germany in the language en, English
  2. http://de.wikipedia.org/wiki/Deutschland is the URL for the article on Germany in the language de, German
  3. http://www.wikidata.org/wiki/Q183 (the Wikidata item "Germany") has entries for both articles. The articles use those entries to provide links to other languages.

Note that the URLs for pages in Wikipedia are not persistent: they can change meaning. If a Kevin Smith should become US president some day, he will most certainly displace the author Kevin Smith as the main topic of the Kevin Smith article. Changes of name, e.g. when a person marries or becomes pope, also lead to changes of the URL. There will be adequate redirects and disambiguations in most cases, but although these are easy for humans to follow and resolve, this is not necessarily true for machines.

How DBpedia does it

For reference, here is how DBpedia solves the issue:

  1. http://dbpedia.org/resource/Germany is the URI for the item Germany
  2. http://dbpedia.org/page/Germany is the HTML representation for the description about Germany
  3. http://dbpedia.org/data/Germany.json offers the machine-readable data about Germany in a number of different formats, such as RDF, JSON and Turtle. Note that the format is specified by the suffix, e.g. .json, .rdf, .ttl, .ntriples, etc.

The name of the item is equivalent to the name of the article in the English Wikipedia. An effort to provide internationalized URIs in DBpedia is underway, by basically creating a multitude of language-specific URIs. There is a dedicated note with more details on the relationship between DBpedia and Wikidata.
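The three DBpedia URL forms listed above follow a simple pattern, which the minimal sketch below spells out. The function and constant names are illustrative assumptions, not part of any DBpedia API, and the suffix set contains only the suffixes named in this note.

```python
# Sketch of the three DBpedia URL forms described in this note.
# Names here are hypothetical helpers, not a DBpedia client library.

DBPEDIA_BASE = "http://dbpedia.org"

# Only the suffixes mentioned in the text; DBpedia may support others.
KNOWN_SUFFIXES = {"json", "rdf", "ttl", "ntriples"}


def dbpedia_urls(name: str, suffix: str = "json") -> dict:
    """Return the item URI, HTML page URL and data URL for one item name."""
    if suffix not in KNOWN_SUFFIXES:
        raise ValueError(f"unsupported suffix: {suffix}")
    return {
        "item": f"{DBPEDIA_BASE}/resource/{name}",        # identifies the item
        "page": f"{DBPEDIA_BASE}/page/{name}",            # HTML description
        "data": f"{DBPEDIA_BASE}/data/{name}.{suffix}",   # machine-readable data
    }


if __name__ == "__main__":
    for kind, url in dbpedia_urls("Germany", "ttl").items():
        print(kind, url)
```

Because the name is taken from the English Wikipedia article title, every URL built this way inherits the language dependence discussed in the next section.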

Issues for consideration

  1. URIs should uniquely identify an item
  2. URIs should be persistent
  3. URIs should be canonical within Wikimedia projects
  4. URIs in Wikimedia projects should not be based on any one language, e.g. English
  5. URIs should not break caching
  6. URIs should be usable with the interwiki link system
  7. URIs should be easy to use

The list is sorted by importance.

A solution that is based solely on a label, and maybe an English label at that, is highly problematic. Why should the canonical URI for Rome be http://www.wikidata.org/entity/Rome? Why not http://www.wikidata.org/entity/Roma? And if it is the latter, what would the URI be for the Roma people? How can items be disambiguated without relying on a language again? What happens if a label changes meaning? All these problems disappear once you use a unique but inherently meaningless identifier. The disadvantage of such identifiers is their usability: they are not easily written, understood, or remembered. Tools could help with these problems, and it is hoped that libraries addressing them will be offered to developers who want to integrate Wikidata into their applications.
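The ambiguity of label-based URIs can be illustrated with a toy registry. This is only a sketch: the item IDs below are hypothetical placeholders rather than real Wikidata identifiers, and the label data is reduced to the Rome/Roma example from the paragraph above.

```python
# Toy illustration: labels collide across languages, opaque IDs do not.
# The IDs Q9001 and Q9002 are hypothetical placeholders.

ENTITY_BASE = "http://www.wikidata.org/entity/"

ITEMS = {
    "Q9001": {"en": "Rome", "it": "Roma"},  # the city
    "Q9002": {"en": "Roma"},                # the Roma people
}


def items_for_label(label: str) -> list:
    """Return the IDs of all items that carry `label` in any language."""
    return sorted(
        qid for qid, labels in ITEMS.items() if label in labels.values()
    )


def entity_uri(qid: str) -> str:
    """Build the language-neutral URI for an item ID."""
    return ENTITY_BASE + qid


if __name__ == "__main__":
    # "Roma" matches both the city (Italian) and the people (English),
    # so a URI ending in /Roma could not identify either one uniquely.
    print(items_for_label("Roma"))
    print([entity_uri(qid) for qid in items_for_label("Roma")])
```

A lookup helper of this kind is also the sort of tooling that could offset the usability cost of meaningless identifiers: people search by label, while the stored link uses the ID.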

Full proposal for Wikidata (planned)

The following gives a proposal for the URL scheme for Wikidata. This is not implemented yet.

  1. the actual wiki is hosted at the (pretty) URL http://www.wikidata.org/wiki/
    • So a user page might be found at http://www.wikidata.org/wiki/User:{username}
    • A normal item will be at http://www.wikidata.org/wiki/Q{id}
  2. Two Wikipedia-like URL forms resolve to the Wikidata item that the linked Wikipedia article is about:
    • http://{shortsite}.wikidata.org/wiki/{title} and
    • http://www.wikidata.org/wiki/Special:ItemByTitle/{site}/{title}
    • This means that other wikis (like Wikipedia) need two interwiki prefixes to be able to link to items by ID as well as using a Wikipedia page title. See Wikidata/Notes/Wiki_links for that.
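The proposed URL forms above can be sketched as small builder functions. This is a sketch of the proposal only, not of deployed behaviour. The example values "enwiki" for {site} and "en" for {shortsite} are assumptions, as is the title normalisation (spaces to underscores, then percent-encoding), which mirrors common Wikipedia URL practice rather than anything specified in this note.

```python
from urllib.parse import quote

WIKI_BASE = "http://www.wikidata.org/wiki/"


def _encode_title(title: str) -> str:
    # Assumption: spaces become underscores, the rest is percent-encoded.
    return quote(title.replace(" ", "_"), safe="")


def item_page_url(numeric_id: int) -> str:
    """URL of the wiki page for item Q{id}."""
    return f"{WIKI_BASE}Q{numeric_id}"


def item_by_title_url(site: str, title: str) -> str:
    """Special:ItemByTitle form; `site` is assumed to look like 'enwiki'."""
    return f"{WIKI_BASE}Special:ItemByTitle/{quote(site, safe='')}/{_encode_title(title)}"


def shortsite_url(shortsite: str, title: str) -> str:
    """Subdomain form; `shortsite` is assumed to look like 'en'."""
    return f"http://{shortsite}.wikidata.org/wiki/{_encode_title(title)}"


if __name__ == "__main__":
    print(item_page_url(183))
    print(item_by_title_url("enwiki", "Kevin Smith"))
    print(shortsite_url("de", "Deutschland"))
```

The first function addresses an item by ID and the other two by Wikipedia page title, which is why two interwiki prefixes would be needed.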

Additional convenience forms are described below.

Planned implementation

is to be rewritten to

which loads

Machine-readable access (planned)

Additionally, the following URIs exist for more machine-oriented access:

  1. http://www.wikidata.org/entity/Q{id} is the persistent URI of the item identified as Q{id}
    • Redirects (303) to the appropriate URL depending on the request header
    • Used for linked data as the canonical URI
    • (internal: http://www.wikidata.org/entity/Q{id} rewrites to http://www.wikidata.org/wiki/Special:EntityData/Q{id}, which then performs content negotiation and redirects with a 303 to the correct page)
    • Conceptually: http://www.wikidata.org/entity/Q{id} identifies the item, while http://www.wikidata.org/wiki/Q{id} identifies the HTML page about the item.
    • Conceptually: http://www.wikidata.org/entity/Q{id} identifies the item, while http://www.wikidata.org/wiki/Special:EntityData/Q{id} identifies the data about the item
  2. http://www.wikidata.org/wiki/Special:EntityData/Q{id} provides the data about the item
    • The output can be controlled using suffixes and query parameters, e.g. the format (JSON, RDF, ...), the version, the type of data (including references or not?), language data, etc.
    • So the JSON data would be at http://www.wikidata.org/wiki/Special:EntityData/Q{id}.json, the RDF/XML at http://www.wikidata.org/wiki/Special:EntityData/Q{id}.rdf, etc.
    • These URLs are also the targets of the 303 redirects mentioned above
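The redirect behaviour described in this list can be sketched as a pure function that maps a request to the persistent entity URI onto a 303 target. This is a minimal sketch under stated assumptions: the media types in the table, the fallback to the HTML page, and the naive handling of the Accept header (first recognised type wins, quality values ignored) are all illustrative choices, not the documented behaviour of Wikidata.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"
DATA_PREFIX = "http://www.wikidata.org/wiki/Special:EntityData/"
PAGE_PREFIX = "http://www.wikidata.org/wiki/"

# Only the .json and .rdf suffixes are named in this note; the media
# types mapped to them here are assumptions.
SUFFIX_BY_MEDIA_TYPE = {
    "application/json": "json",
    "application/rdf+xml": "rdf",
}


def redirect_for(entity_uri: str, accept: str = "text/html") -> tuple:
    """Return (status, Location) for a request to the persistent entity URI."""
    if not entity_uri.startswith(ENTITY_PREFIX):
        raise ValueError("not an entity URI")
    qid = entity_uri[len(ENTITY_PREFIX):]

    # Naive negotiation: the first listed media type we recognise wins.
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUFFIX_BY_MEDIA_TYPE:
            suffix = SUFFIX_BY_MEDIA_TYPE[media_type]
            return 303, f"{DATA_PREFIX}{qid}.{suffix}"

    # Assumed fallback: send everything else to the HTML page about the item.
    return 303, PAGE_PREFIX + qid


if __name__ == "__main__":
    uri = ENTITY_PREFIX + "Q183"
    print(redirect_for(uri, "application/json"))
    print(redirect_for(uri, "text/html,application/xhtml+xml"))
```

The status is always 303 because the entity URI identifies the item itself, so the server can only point the client to a document about that item, never return the item.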

Current implementation

Here is what the URLs on Wikidata look like:

Example with the concept "Physics":