Wikidata/Notes/URI scheme

Wikidata will offer data for numerous items. Each item is identified by a URI. This note describes the current thinking on the URI scheme to be used in Wikidata.


Wikipedia today

Let us look first at the URL scheme of Wikipedia:

  1. is the URL for the article on Germany in the language en, English
  2. is the URL for the article on Germany in the language de, German
  3. (Wikidata-object "Germany") has entries for both articles. The articles use those entries to provide links to other languages.

Note that the URLs for pages in Wikipedia are not persistent: they can change meaning. If a Kevin Smith should become US president some day, he will most certainly displace the author Kevin Smith as the main topic of the Kevin Smith article. Changes in a name, e.g. when a person marries or becomes pope, also lead to changes of the URL. There will be adequate redirects and disambiguation pages in most cases, but although these are easy for humans to follow and resolve, this is not necessarily true for machines.

How DBpedia does it

For reference, here is how DBpedia solves the issue:

  1. is the URI for the item Germany
  2. is the HTML representation for the description about Germany
  3. offers the machine-readable data about Germany in a number of different formats, like RDF, JSON, Turtle, etc. Note that the format is specified by the suffix, i.e. .json, .rdf, .ttl, .ntriples, etc.
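
As a rough illustration of suffix-based format selection, the following Python sketch maps the suffixes named above to their standard media types. The function name `media_type_for` and the bare file names used with it are assumptions made for this example; it does not describe how DBpedia itself is implemented.

```python
import posixpath
from typing import Optional

# Suffixes named in the text, mapped to their standard media types.
SUFFIX_TO_MEDIA_TYPE = {
    ".json": "application/json",
    ".rdf": "application/rdf+xml",
    ".ttl": "text/turtle",
    ".ntriples": "application/n-triples",
}


def media_type_for(name: str) -> Optional[str]:
    """Return the media type implied by the suffix, or None if there is none."""
    _, suffix = posixpath.splitext(name)
    return SUFFIX_TO_MEDIA_TYPE.get(suffix.lower())


# Hypothetical usage: the names below are placeholders, not real DBpedia paths.
print(media_type_for("Germany.ttl"))
print(media_type_for("Germany"))
```

A name without a recognised suffix yields `None`, which a server could treat as "no explicit format requested".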

The name of the item is equivalent to the name of the article in the English Wikipedia. An effort to provide internationalized URIs in DBpedia is underway, by basically creating a multitude of language-specific URIs. There is a dedicated note with more details on the relationship between DBpedia and Wikidata.

Issues for consideration

  1. URIs should uniquely identify an item
  2. URIs should be persistent
  3. URIs should be canonical within Wikimedia projects
  4. URIs in Wikimedia projects should not be based on any one language, e.g. English
  5. URIs should not break caching
  6. URIs should be usable with the interwiki link system
  7. URIs should be easy to use

The list is sorted by importance.

A solution that is based solely on a label, and maybe an English label at that, is highly problematic. Why should the canonical URI for Rome be ? Why not ? And if it is the latter, what would the URI be for the Roma people? How should disambiguation be handled, and how do you disambiguate without relying on a language again? What if a label changes meaning? All these problems disappear once you use a unique but inherently meaningless identifier. The disadvantage of such identifiers lies in their usability: they are not easily written, understood, or remembered. Tools could help with these problems, and it is hoped that libraries addressing them will be offered to developers who want to integrate Wikidata into their applications.
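
The kind of helper library hoped for here might, for instance, look up candidate items by label so that developers never have to memorise identifiers. The Python sketch below is only an illustration: the item IDs (`Q1001`, `Q1002`), the label data, and the function names are all made up for this example and are not actual Wikidata content or an existing API.

```python
from collections import defaultdict

# Hypothetical items: these IDs and labels are invented for illustration only.
ITEMS = {
    "Q1001": {"en": "Rome", "it": "Roma", "de": "Rom"},
    "Q1002": {"en": "Romani people", "it": "Rom", "de": "Roma"},
}


def build_label_index(items):
    """Map each label text to the set of item IDs that carry it in any language."""
    index = defaultdict(set)
    for item_id, labels in items.items():
        for label in labels.values():
            index[label].add(item_id)
    return index


def candidates(index, label):
    """Return the sorted item IDs whose label matches in at least one language."""
    return sorted(index.get(label, set()))


index = build_label_index(ITEMS)
print(candidates(index, "Rome"))  # one candidate
print(candidates(index, "Roma"))  # ambiguous: two candidates
```

The label "Roma" matches two items, so a label alone cannot serve as an identifier, whereas each opaque ID refers to exactly one item regardless of language.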

Full proposal for Wikidata (planned)

The following gives a proposal for the URL scheme for Wikidata. This is not implemented yet.

  1. the actual wiki is hosted at the (pretty) URL
    • So a user page might be found at{username}
    • A normal item will be at{id}
  2. Two Wikipedia-like URL forms resolve to the Wikidata item that the linked Wikipedia article is about:
    • http://{shortsite}{title} and
    • This means that other wikis (like Wikipedia) need two interwiki prefixes to be able to link to items by ID as well as using a Wikipedia page title. See Wikidata/Notes/Wiki_links for that.
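
Resolving a Wikipedia-like form to an item amounts to a lookup from a site and page title to an item ID. The Python sketch below shows one way this could work; the table `SITELINKS`, the item ID `Q1003`, and the function `resolve_title` are hypothetical names chosen for this example, not part of the proposal.

```python
from typing import Optional

# Hypothetical sitelink table: (short site code, page title) -> item ID.
# The ID is invented for illustration only.
SITELINKS = {
    ("en", "Germany"): "Q1003",
    ("de", "Deutschland"): "Q1003",
}


def resolve_title(site: str, title: str) -> Optional[str]:
    """Return the item ID for a page title on a given site, or None if unknown."""
    # MediaWiki treats underscores and spaces in titles as equivalent.
    normalized = title.replace("_", " ").strip()
    return SITELINKS.get((site, normalized))


print(resolve_title("en", "Germany"))
print(resolve_title("de", "Deutschland"))
```

Both titles resolve to the same item, which is the property that lets a wiki link to an item either by ID or by a Wikipedia page title.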

Additional convenience forms are described below.

Planned implementation

is to be rewritten to

which loads

Machine-readable access (planned)

Additionally, the following URIs exist for the more machine-oriented access:

  1.{id} is the persistent URI of the item identified as Q{id}
    • Redirects (303) to the appropriate URL depending on the request header
    • Used for linked data as the canonical URI
    • (internal:{id} rewrites to{id}, which then performs content negotiation and redirects with a 303 to the correct page)
    • Conceptually:{id} identifies the item, while{id} identifies the HTML page about the item.
    • Conceptually:{id} identifies the item, while{id} identifies the data about the item.
  2.{id} provides the data about the item
    • This output can be specified using suffixes and query parameters, e.g. the format (JSON, RDF, ...), the version, the type of data (including references or not?), language data, etc.
    • So the JSON data would be at{id}.json and the RDF/XML at{id}.rdf, etc.
    • Those are also the targets of the 303 redirects mentioned above.
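
The 303 redirect driven by the request header can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the URL patterns use the placeholder host `wikidata.example` because the real patterns are not given above, the function `redirect_for` is a hypothetical name, and the negotiation ignores q-values, which a real implementation would need to honour.

```python
from typing import Tuple

# Placeholder URL patterns (assumptions, not the actual Wikidata URLs).
PAGE_URL = "https://wikidata.example/page/Q{id}"
DATA_URL = "https://wikidata.example/data/Q{id}{suffix}"

# Media types for the two data formats named in the text.
DATA_SUFFIXES = {
    "application/json": ".json",
    "application/rdf+xml": ".rdf",
}


def redirect_for(item_id: int, accept: str) -> Tuple[int, str]:
    """Return (status, Location) for a request to the item's persistent URI."""
    # Simplified negotiation: walk the media types in header order, ignoring q-values.
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in DATA_SUFFIXES:
            suffix = DATA_SUFFIXES[media_type]
            return 303, DATA_URL.format(id=item_id, suffix=suffix)
        if media_type in ("text/html", "*/*"):
            break
    # Default: send browsers and unknown clients to the HTML page.
    return 303, PAGE_URL.format(id=item_id)


print(redirect_for(42, "application/json"))
print(redirect_for(42, "text/html,application/xhtml+xml"))
```

The item URI itself never returns a document; it only redirects, which keeps the identifier of the item distinct from the URLs of the page and the data about it.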

Current implementation

Here is what the URLs on Wikidata look like:

Example with the concept "Physics":

[Figure: Wikidata URI Schema.PNG]