Wikidata/Notes/URI scheme

From Meta, a Wikimedia project coordination wiki

Wikidata will offer data for numerous items. Items are identified by a URI. This note describes the current thoughts on the URI scheme to be used in Wikidata.

Wikipedia today

Let us look first at the URL scheme of Wikipedia:

  1. http://en.wikipedia.org/wiki/Germany is the URL for the article on Germany in English (language code en)
  2. The language link between http://en.wikipedia.org/wiki/Germany and http://de.wikipedia.org/wiki/Deutschland is given explicitly on both sides

Note that the URLs for pages in Wikipedia are not strictly persistent, although in practice they are quite stable (see e.g. the Siorpaes/Hepp research on this issue). They can still change meaning: if a Kevin Smith should become US president some day, he will most certainly displace the author Kevin Smith as the main topic of the Kevin Smith article. Changes of name, e.g. when a person marries or becomes pope, also lead to changes of the URL. In most cases there will be adequate redirects and disambiguation pages, but while these are easy for humans to follow, this is not necessarily true for machines.

How DBpedia does it

For reference, here is how DBpedia solves the issue:

  1. http://dbpedia.org/resource/Germany is the URI for the item Germany
  2. http://dbpedia.org/page/Germany is the HTML representation for the description about Germany
  3. http://dbpedia.org/data/Germany.json offers the machine-readable data about Germany. The data is available in a number of formats, such as RDF, JSON, and Turtle; the format is selected by the suffix, i.e. .json, .rdf, .ttl, .ntriples, etc.
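
The three DBpedia URL forms listed above can be derived mechanically from an item name. The following is a minimal sketch; the function name `dbpedia_urls` is made up for illustration, and the replacement of spaces by underscores is an assumption based on how English Wikipedia titles appear in URLs.

```python
from urllib.parse import quote

DBPEDIA_BASE = "http://dbpedia.org"


def dbpedia_urls(name: str, fmt: str = "json") -> dict:
    """Return the item URI, HTML page URL, and data URL for one name.

    `fmt` is the suffix that selects the data format (json, rdf, ttl, ...).
    """
    # Assumption: names follow English Wikipedia titles, with underscores
    # in place of spaces; other characters are percent-encoded.
    slug = quote(name.replace(" ", "_"))
    return {
        "resource": f"{DBPEDIA_BASE}/resource/{slug}",
        "page": f"{DBPEDIA_BASE}/page/{slug}",
        "data": f"{DBPEDIA_BASE}/data/{slug}.{fmt}",
    }


urls = dbpedia_urls("Germany")
print(urls["resource"])
print(urls["page"])
print(urls["data"])
```

Only the `resource` form identifies the item itself; the other two identify documents about it.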

The name of the item is equivalent to the name of the article in the English Wikipedia. An effort to provide internationalized URIs in DBpedia is underway, by basically creating a multitude of language-specific URIs. There is a dedicated note with more details on the relationship between DBpedia and Wikidata.

Issues for consideration

  1. URIs should uniquely identify an item
  2. URIs should be persistent
  3. URIs should be canonical within Wikimedia projects
  4. URIs in Wikimedia projects should not be based on any one language, e.g. English
  5. URIs should not break caching
  6. URIs should be easy to use

The list is sorted by importance.

A solution that is based solely on a label, and maybe an English label at that, is highly problematic. Why should the canonical URI for Rome be http://wikidata.org/id/Rome ? Why not http://wikidata.org/id/Roma ? And if it is the latter, what would the URI be for the Roma people? How should ambiguous labels be disambiguated without relying on a particular language again? What happens if a label changes meaning? All these problems disappear once a unique but inherently meaningless identifier is used. The disadvantage of such identifiers lies in their usability: they are not easily written, understood, or remembered. Tools could help with these problems, and it is hoped that libraries addressing them will be offered to developers who want to integrate Wikidata into their applications.
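
The ambiguity of labels can be illustrated with a toy lookup. This is only a sketch: the item IDs Q1 and Q2 are invented, the label table is a hand-written illustration, and `items_with_label` is a hypothetical helper, not part of any Wikidata library.

```python
# Hypothetical items; the IDs are invented for illustration only.
ITEMS = {
    "Q1": {"en": "Rome", "it": "Roma", "de": "Rom"},
    "Q2": {"en": "Roma people", "it": "Rom", "de": "Roma"},
}


def items_with_label(label: str) -> list:
    """Return the IDs of all items that carry `label` in any language."""
    return sorted(
        item_id for item_id, labels in ITEMS.items() if label in labels.values()
    )


# A label alone does not pick out one item unless the language is known ...
print(items_with_label("Roma"))
# ... whereas an opaque ID always names exactly one item.
print(ITEMS["Q1"]["it"])
```

The same string can name different items in different languages, which is why the identifier itself carries no label.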

Proposal for Wikidata

The following is a proposed URL scheme for Wikidata:

  1. http://{site}.wikidata.org/wiki/{Title} resolves to the Wikidata page of the item that the {site} Wikipedia article titled {Title} is about
  2. http://{language}.wikidata.org/label/{Label} resolves to the Wikidata page of the item called {Label} in the language {language}
    • If there are several items with the same label a disambiguation page will be displayed
    • The language of the interface is the selected language
    • The languages are more numerous than the sites in the previous URL
    • Note that the set of {site}s is a proper subset of the set of {language}s (true?)
  3. http://{language}.wikidata.org/id/Q{id} is the page about the item with the ID Q{id}
    • The language of the interface is the selected language
    • id is an integer

Additionally, the following URIs exist for more machine-oriented access:

  1. http://wikidata.org/id/Q{id} is the persistent URI of the item identified as Q{id}
    • Resolves to the appropriate URL depending on the request header
    • Used for linked data as the canonical URI
  2. http://{site}.wikidata.org/item/{Title} is a semi-persistent convenience URI for the item that the article Title on the selected site is about
    • Semi-persistent refers to the fact that Wikipedia titles can change over time, although this happens rarely
  3. http://wikidata.org/data/Q{id} provides the data about the item
    • This output can be configured, e.g. the format (JSON, RDF, ...), the version, the type of data (including references or not?), language data, etc.
  4. http://{site}.wikidata.org/data/{Title} is a convenience URL for data on the item that the article Title on the selected site is about
    • Should also set the default language of the data export to the language of the site
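
The canonical URI is said to resolve differently depending on the request header. One way this could work is sketched below. The note does not specify which headers are inspected or which media types are recognised, so the use of Accept and Accept-Language, the set `DATA_TYPES`, the fallback to English, and the function `resolve_canonical` are all assumptions. The header parsing is deliberately naive and ignores q-values.

```python
# Assumed set of media types that should be answered with data, not HTML.
DATA_TYPES = {"application/json", "application/rdf+xml", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def resolve_canonical(item_id: str, accept: str = "", accept_language: str = "") -> str:
    """Pick a target URL for http://wikidata.org/id/{item_id}."""
    # Walk the Accept header in the order given (q-values are ignored).
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in DATA_TYPES:
            return f"http://wikidata.org/data/{item_id}"
        if media_type in HTML_TYPES:
            break
    # Fall back to the HTML page; use the first preferred language,
    # reduced to its primary subtag, or "en" if none was sent.
    first = accept_language.split(",")[0].split(";")[0].strip()
    language = first.split("-")[0] or "en"
    return f"http://{language}.wikidata.org/id/{item_id}"


print(resolve_canonical("Q123", accept="application/json"))
print(resolve_canonical("Q123", accept="text/html", accept_language="de-DE,de;q=0.9"))
```

A production resolver would honour q-values and answer with an HTTP redirect rather than return a string.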

Implementation

The above URLs would be resolved through Apache to something like this:

  1. http://{site}.wikidata.org/wiki/{Title} resolves to /w/index.php?title=Special:SiteTitle&site={site}&title={Title}&uselang={site} - not to /w/index.php?title={Title}
  2. http://{language}.wikidata.org/label/{Label} resolves to /w/index.php?title=Special:LanguageLabel&language={language}&label={Label}&uselang={language}
  3. http://{language}.wikidata.org/id/Q{id} resolves to /w/index.php?title=Q{id}&uselang={language}
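
The rewriting in the list above would be done by Apache, but the mapping itself can be modelled in a few lines to check that the three patterns do not collide. This is a sketch under assumptions: the regular expressions, in particular the character class used for site and language codes, are guesses, and no URL escaping is performed. The target templates are taken from the list.

```python
import re

# (pattern over "host/path", internal target template)
RULES = [
    (re.compile(r"^(?P<site>[a-z-]+)\.wikidata\.org/wiki/(?P<title>.+)$"),
     "/w/index.php?title=Special:SiteTitle&site={site}&title={title}&uselang={site}"),
    (re.compile(r"^(?P<language>[a-z-]+)\.wikidata\.org/label/(?P<label>.+)$"),
     "/w/index.php?title=Special:LanguageLabel&language={language}"
     "&label={label}&uselang={language}"),
    (re.compile(r"^(?P<language>[a-z-]+)\.wikidata\.org/id/(?P<item>Q\d+)$"),
     "/w/index.php?title={item}&uselang={language}"),
]


def rewrite(host_and_path: str):
    """Return the internal target for a public URL, or None if no rule matches."""
    for pattern, template in RULES:
        match = pattern.match(host_and_path)
        if match:
            return template.format(**match.groupdict())
    return None


print(rewrite("en.wikidata.org/wiki/Germany"))
print(rewrite("de.wikidata.org/id/Q123"))  # Q123 is an arbitrary example ID
```

URLs on the bare wikidata.org host match none of these rules, which leaves them free for the machine-oriented set.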

The second set:

  1. http://wikidata.org/id/Q{id} resolves to /w/index.php?title=Special:ResolveURI&id=Q{id} which in turn resolves to different things, depending on the request header
  2. http://{site}.wikidata.org/item/{Title} resolves to /w/index.php?title=Special:ResolveURI&site={site}&sitetitle={Title} which looks up the id and then resolves as the previous URL
  3. http://{site}.wikidata.org/data/{Title} resolves to /w/api.php?action=wbgetitems&sites={site}&titles={Title}
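
The internal targets for the second set can be assembled with standard query-string encoding, which also takes care of escaping titles. The sketch below uses the names Special:ResolveURI and wbgetitems exactly as they appear in the list above; they are part of the proposal, not a confirmed interface, and the Python function names are invented.

```python
from urllib.parse import urlencode


def resolve_uri_target(item_id=None, site=None, title=None) -> str:
    """Internal target for wikidata.org/id/Q{id} or {site}.wikidata.org/item/{Title}."""
    params = {"title": "Special:ResolveURI"}
    if item_id is not None:
        params["id"] = item_id
    else:
        # Site and title are looked up to find the id on the server side.
        params["site"] = site
        params["sitetitle"] = title
    # Keep ":" unescaped so the special page name stays readable.
    return "/w/index.php?" + urlencode(params, safe=":")


def data_target(site: str, title: str) -> str:
    """Internal API target for {site}.wikidata.org/data/{Title}."""
    query = {"action": "wbgetitems", "sites": site, "titles": title}
    return "/w/api.php?" + urlencode(query)


print(resolve_uri_target(item_id="Q123"))  # arbitrary example ID
print(resolve_uri_target(site="en", title="Germany"))
print(data_target("en", "Germany"))
```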