Wikidata/Notes/DBpedia and Wikidata


DBpedia is a great and active project dealing with structured data and Wikipedia. At first glance DBpedia and Wikidata may look like they overlap a lot, but they actually do not: they fulfill very different tasks, and there is only a small overlap where we need to figure out together how best to co-evolve.

DBpedia, among many other things, extracts structured data from the infoboxes in Wikipedia and publishes it in RDF and a few other formats. It also hosts a community effort to define extractors for the data, which can be used well beyond Wikipedia. It provides a number of services around the extracted data, such as DBpedia Mobile, a SPARQL endpoint, a faceted browser, a number of mappings to external ontologies, an ontology of its own, etc. A lot of research is being done on DBpedia.

Wikidata, on the other hand, will provide a secondary and tertiary database of structured data that everyone can edit. It turns the extraction process of DBpedia on its head: instead of extracting structured data from infoboxes, it will allow infoboxes to be created from structured data. This means that the effort DBpedia puts into extracting the data can be reduced, and the folks in the project can concentrate on the higher-value services and processes, like the browsers, the SPARQL endpoint, discovering mapping and quality issues in the data, etc. DBpedia already goes, and will continue to go, well beyond the goals of Wikidata.

Wikidata does not make DBpedia obsolete, nor does it aim to. Since Wikidata will start from scratch, DBpedia and Wikidata will complement each other for a while. Wikidata is not expected to use automatically scraped data. DBpedia, for its other services, may use Wikidata data and integrate it.

Overlap between the two projects

  • Both projects publish URIs for entities based on Wikipedia. We will go into detail on this topic in the next section.
  • Both projects publish RDF data about entities. The source of the data is very different: whereas DBpedia extracts the data from the infoboxes, Wikidata will collect data entered through its interfaces. Data in Wikidata will also be annotated with its provenance: it does not simply state the population of Germany, but it also requires a source to be given for the data. The two data repositories will co-exist. If Wikidata gets established and collects an interesting amount of data, the relationship between the two datasets should be further explored.
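
The provenance requirement in the second point can be pictured as a statement record that refuses to exist without a source. This is only a minimal sketch: the class name `Statement`, its fields, and the placeholder values are hypothetical and do not describe Wikidata's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical record: a value about an entity plus where it came from."""
    entity: str    # e.g. "Germany"
    prop: str      # e.g. "population"
    value: object  # the stated value
    source: str    # provenance; assumed here to be mandatory

    def __post_init__(self):
        # Reject statements that come without any provenance.
        if not self.source.strip():
            raise ValueError("a statement needs a source")


# Placeholder values only; no real figures are implied.
stmt = Statement("Germany", "population", "<value as given by the source>",
                 "<reference to the source>")
```

Data extracted from an infobox, by contrast, typically carries only the value itself, which is one reason the two repositories are expected to co-exist rather than merge.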

URI schemes

Another essay describes the possible URI scheme for Wikidata.

The Semantic Web standards are built to deal with a heterogeneous set of URIs for one and the same entity, and to map these URIs to each other with the sameAs property. In practice, however, this leads to an overhead that considerably hampers the development of tools. For many entities it is often considered useful to work with canonical URIs. In the last few years, a best practice has developed of using DBpedia URIs for common entities like cities, countries, and persons. They have the advantage that you can often guess them and look them up in Wikipedia, as they are based on the title of the English Wikipedia article. So one can simply write them down as dbpedia:Germany.
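
To illustrate why such URIs are easy to guess, here is a minimal sketch that builds one from an English Wikipedia article title. It assumes the `dbpedia:` prefix expands to `http://dbpedia.org/resource/` and that spaces become underscores; the function name `dbpedia_uri` is made up for this example, and DBpedia's exact rules for encoding special characters may differ from the simple percent-encoding used here.

```python
from urllib.parse import quote

# Assumed expansion of the "dbpedia:" prefix.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_uri(title: str) -> str:
    """Guess the DBpedia URI for an English Wikipedia article title."""
    # Wikipedia-style titles use underscores instead of spaces.
    local_name = title.strip().replace(" ", "_")
    # Percent-encode characters that are unsafe in a URI (an assumption;
    # the real DBpedia encoding rules are not reproduced here).
    return DBPEDIA_RESOURCE + quote(local_name, safe="_(),'-.")


print(dbpedia_uri("Germany"))
print(dbpedia_uri("New York City"))
```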

Based on the above proposal, Wikidata would also provide such language-specific URIs, e.g. en:Germany (even though they would not be the canonical URIs). The DBpedia and Wikidata language-specific URIs can be programmatically linked to each other by simply replacing the namespace. Furthermore, it is expected that Wikidata will also link to the DBpedia URIs via sameAs, and hopefully the other way around.
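
The namespace replacement can be sketched in a few lines. The DBpedia namespace below is the commonly used one, but the Wikidata language-specific namespace is a pure placeholder (`wikidata.example`), since the URI scheme is still a proposal; the helper names are likewise hypothetical.

```python
DBPEDIA_NS = "http://dbpedia.org/resource/"
# Hypothetical placeholder: the real language-specific namespace is not decided.
WIKIDATA_EN_NS = "http://wikidata.example/en/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def swap_namespace(uri: str, source_ns: str, target_ns: str) -> str:
    """Link two URI schemes by replacing the namespace and keeping the local name."""
    if not uri.startswith(source_ns):
        raise ValueError(f"{uri!r} is not in namespace {source_ns!r}")
    return target_ns + uri[len(source_ns):]


def same_as_triple(subject: str, obj: str) -> str:
    """Serialize an owl:sameAs link between two URIs as one N-Triples line."""
    return f"<{subject}> <{OWL_SAME_AS}> <{obj}> ."


dbpedia_germany = DBPEDIA_NS + "Germany"
wikidata_germany = swap_namespace(dbpedia_germany, DBPEDIA_NS, WIKIDATA_EN_NS)
print(same_as_triple(wikidata_germany, dbpedia_germany))
```

Because only the prefix changes, the same function works in both directions by swapping the last two arguments.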

Potential Contributions from DBpedia

The DBpedia team is excited that the Wikipedia community is moving towards recognizing the importance of handling structured data within Wikipedia, and we wish the Wikidata project great success. The Wikidata project is of course warmly invited to reuse whatever parts of DBpedia it finds useful. Potential contributions from DBpedia could for instance be:

  • the reuse of DBpedia source code wherever the Wikidata project sees fit (things don't need to be implemented twice).
    • Wikidata will survey the DBpedia source code and keep an eye on reusable components.
  • DBpedia data could be used to bootstrap the Wikidata repository with initial content (the DBpedia team would be happy to help, if this is desired by Wikidata).
    • The Wikidata team does not decide on the content of the Wikidata site. I am sure the community, once it is established, will have a discussion about this topic.
  • any other help the Wikidata project is interested in: just send a mail to dbpedia-discussion or dbpedia-developers.
    • Thank you! We will.

Good reads on the topic

  1. Wikidata through the Eyes of DBpedia
  2. DBpedia interlinking, Query improvement, Wikidata