Wikidata/Notes/Schema.org and Wikidata

From Meta, a Wikimedia project coordination wiki

This note describes a possible relationship between Schema.org and Wikidata.

What is Schema.org

Schema.org is a project to improve general Web page markup through the use of structured data. It provides an initial vocabulary of roughly 600 terms and takes an entity-relationship (RDF) approach. Web markup is annotated in Microdata or RDFa, broadly in the style popularised by the Microformats community (though with different notations and a different community process). Schema.org is a collaboration initiated by several major search engines, but it takes wider input on the schema through a W3C-hosted discussion forum.
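To make the Microdata/RDFa distinction concrete, here is a minimal Python sketch that annotates the same flat schema.org item in both notations. The helper names (`as_microdata`, `as_rdfa_lite`) and the example business are hypothetical. Only the attribute names (`itemscope`, `itemtype`, `itemprop` for Microdata; `vocab`, `typeof`, `property` for RDFa Lite) and the schema.org type and property names come from the respective specifications. Nested items and non-text values are deliberately left out.

```python
from html import escape

SCHEMA = "https://schema.org/"


# Hypothetical helper for illustration: flat items with plain-text values only.
def as_microdata(itemtype: str, props: dict[str, str]) -> str:
    """Annotate a flat schema.org item using Microdata attributes."""
    body = "".join(
        f'<span itemprop="{escape(k)}">{escape(v)}</span>' for k, v in props.items()
    )
    return f'<div itemscope itemtype="{SCHEMA}{escape(itemtype)}">{body}</div>'


# Hypothetical helper for illustration: the same item in RDFa Lite syntax.
def as_rdfa_lite(itemtype: str, props: dict[str, str]) -> str:
    """Annotate a flat schema.org item using RDFa Lite attributes."""
    body = "".join(
        f'<span property="{escape(k)}">{escape(v)}</span>' for k, v in props.items()
    )
    return f'<div vocab="{SCHEMA}" typeof="{escape(itemtype)}">{body}</div>'


# A made-up business, used only to show the two notations side by side.
item = {"name": "Example Bistro", "telephone": "+1-555-0100"}
print(as_microdata("Restaurant", item))
print(as_rdfa_lite("Restaurant", item))
```

Both outputs carry the same schema.org statements; only the host attributes differ, which is why a consumer can treat the two notations as interchangeable.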

Relation

These projects share a concern for improving the treatment of structured data on the popular, mainstream Web. Wikidata, building on Wikipedia, has relatively centralised data but a decentralised descriptive schema, shaped by the edits of thousands of users. By contrast, schema.org has highly decentralised data (potentially spread across billions of pages) but relatively strong central control of its schema. There are natural limits to how far a single centralised vocabulary can handle every descriptive task people might ask of it, and there are limits to how far Wikipedia can, on its own, provide structured data describing everything of interest to its users. The two projects are therefore natural partners.

By defining ways for Wikipedia’s huge dataset to be used within schema.org descriptions, we could reduce the pressure on schema.org to include large lists of things or comprehensive type lists for every topic. And by bridging Wikipedia’s data structures to data published elsewhere on the Web, we can show techniques that let different parties contribute data to the Wikipedia ecosystem without necessarily copying everything into the Wikidata database. This echoes the debate between deletionism and inclusionism in the wider Wikipedia community. If other sites (perhaps MediaWiki-backed) also publish structured data using schema.org markup that references Wikidata, it may be possible to show richer linking of information across sites.
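One way to picture that cross-site linking is a small consumer that scans pages for schema.org properties whose values are Wikidata item URIs and groups pages by the item they mention. The Python sketch below is illustrative only. It assumes publishers express the reference as a `<link>` or `<a>` element with an `itemprop` (for example schema.org’s `sameAs` or `additionalType`) and an `href` of the form `http://www.wikidata.org/entity/Q…` or `https://www.wikidata.org/wiki/Q…`. The class and function names are hypothetical, and nested items, RDFa, and other URI forms are ignored.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Assumed URI forms: Wikidata concept URIs (/entity/Q…) or page URLs (/wiki/Q…).
WIKIDATA_ITEM = re.compile(r"https?://www\.wikidata\.org/(?:entity|wiki)/(Q[1-9][0-9]*)")


class WikidataLinkCollector(HTMLParser):
    """Hypothetical collector: Wikidata item IDs used as Microdata property values."""

    def __init__(self) -> None:
        super().__init__()
        self.items: set[str] = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Only elements that carry both an itemprop and an href count as
        # structured references; plain hyperlinks are ignored.
        if tag in ("link", "a") and a.get("itemprop") and a.get("href"):
            m = WIKIDATA_ITEM.fullmatch(a["href"])
            if m:
                self.items.add(m.group(1))


def group_pages_by_item(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each Wikidata item ID to the page URLs (in sorted order) that reference it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in sorted(pages.items()):
        collector = WikidataLinkCollector()
        collector.feed(html)
        for qid in collector.items:
            groups[qid].append(url)
    return dict(groups)
```

Pages on unrelated sites that point at the same Wikidata item land in the same group, which is the kind of join a shared identifier makes possible without either site copying the other’s data.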

What might this mean in practice?

Schema.org already has many classes for local businesses and services, such as the various kinds of ‘FoodEstablishment’ and ‘GovernmentOrganization’. These don’t exhaust the possibilities. Schema.org aims to remain a central documentation hub for both search engines and publishers, showing simple, practical markup for structured data. But schema.org is not the best place to manage lists of kinds of food establishment or government organization. It is a priority for schema.org to show how to integrate such external data, for example country codes and categories. Wikidata, as it evolves, is a natural source of such content. If we define integration points, the Wikipedia community’s work should make richer schema.org descriptions possible. Meanwhile, sites that provide structured data using schema.org markup can provide data that helps grow Wikidata’s own descriptive databases.
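As one possible integration point (an assumption for illustration, not a settled design), a page could keep a broad schema.org class such as ‘FoodEstablishment’ and express the narrower kind through schema.org’s existing `additionalType` property, whose value is a Wikidata item URI. The Python sketch below renders that in Microdata. The function name, the example business, and the item ID `Q1000001` are hypothetical placeholders, not a reference to any particular Wikidata entity.

```python
import re
from html import escape

# Bare Wikidata item IDs look like "Q" followed by a positive integer.
QID = re.compile(r"Q[1-9][0-9]*")
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def refined_item(schema_type: str, name: str, wikidata_qid: str) -> str:
    """Hypothetical helper: Microdata for a schema.org item refined by a Wikidata item.

    The broad type stays a schema.org class; the narrower category is given by
    `additionalType`, pointing at a Wikidata concept URI.
    """
    if not QID.fullmatch(wikidata_qid):
        raise ValueError(f"not a Wikidata item ID: {wikidata_qid!r}")
    return (
        f'<div itemscope itemtype="https://schema.org/{escape(schema_type)}">'
        f'<span itemprop="name">{escape(name)}</span>'
        f'<link itemprop="additionalType" href="{WIKIDATA_ENTITY}{wikidata_qid}">'
        "</div>"
    )


# "Q1000001" is a placeholder ID, not chosen to denote any real category.
print(refined_item("FoodEstablishment", "Example Noodle Bar", "Q1000001"))
```

The design choice here is that schema.org stays responsible for the small, stable set of broad classes, while the long tail of categories lives in Wikidata and is referenced by identifier rather than copied into the schema.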