User:Yurik/Storing data

From Meta, a Wikimedia project coordination wiki

With the advance of Graphs and Maps, we have run into the problem of where to store data. While Wikidata provides a good location for "facts" (small pieces of data), we need a different place to store "blobs": small data sets in a structured format like JSON or CSV. At the moment, blob data such as a map outline is stored as raw, unreadable wikitext. Instead, it should be shown as tables or images, e.g. this example. Assuming we implement it as wiki pages in a "Data:" namespace, what is the best location for the community to manage it?
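To make the "blob shown as a table" idea concrete, here is a minimal sketch, assuming a blob is a small CSV text. The blob contents, the values in it, and the function names `parse_blob` and `render_table` are made up for illustration; this is not how any existing extension renders data pages.

```python
import csv
import io

# Hypothetical "blob": a tiny CSV data set of the kind a Data: page might hold.
# The values are invented for illustration only.
CSV_BLOB = """city,population
Paris,2100000
Berlin,3600000
"""

def parse_blob(text):
    """Parse CSV text into a header row and a list of data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

def render_table(header, rows):
    """Render the rows as an aligned plain-text table instead of raw text."""
    all_rows = [header] + rows
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in all_rows) for i in range(len(header))]

    def fmt(row):
        return " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(header), separator] + [fmt(r) for r in rows])

header, rows = parse_blob(CSV_BLOB)
table = render_table(header, rows)
print(table)
```

The same parsed rows could equally feed a graph or a map layer; the point is only that the page stores structured data and the reader sees a rendering of it.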

See also: RFC How to deal with open datasets and DataNamespace.

Wikidata

  • Makes sense semantically - all data in one place
  • Metadata can be stored as Wikidata items
  • Community is more technical
  • Community is more willing to experiment with new approaches
  • Wikidata is structured around concepts. Datasets are usually structured around the same data for many topics.
    • Possible solution: Use the W3C standard for an RDF representation of a Data Cube
  • People already have trouble understanding this distinction and want to upload their spreadsheets to Wikidata. Storing datasets here would make it even harder to explain and understand.
  • We cannot have a mix of licenses, which would surely be expected if we go down this path.
  • We are here to expose the data we have in a uniform way to Wikipedia, the other sister projects, and third parties. Storing datasets would make this impossible.
  • Wikidata is at its core a knowledge base, not a place to put a dataset.
  • People expect to be able to query all the data in Wikidata in a uniform way. This would not be possible.
  • We are building data quality tools that all revolve around the way data is stored in Wikidata right now.
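As a rough illustration of the Data Cube suggestion in the list above, the sketch below turns rows of a dataset into `qb:Observation` resources in Turtle syntax. The `qb:` namespace is the one defined by the W3C Data Cube vocabulary; the `ex:` namespace, the dataset and property names, the values, and the function `to_observations` are assumptions made for this example. A complete Data Cube would also need a data structure definition, which is omitted here.

```python
QB = "http://purl.org/linked-data/cube#"  # W3C Data Cube vocabulary namespace
EX = "http://example.org/"                # hypothetical namespace for this sketch

def to_observations(dataset_id, dimension, measure, rows):
    """Return Turtle lines with one qb:Observation per (dimension value, measure value) row."""
    lines = [
        f"@prefix qb: <{QB}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{dataset_id} a qb:DataSet .",
    ]
    for i, (dim_value, value) in enumerate(rows, start=1):
        lines.append(
            f"ex:{dataset_id}-obs{i} a qb:Observation ; "
            f"qb:dataSet ex:{dataset_id} ; "
            f"ex:{dimension} ex:{dim_value} ; "
            f"ex:{measure} {value} ."
        )
    return lines

# Invented sample data: the same measure recorded for several topics.
turtle = to_observations("pop", "city", "count", [("A", 10), ("B", 20)])
print("\n".join(turtle))
```

Each row becomes its own observation linked to the dataset, which shows why a dataset of "the same data for many topics" does not map naturally onto one item per concept.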

See also RFC: Data namespace blob storage on Commons

Wikimedia Commons

  • Already stores other types of shared content like images, multimedia
  • Each piece of content comes with its own licensing
  • Community is well versed in legal issues
  • Content has to be public domain or freely licensed in both the source country and the U.S.
  • The aim of Wikimedia Commons is to provide a media file repository

English Wikipedia

  • High number of editors
  • More anti-vandalism bots
  • Implies the data is only for the English-language wiki


New site

  • No legacy issues - new site means new rules and licensing
  • Hard to build a new community

...

  • Semantically these sites are for organizational matters, not for the actual content.