Community Wishlist Survey 2023/Larger suggestions/Allow querying the Commons tabular data with the Wikidata Query Service to better support large numerical datasets

From Meta, a Wikimedia project coordination wiki

Allow querying the Commons tabular data with the Wikidata Query Service to better support large numerical datasets

  • Problem: Wikidata is poorly suited to storing large numerical datasets. The Wikidata user interface and some downstream applications may currently fail on large items (items with too many statements). As a result, potentially useful quantitative data, such as annual average temperature records or precise population figures broken down by ethnicity, cannot currently be accessed by Wikidata users. The Wikidata community maintains that large numerical datasets should instead go into tabular data files[1][2][3], CSV-like tables stored on Wikimedia Commons. Many other kinds of data that will never have properties on Wikidata could also be stored on Wikimedia Commons and would still be useful to query or reuse on Wikipedia. One reason these tables are not more widely used is that they are inaccessible to the Wikidata Query Service.
  • Proposed solution: The Wikidata Query Service should be extended so that it can read tabular data from Wikimedia Commons. This will likely require some standardization of field names in the tabular data files on Wikimedia Commons, and a community discussion on that standardization should be part of the overall process.
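
To illustrate why field-name standardization matters for querying, here is a minimal Python sketch. It assumes the JSON layout used by Commons tabular data pages (a "schema" with named "fields" and a "data" array of rows). The sample document, the field names "year" and "mean_temp_c", and the helper functions rows_as_records and select are all hypothetical and the values are made up. This is not how the Wikidata Query Service works or would work; it only shows that rows become addressable by name once field names are agreed on.

```python
import json

# Hypothetical sample laid out like a Commons tabular data page.
# The values are invented for illustration only.
SAMPLE = """
{
  "license": "CC0-1.0",
  "description": {"en": "Made-up annual mean temperatures"},
  "schema": {
    "fields": [
      {"name": "year", "type": "number", "title": {"en": "Year"}},
      {"name": "mean_temp_c", "type": "number", "title": {"en": "Mean temperature (C)"}}
    ]
  },
  "data": [
    [2001, 9.5],
    [2002, 10.0],
    [2003, 11.0]
  ]
}
"""


def rows_as_records(doc):
    """Pair each row's values with the field names declared in the schema."""
    names = [field["name"] for field in doc["schema"]["fields"]]
    return [dict(zip(names, row)) for row in doc["data"]]


def select(records, field, predicate):
    """Keep the records whose value for `field` satisfies `predicate`."""
    return [r for r in records if field in r and predicate(r[field])]


doc = json.loads(SAMPLE)
records = rows_as_records(doc)

# A query by field name only works across datasets if they all use the
# same name for the same quantity, hence the need for standardization.
warm_years = [r["year"] for r in select(records, "mean_temp_c", lambda t: t >= 10.0)]
print(warm_years)
```

If two datasets named the same column differently (say "mean_temp_c" in one and "avg_temperature" in another), the same filter would silently return nothing for one of them, which is the kind of problem the proposed community discussion would need to address.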

Discussion

Voting