Community Wishlist Survey 2023/Larger suggestions/Allow querying the Commons tabular data with the Wikidata Query Service to better support large numerical datasets
This proposal is a larger suggestion that is out of scope for the Community Tech team. Participants are welcome to vote on it, but please note that regardless of popularity, there is no guarantee this proposal will be implemented. Supporting the idea helps communicate its urgency to the broader movement.
Allow querying the Commons tabular data with the Wikidata Query Service to better support large numerical datasets
- Problem: Wikidata is notoriously bad at storing large numerical datasets. The user interface of Wikidata and some downstream applications may currently fail on large items (items with too many statements). As a result, some potentially useful quantitative data, such as annual average temperature records or precise population data split by ethnicity, cannot currently be accessed by Wikidata users. The Wikidata community maintains that large numerical datasets should instead go into tabular data files[1][2][3], which are CSV-like tables stored on Wikimedia Commons. Many kinds of data that will never have properties on Wikidata could also be stored on Wikimedia Commons and would still be useful to query or reuse on Wikipedia. One of the reasons these tables are not widely used is that they are inaccessible to the Wikidata Query Service.
- [1] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2022/03#Pandemie_covidu-19_v_Rakousku_(Q86847911)_is_full_(no,_literally)
- [2] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2022/02#Population:_P1082_or_P4179?
- [3] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2022/11#Tabular_style_data_on_items?
- Proposed solution: The Wikidata Query Service should be extended so that it can read tabular data from Wikimedia Commons. This would likely require some standardization of the field names in the CSV-like files on Wikimedia Commons, and a community discussion on that should be part of the overall process.
- Who would benefit: WikiProject Tabular data, the whole Wikidata community, and, by extension, the whole world profiting from a better open data infrastructure.
- Phabricator tickets: phab:T181319
- Proposer: Vojtěch Dostál (talk) 07:54, 30 January 2023 (UTC) and ♥Ainali talk contributions 08:03, 30 January 2023 (UTC)
Discussion
- This is likely to be too big to be in scope for Community Tech. It is a valid proposal though, so I will move to Larger Suggestions. Thanks for participating. DWalden (WMF) (talk) 13:29, 30 January 2023 (UTC)
- @DWalden (WMF): @Abbe98 was friendly and pointed me to this implementation by @Yurik that might be an inspiration for a solution to this. ♥Ainali talk contributions 16:35, 16 February 2023 (UTC)
- @Ainali Thanks! DWalden (WMF) (talk) 12:59, 17 February 2023 (UTC)
- Both structured data from Wikidata (via WDQS) and tabular data (from Commons) can be read in a machine readable form, so it is up to the user to use common spreadsheet software to combine both datasets (LibreOffice Calc, MS Excel, Google Docs, but also DataFrames as in Python/pandas etc.). I don't think we should impose format constraints to make both worlds "compatible", and I don't think that WDQS should be loaded with even more secondary functionality. It is useful for the rather simple stuff, but it is not the one tool to solve problems of arbitrary complexity. —MisterSynergy (talk) 21:14, 10 February 2023 (UTC)
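For readers weighing the client-side approach described in the comment above, here is a minimal sketch of what combining the two sources by hand might look like. It assumes a tabular data page is JSON with a `schema.fields` list and a `data` array of rows, and that one column holds Wikidata item IDs to join on. The sample table, its values, the flattened query rows, and the helper names `tab_to_records` and `join_on` are all made up for illustration. A pandas `DataFrame.merge` would do the same job.

```python
import json

# Hypothetical sample shaped like a Commons tabular data page:
# a "schema" with named fields plus a "data" array of rows.
# All values are placeholders, not real measurements.
TAB_JSON = """
{
  "license": "CC0-1.0",
  "description": {"en": "Illustrative annual mean temperatures (placeholder values)"},
  "schema": {
    "fields": [
      {"name": "item", "type": "string"},
      {"name": "year", "type": "number"},
      {"name": "mean_temp_c", "type": "number"}
    ]
  },
  "data": [
    ["Q1085", 2020, 10.1],
    ["Q1085", 2021, 9.4],
    ["Q1741", 2021, 11.3]
  ]
}
"""

# Hypothetical rows, as they might look after flattening a query result
# from the Wikidata Query Service into plain dicts.
WDQS_ROWS = [
    {"item": "Q1085", "itemLabel": "Prague"},
    {"item": "Q1741", "itemLabel": "Vienna"},
]


def tab_to_records(tab):
    """Turn a tabular-data document into a list of dicts keyed by field name."""
    names = [field["name"] for field in tab["schema"]["fields"]]
    return [dict(zip(names, row)) for row in tab["data"]]


def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key.

    The right-hand side is assumed to have at most one row per key value.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


records = tab_to_records(json.loads(TAB_JSON))
joined = join_on(records, WDQS_ROWS, "item")
for row in joined:
    print(row["itemLabel"], row["year"], row["mean_temp_c"])
```

The join only works because both sides share an `item` column containing item IDs. That is the kind of field-name convention the proposal says would need community discussion, whether the join happens on the client or inside the query service.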
Voting
- Support Strainu (talk) 20:19, 10 February 2023 (UTC)
- Support EpicPupper (talk) 05:38, 11 February 2023 (UTC)
- Support OwenBlacker (Talk) 14:44, 11 February 2023 (UTC)
- Support Bluerasberry (talk) 15:04, 11 February 2023 (UTC)
- Support CROIX (talk) 15:20, 11 February 2023 (UTC)
- Support Novak Watchmen (talk) 17:52, 11 February 2023 (UTC)
- Support Matěj Suchánek (talk) 18:39, 11 February 2023 (UTC)
- Support Moebeus (talk) 00:21, 13 February 2023 (UTC)
- Support Mahir256 (talk) 00:29, 13 February 2023 (UTC)
- Support Izno (talk) 07:58, 13 February 2023 (UTC)
- Support JAn Dudík (talk) 21:59, 13 February 2023 (UTC)
- Support Libcub (talk) 06:00, 14 February 2023 (UTC)
- Support ArthurPSmith (talk) 21:14, 17 February 2023 (UTC)
- Support —(ping on reply)—CX Zoom (A/अ/অ) (let's talk|contribs) 08:35, 18 February 2023 (UTC)
- Support Vulcan❯❯❯Sphere! 16:37, 18 February 2023 (UTC)
- Support Zache (talk) 05:38, 19 February 2023 (UTC)
- Support Jklamo (talk) 12:26, 19 February 2023 (UTC)
- Support β16 - (talk) 14:33, 20 February 2023 (UTC)
- Support Thingofme (talk) 02:00, 23 February 2023 (UTC)
- Support Althair (talk) 04:13, 23 February 2023 (UTC)
- Support Stevenliuyi (talk) 12:46, 23 February 2023 (UTC)