Community Wishlist Survey 2022/Multimedia and Commons/WikiCommons metadata analysis tool

WikiCommons metadata analysis tool

  • Problem: Image metadata is constantly being improved through crowdsourcing, either in a GLAM's own database or on Wikimedia Commons itself. To keep the metadata up to date on all systems, a tool is needed that compares the metadata and can upload it to and download it from Wikimedia Commons.
  • Proposed solution: A general GLAM analysis tool that compares the metadata of the GLAM sources with metadata from other sources in Wikimedia Commons. Ideally a solution with OpenRefine or Pattypan. Procedure/Rules: Export metadata from GLAM (xls/csv); Prepare tables for the analysis tool; Load tables in analysis tool; Get/extract Wikimedia Commons metadata; Compare metadata (i.e. highlight the differences) and decide: No change -> ignore; Changes by GLAM -> upload the update to WikiCommons via analysis tool; Changes by WikiCommons -> create new csv file for uploading to GLAM.
  • Who would benefit: GLAM Institutions on WikiCommons
  • More comments: First draft @GLAMhack2021
  • Phabricator tickets:
  • Proposer: ETH-Bibliothek (talk) 10:32, 21 January 2022 (UTC)
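
The compare-and-decide step of the proposed procedure could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing tool or an OpenRefine/Pattypan feature: it assumes both sides can be exported as CSV with a shared `id` column (a hypothetical name), and that a snapshot from the last synchronisation (`BASELINE`) is available, since without one it is not possible to tell which side made a change. The function names and sample records are invented for the example.

```python
import csv
import io


def load_table(csv_text, key="id"):
    """Parse CSV text into {record id: {field: value}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: {f: v for f, v in row.items() if f != key} for row in reader}


def classify(baseline, glam, commons):
    """Three-way comparison of every field of every shared record."""
    result = {"unchanged": [], "upload_to_commons": [],
              "export_to_glam": [], "conflict": []}
    for rec_id in sorted(baseline.keys() & glam.keys() & commons.keys()):
        for field, old in baseline[rec_id].items():
            g = glam[rec_id].get(field, old)
            c = commons[rec_id].get(field, old)
            if g == c:
                bucket = "unchanged"          # both sides agree: ignore
            elif c == old:
                bucket = "upload_to_commons"  # only the GLAM changed it
            elif g == old:
                bucket = "export_to_glam"     # only Commons changed it
            else:
                bucket = "conflict"           # both changed: needs a human
            result[bucket].append((rec_id, field, g, c))
    return result


def changes_to_csv(changes):
    """Write Commons-side changes as CSV text for re-import by the GLAM."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "field", "new_value"])
    for rec_id, field, _glam_value, commons_value in changes:
        writer.writerow([rec_id, field, commons_value])
    return buf.getvalue()


# Invented sample data: last synced state, current GLAM export, current Commons export.
BASELINE = """\
id,title,creator
1,Bridge,Unknown
2,Lake,Unknown
3,Tower,Unknown
4,Church,Unknown
"""
GLAM = """\
id,title,creator
1,Bridge,Unknown
2,Lake at dawn,Unknown
3,Tower,Unknown
4,Church,A. Meier
"""
COMMONS = """\
id,title,creator
1,Bridge,Unknown
2,Lake,Unknown
3,Tower,J. Smith
4,Church,B. Keller
"""

report = classify(load_table(BASELINE), load_table(GLAM), load_table(COMMONS))
print(report["upload_to_commons"])
print(changes_to_csv(report["export_to_glam"]))
```

The separate `conflict` bucket is an addition to the three rules listed in the proposal: when both sides have edited the same field since the last sync, neither value can safely overwrite the other, so the sketch leaves that decision to a person.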

Discussion

  • As a Wikipedian working with different GLAMs, I strongly support this proposal. We should try to find ways to make use of the improved metadata wherever it is. (To be transparent: as a volunteer I also work with the ETH-library.) --Hadi (talk) 16:51, 21 January 2022 (UTC)
  • Point to Structured Data on Commons: a good addition and a great idea! What I would really appreciate is if such a tool also pointed to (more) metadata alongside Structured Data on Commons, because structured data contains so much additional knowledge that could be helpful for updating or reconciling with local data. Generally, my opinion is that structured data is the bright and sustainable future of metadata on Commons. :-) --Mfchris84 (talk) 11:57, 21 January 2022 (UTC)
  • Sounds similar to Wikimedia Commons Data Roundtripping and Structured data for GLAM-Wiki/Roundtripping. Jean-Fred (talk) 12:03, 31 January 2022 (UTC)
  • What you're describing is multiple copies of the same data being kept in different places and getting out of sync, i.e. data redundancy (the bad kind). A real solution to this problem is to not have the redundancy in the first place. And if that's not possible for some reason, the process should be much more streamlined than you describe; the server should be able to pull metadata from an alternate source and display it on the page with an easy UI to select the changes to apply. As for exporting, XLS is completely unnecessary since it's a proprietary, convoluted format; CSV offers a minimum of structure; however, I would hope systems out there would be capable of reading a semantic structured data export (since Commons already supports structured data). Silver hr (talk) 16:48, 2 February 2022 (UTC)

Voting