Community Wishlist Survey 2023/Multimedia and Commons/Allow access to SDC from other wikis


Allow access to SDC from other wikis

  • Problem: It is not possible to access Structured Data on Commons (SDC) from other wikis via Lua modules and parser functions.
  • Proposed solution: Enable cross-wiki arbitrary access to MediaInfo entity data from wikis other than Commons, in the same way that Wikidata data can be accessed.
  • Who would benefit: Editors and readers of all Wikimedia projects, since use of Commons media is universal
  • More comments: Enabling this access is necessary to fully realize the value of Structured Data on Commons (SDC). The growing body of SDC data needs to be accessible and usable outside of Commons, just as the images themselves are, to be completely useful. This would allow templates to be coded using data derived from SDC statements in the same way infoboxes and many other templates rely on Wikidata. There are currently millions of SDC file captions and hundreds of millions of SDC statements. Arbitrary access to SDC has long been assumed, such as in "What are the benefits of captions?" on the Commons help page, but never implemented.

    Example uses:

    1. Populate an image's caption with the SDC caption that is stored in the MediaInfo entity. File captions are encouraged prominently in the Wikimedia Commons UI, including in the Upload Wizard, but are not currently used for much across the wikis; today they serve this purpose only within Wikimedia Commons. This could allow for displaying captions in the user's own language in multilingual projects, as long as the captions are added centrally on Wikimedia Commons for each language.
    2. Populate alt text for an image using the alt text property's value, if one is present, or even by listing all of the things depicted in the image with P180 (depicts).
    3. Use descriptive metadata fields like title, creator, date, and collection to build formatted citations for images that are historical artifacts from online catalogs, as suggested in the "Image credits in Wikipedia: Can we do better?" Wikimania talk in 2022.
  • Phabricator tickets: task T238798
  • Proposer: Dominic (talk) 21:37, 29 January 2023 (UTC)
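
Example uses 1 and 2 above can be sketched roughly as follows. This is only an illustration in Python, not the Lua interface the proposal asks for, which does not exist yet. It assumes a MediaInfo entity arrives as JSON with captions under "labels" and statements under "statements"; the sample entity, the item IDs Q111 and Q222, and all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hand-written sample shaped like a MediaInfo entity (assumed layout).
# The entity ID, item IDs and caption texts are made up for illustration.
SAMPLE_ENTITY = {
    "type": "mediainfo",
    "id": "M1",
    "labels": {
        "en": {"language": "en", "value": "A lighthouse at dusk"},
        "fi": {"language": "fi", "value": "Majakka illalla"},
    },
    "statements": {
        "P180": [  # P180 = depicts
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"id": "Q111"}}}},
            # "somevalue" snaks carry no datavalue and must be skipped.
            {"mainsnak": {"snaktype": "somevalue"}},
        ]
    },
}


def pick_caption(entity: dict, lang: str, fallbacks=("en",)) -> Optional[str]:
    """Return the caption in `lang`, else the first fallback language that has one."""
    labels = entity.get("labels", {})
    for code in (lang, *fallbacks):
        label = labels.get(code)
        if label and label.get("value"):
            return label["value"]
    return None


def depicted_item_ids(entity: dict) -> List[str]:
    """Collect the item IDs of all depicts (P180) statements that have a real value."""
    ids = []
    for statement in entity.get("statements", {}).get("P180", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def alt_text_from_depicts(entity: dict, item_labels: Dict[str, str]) -> Optional[str]:
    """Build a crude alt text by listing the labels of the depicted items."""
    names = [item_labels[i] for i in depicted_item_ids(entity) if i in item_labels]
    return ", ".join(names) if names else None
```

A template on a multilingual wiki could call something like pick_caption with the reader's language and fall back to English when no caption exists in that language. The item labels passed to alt_text_from_depicts would have to come from Wikidata, which is a second lookup this sketch leaves out.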

Discussion

  • A somewhat different proposal: T325949 --Tgr (talk) 02:18, 1 February 2023 (UTC)
    In somewhat more detail: I think there should be some sort of intermediary format between SDC and the tools using the data because:
    1. Often we want to be able to reuse those tools for local (non-Commons) images, but those aren't going to have SDC support for the foreseeable future. OTOH if we had an exchange format which, instead of the full richness of SDC Wikibase claims, only supports the specific metadata types for which there's a use case for displaying them outside the file description page, coming up with a different way of providing them on non-SDC file pages (e.g. parser functions) would be easy.
    2. Similarly, in many cases we want the tools to be usable outside Wikimedia wikis, on third-party MediaWiki installations (which typically don't use Wikibase, which is a pretty unwieldy extension).
    3. It could also be used for cross-wiki access of non-SDC information (like EXIF data).
    4. If the specific PIDs and whatnot get encoded into a zillion tools and Lua modules, any kind of changes to SDC data (e.g. switching from monolingual text to multilingual text once that data type gets implemented in Wikibase) gets extremely hard to coordinate. In contrast, an exchange format would make it easy to provide backwards compatibility for however long it is deemed useful.
    5. From a developer usability point of view, some kind of simple key-value metadata format is much easier to understand and use than SDC with its many levels of complexity (properties, Wikidata items as values that need to be processed further, qualifiers, special values like partial dates, etc.). When special handling is required, having to implement it in every tool separately would be a lot of wasted effort and result in lots of sub-par functionality.
    6. Also, from a maintenance point of view, we don't want lots of different media-related tools and extensions to directly depend on Wikibase, which is hard to set up and has many fragile tests.
    7. Cache invalidation tends to be the hard part of cross-wiki data access, and that might be easier and more efficient for a data exchange format where you know what the use case for the various fields is than for raw SDC data.
    So IMO the way to go here is:
    • Define an exchange format, on a general level something simple like key => JSON value, with each key in use and the corresponding data semantics documented somewhere. (This wouldn't be completely dissimilar to what we have now with the GetExtendedMetadata hook, just more powerful.)
    • Define a hook or service system whereby SDC (WikibaseMediaInfo) can fill in and invalidate these values as they get queried and changed.
    • Add MediaWiki core functionality for caching structured media metadata in this data exchange format, accessing it cross-wiki via FileRepo, and exposing it via web API, Lua API and PHP API.
    Tgr (talk) 02:41, 5 February 2023 (UTC)
  • I see only gains, no pains! Beireke1 (talk) 08:50, 1 February 2023 (UTC)
  • In the project Wiki Loves Living Heritage, I would like to display a set of images depicting each heritage element, as in this manually created example. Dominic and their team have created ViewIt!, which produces JSON listing images that match a given topic, using many different methods. However, there is no way I can use this result on those pages. This proposed solution would make that possible. – Susanna Ånäs (Susannaanas) 🦜 14:28, 11 February 2023 (UTC)
  • We are not realizing the benefits of all the hard work on SDC - we are doing so much to input and ingest, yet we are finding it extremely hard to read and derive the benefits of it because of the authentication issues. Let's open things up and let great things happen. - Fuzheado (talk) 11:10, 14 February 2023 (UTC)
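
For readers trying to picture the exchange format Tgr outlines above (key => JSON value, filled in by providers and cached until invalidated), here is a minimal sketch in Python. It is not MediaWiki code: the class MediaMetadataStore, its methods, and the "caption" key are all hypothetical, and a real implementation would live in PHP behind hooks or services and FileRepo.

```python
import json
from typing import Any, Callable, Dict, List

# A provider maps a file name to a dict of metadata keys and JSON-serializable
# values. An SDC-backed provider and a parser-function-backed provider for
# local files could both implement this same (hypothetical) shape.
Provider = Callable[[str], Dict[str, Any]]


class MediaMetadataStore:
    """Hypothetical cache of per-file metadata in a flat key => JSON value format."""

    def __init__(self) -> None:
        self._providers: List[Provider] = []
        self._cache: Dict[str, str] = {}  # file name -> serialized JSON

    def register_provider(self, provider: Provider) -> None:
        """Add a metadata source; later providers override earlier ones per key."""
        self._providers.append(provider)

    def get(self, file_name: str) -> Dict[str, Any]:
        """Return cached metadata, querying the providers only on a cache miss."""
        if file_name not in self._cache:
            merged: Dict[str, Any] = {}
            for provider in self._providers:
                merged.update(provider(file_name))
            self._cache[file_name] = json.dumps(merged, sort_keys=True)
        return json.loads(self._cache[file_name])

    def invalidate(self, file_name: str) -> None:
        """Drop the cached entry, e.g. when the underlying data is edited."""
        self._cache.pop(file_name, None)
```

Consumers only see documented keys such as "caption", so a change in how SDC stores the underlying data would be absorbed by the provider rather than by every tool. The provider that changes the data is also the one responsible for calling invalidate, which is where the cache invalidation concern in point 7 would have to be handled.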

Voting