Structured data for GLAM-Wiki/Roundtripping

From Meta, a Wikimedia project coordination wiki
Data roundtripping is a term used to describe the circulation of data from memory institutions to Wikimedia Commons and back to the institutions. Along the way, cultural heritage data is enriched in Wikimedia projects, especially Wikimedia Commons and Wikidata. This page discusses different parts of the process.

The sources of information on this page

This page is based on information gathered in the research project Wikimedia Commons Data Roundtripping, conducted by the Swedish National Heritage Board in 2019. It is a starting point for collecting examples and case studies from the Wikimedia community. Come and add your experiences!

Types of institutional contributions

Wikimedia Commons Data Roundtripping, final report by the Swedish National Heritage Board

Institutions are increasingly recognizing the benefits of using Wikimedia projects in their workflows. Here are ways in which many institutions interact with Wikimedia projects. Feel free to extend the list and add examples.

  • Adding new items and objects from the institution
  • Correcting and extending existing data/metadata about creators and works
  • Aligning the institutional vocabularies to Wikidata
  • Adding the institutional authority IDs to Wikidata
  • Creating new items on general topics
  • Tagging institutional content with Wikidata concepts
  • Importing linked institutional content to the institution's system

Add examples!

  • ...

Types of user contributions valuable to the institutions

Institutions find some user contributions in Wikimedia projects especially valuable. Please add your observations and examples as well.

  • Added references to authority IDs
  • Names and aliases
  • Translations of labels and descriptions
  • Articles and their translations
  • Imagery added to data items
  • Interacting directly with the interest groups in Wikimedia-related events
  • Collaborating with interest groups through crowdsourcing campaigns that align with the institution's activities

Add examples!

  • Wiki Loves Monuments and similar subsequent photography drives have a high impact for several reasons. Institutions can relate to the campaign model and benefit from the quality content produced, and these competitions are often effective in mobilizing participation. They are also easy for the general public to understand and well suited to promotion in the media, which can bring a lot of visibility.
  • ...

Barriers for reading user contributions to the institution

There are many hurdles to overcome before an institution can start taking advantage of data roundtripping.

They can be technical or administrative in nature. There may be contractual or technical limitations in collections management systems, a lack of skills, labour or resources, or institutional arrangements that block the idea from being realized.

There may be a lack of trust in the quality of the contributions. Conventional ways of assuring the quality of information do not apply to crowdsourced information. It may be difficult to keep the incoming data separate from the authority data produced by the institution, and information about the origin of the user contributions may be missing.

The quality of the contributions may be lacking. Translations may not have been corrected after machine translation, users may have produced rich text that cannot be handled by the collections management system, or the institutional practices (for example naming conventions) differ from the user contributions.

→ Please contribute your thoughts on this topic.

What functionalities are appreciated or should be more developed?

In Wikimedia projects

  • Enriching the media contributions by the institutions with authority IDs attached to the linked items, such as artists and artworks.
  • Translating metadata of the institutions' contributions into multiple languages
  • Time-coded annotations, subtitles and translations of subtitles for audiovisual material.
  • Downloading metadata from Wikimedia Commons
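
As an illustration of the last point, the structured data attached to a Commons file can be requested through the MediaWiki API. The following is a minimal sketch that only builds the request URL and does not send it. It assumes the `wbgetentities` action and assumes that a file's entity ID is `M` followed by its page ID; the function name and the example page ID are hypothetical.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_sdc_request_url(page_id: int) -> str:
    """Return a URL asking Commons for the structured data of one file.

    Assumption: the MediaInfo entity ID of a file is "M" + its page ID.
    """
    params = {
        "action": "wbgetentities",
        "ids": f"M{page_id}",
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


# 12345 is a placeholder page ID, not a real file.
url = build_sdc_request_url(12345)
print(url)
```

An institution would fetch this URL with its HTTP client of choice and map the returned statements onto its own metadata fields.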

Between Wikimedia projects and the local collections management system

  • Tracking changes on Wikidata and/or Wikimedia Commons including notifications.
  • The ability to use Wikidata for tagging (= as authority file).
  • Being able to contribute various aspects of the institutional collections metadata to Wikimedia projects, e.g. structured data, alternative names, dates.
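
One simple way to approach change tracking is to compare two snapshots of the same item taken at different times. The sketch below does this for labels only. It is not a description of any existing notification tool; the function name and the sample snapshots are hypothetical, and how the snapshots are obtained is left out.

```python
def diff_labels(old: dict, new: dict) -> dict:
    """Compare two {language: label} mappings from successive snapshots."""
    added = {lang: new[lang] for lang in new if lang not in old}
    removed = {lang: old[lang] for lang in old if lang not in new}
    changed = {
        lang: (old[lang], new[lang])
        for lang in old
        if lang in new and old[lang] != new[lang]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of one item's labels at two points in time.
before = {"en": "Vasa", "sv": "Vasa"}
after = {"en": "Vasa (ship)", "sv": "Vasa", "fi": "Vasa"}

report = diff_labels(before, after)
print(report)
```

A report like this could feed a notification to collections staff, who then decide whether to accept each change.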

In local collections management systems and workflows

  • Reading user contributions into the collections management system
  • Pushing changes in the local CMS to Wikimedia projects

Emerging workflows

Campaigns

Campaigns can be seen as structured workflows with a limited set of activities, a focused subject area and a limited time span, maybe even a geographically focused theme. It would also be possible to use a limited subset of Wikidata items for tagging, which would be an equivalent of using a controlled vocabulary familiar to an institution. This way of working is different from the open-ended collaborative work that Wikimedia projects traditionally rely on. These campaigns could happen in many forms and platforms.
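
The idea of tagging with a limited subset of Wikidata items can be sketched as a simple allow-list check. This is only an illustration: the vocabulary, the function name and the proposed tags are hypothetical, and a real campaign tool would likely enforce the restriction in its user interface instead.

```python
# Hypothetical campaign vocabulary: the only Wikidata items allowed as tags.
CAMPAIGN_VOCABULARY = {
    "Q144": "dog",
    "Q146": "house cat",
}


def split_tags(proposed: list, vocabulary: dict) -> tuple:
    """Separate proposed item IDs into accepted and rejected lists."""
    accepted = [qid for qid in proposed if qid in vocabulary]
    rejected = [qid for qid in proposed if qid not in vocabulary]
    return accepted, rejected


# Q42 is not part of the campaign vocabulary, so it is rejected.
accepted, rejected = split_tags(["Q144", "Q42"], CAMPAIGN_VOCABULARY)
print("accepted:", accepted)
print("rejected:", rejected)
```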

Add examples!

  • ...

Tracking data provenance

An institution may not appreciate user contributions to its content unless it can evaluate their validity. A fundamental requirement is being able to display the origin of every single contribution. This is already a basic principle in Wikipedia and Wikidata, but Structured Data on Commons does not yet take full advantage of source statements.

Provenance is a word traditionally used to state the chronology of the ownership, custody or location of a historical object. The concept of data provenance can also be used to mean recording the origins of any piece of information and its previous states. Wikimedia projects can record statements originating from a number of sources, and the source statements make it possible to evaluate their validity. This could be useful for decolonizing unjust archival descriptions while still maintaining the historical record of the previous states. It could also be used to track algorithmic changes to data, such as automated keywords.
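
To make the role of source statements more concrete, the sketch below reads the reference URLs attached to a single statement. It assumes the statement is available as a dictionary in the Wikidata JSON layout, with references stored under `references` and `snaks`, and it assumes property P854 ("reference URL") is used for the source. The function name and the sample statement are made up for illustration.

```python
def reference_urls(statement: dict) -> list:
    """Collect the reference URLs (assumed property P854) of one statement."""
    urls = []
    for reference in statement.get("references", []):
        for snak in reference.get("snaks", {}).get("P854", []):
            # Only snaks of type "value" carry a datavalue.
            if snak.get("snaktype") == "value":
                urls.append(snak["datavalue"]["value"])
    return urls


# Made-up statement in the assumed Wikidata JSON layout.
sample_statement = {
    "mainsnak": {"snaktype": "value", "property": "P571"},
    "references": [
        {
            "snaks": {
                "P854": [
                    {
                        "snaktype": "value",
                        "property": "P854",
                        "datavalue": {
                            "value": "https://example.org/record/1",
                            "type": "string",
                        },
                    }
                ]
            }
        }
    ],
}

print(reference_urls(sample_statement))
print(reference_urls({"mainsnak": {}}))  # a statement without references
```

A statement that returns an empty list here has no URL-based source, which an institution might treat as a signal to review it before import.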

Add examples!

  • ...

Ecosystem of connected tools and platforms

Not all contributions will take place on Wikimedia platforms. Many tools that the Wikimedia ecosystem relies on are located outside the projects. The list is long and includes Magnus' tools and games, OpenRefine, Pattypan and Monumental. Entire projects can be built around displaying and interacting with Wikimedia projects, such as Reasonator, Science Stories, Crotos, Inventaire.io and Wikidocumentaries.

Add other emerging workflows!

  • ...

Tutorials

Wikidata Lab XXIX, a live technical training on the process of data roundtripping.

Case studies