Structured data for GLAM-Wiki/Roundtripping

Data roundtripping is a term used to describe the circulation of data from memory institutions to Wikimedia Commons and back to the institutions. Along the way, cultural heritage data is enriched in Wikimedia projects, especially Wikimedia Commons and Wikidata. This page discusses the different parts of the process.

The sources of information on this page

This page is based on information gathered in the research project Wikimedia Commons Data Roundtripping, conducted by the Swedish National Heritage Board in 2019. It is a starting point for collecting examples and case studies from the Wikimedia community. Come and add your experiences!

Types of institutional contributions

Institutions are increasingly recognizing the benefits of using Wikimedia projects in their workflows. Below are some of the ways in which institutions interact with Wikimedia projects. Feel free to extend the list and add examples.

Adding new items and objects from the institution
Correcting and extending existing data and metadata about creators and works
Aligning the institutional vocabularies to Wikidata
Adding the institutional authority IDs to Wikidata
Creating new items on general topics
Tagging institutional content with Wikidata concepts
Importing linked institutional content to the institution's system

Add examples!

...

Types of user contributions valuable to the institutions

Institutions find some user contributions in Wikimedia projects especially valuable. Please add your observations and examples as well.

Added references to authority IDs
Names and aliases
Translations of labels and descriptions
Articles and their translations
Imagery added to data items
Interacting directly with interest groups at Wikimedia-related events
Collaborating with interest groups through crowdsourcing campaigns that align with the institution's activities

Add examples!

Wiki Loves Monuments and subsequent similar photography drives have a high impact for several reasons. Institutions can relate to the campaign model, they benefit from the quality content produced, and the competitions are often effective in mobilizing participation. They are also easy for the general public to understand and well suited to promotion in the media, which can bring a lot of visibility.
...

Barriers to reading user contributions back into the institution

There are many hurdles to overcome before an institution can start taking advantage of data roundtripping.

They can be technical or administrative in nature. There may be contractual or technical limitations in collections management systems, a lack of skills, labour or resources, or institutional arrangements that block the idea from being realized.

There may be a lack of trust in the quality of the contributions. Conventional ways of assuring the quality of information do not apply to crowdsourced information. It may be difficult to keep the incoming data separate from the authority data produced by the institution, and information about the origin of the user contributions may be lacking.

The quality of the contributions may be lacking. Translations may not have been corrected after machine translation, users may have produced rich text that cannot be handled by the collections management system, or the contributions may not follow institutional practices (for example naming conventions).

→ Please contribute your thoughts on this topic.

What functionalities are appreciated or should be more developed?

In Wikimedia projects

Enriching the media contributions by the institutions with authority IDs attached to the linked items, such as artists and artworks.
Translating the metadata of the institution's contributions into multiple languages
Time-coded annotations, subtitles and translations of subtitles for audiovisual material.
Downloading metadata from Wikimedia Commons

Between Wikimedia projects and the local collections management system

Tracking changes on Wikidata and/or Wikimedia Commons including notifications.
The ability to use Wikidata for tagging (that is, as an authority file).
Being able to contribute various aspects of the institutional collections metadata to Wikimedia projects, e.g. structured data, alternative names, dates.
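
As a rough illustration of what tracking changes could involve on the institution's side, the sketch below compares two stored snapshots of the same Wikidata item and lists the labels that were added, changed or removed. It assumes the snapshots use the entity JSON layout returned by Wikidata, with labels keyed by language code. The function name label_changes and the sample labels are hypothetical, and fetching the snapshots and sending notifications are left out.

```python
from typing import List, Tuple


def label_changes(old_entity: dict, new_entity: dict) -> List[Tuple[str, str, str]]:
    """Return (language, old_label, new_label) for every label that differs.

    Both arguments are assumed to follow the Wikidata entity JSON layout:
    {"labels": {"en": {"language": "en", "value": "..."}}}
    An empty string stands for a label that is missing in one snapshot.
    """
    old_labels = {lang: v["value"] for lang, v in old_entity.get("labels", {}).items()}
    new_labels = {lang: v["value"] for lang, v in new_entity.get("labels", {}).items()}

    changes = []
    for lang in sorted(set(old_labels) | set(new_labels)):
        before = old_labels.get(lang, "")
        after = new_labels.get(lang, "")
        if before != after:
            changes.append((lang, before, after))
    return changes


# Hypothetical snapshots of one item, taken at two points in time.
earlier = {"labels": {"en": {"language": "en", "value": "Rune stone"}}}
later = {
    "labels": {
        "en": {"language": "en", "value": "Runestone"},
        "sv": {"language": "sv", "value": "Runsten"},
    }
}

for lang, before, after in label_changes(earlier, later):
    print(f"{lang}: {before!r} -> {after!r}")
```

The same comparison could be extended to aliases, descriptions or statements. Labels are used here only because their structure is the simplest.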

In local collections management systems and workflows

Read user contributions into the collections management system
Push changes from the local CMS to Wikimedia projects

Emerging workflows

Campaigns

Campaigns can be seen as structured workflows with a limited set of activities, a focused subject area and a limited time span, maybe even a geographically focused theme. It would also be possible to use a limited subset of Wikidata items for tagging, which would be the equivalent of using a controlled vocabulary familiar to an institution. This way of working is different from the open-ended collaborative work that Wikimedia projects traditionally rely on. These campaigns could take many forms and happen on many platforms.
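
The idea of restricting tagging to a subset of Wikidata items can be sketched as a simple allowlist check. This is only an illustration: the function name split_tags is hypothetical, and the item identifiers are placeholders that do not refer to the actual Wikidata items with those numbers.

```python
import re
from typing import Iterable, List, Set, Tuple

# A Wikidata item identifier is the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


def split_tags(tags: Iterable[str], allowed: Set[str]) -> Tuple[List[str], List[str]]:
    """Split proposed tags into (accepted, rejected).

    A tag is accepted only if it looks like an item identifier and
    belongs to the campaign's controlled vocabulary.
    """
    accepted, rejected = [], []
    for tag in tags:
        if QID_PATTERN.match(tag) and tag in allowed:
            accepted.append(tag)
        else:
            rejected.append(tag)
    return accepted, rejected


# Placeholder identifiers standing in for the items chosen for a campaign.
campaign_vocabulary = {"Q111", "Q222", "Q333"}

accepted, rejected = split_tags(["Q111", "Q999", "not-a-qid"], campaign_vocabulary)
print("accepted:", accepted)
print("rejected:", rejected)
```

Rejected tags do not have to be discarded. A campaign could queue them for review as candidates for extending the vocabulary.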

Add examples!

...

Tracking data provenance

User contributions to institutional content may not be appreciated by the institution unless it can evaluate their validity. A fundamental requirement is to be able to display the origin of any single contribution. This is already a basic principle in Wikipedia and Wikidata, but little advantage is yet taken of source statements in Structured Data on Commons.

Provenance is a word traditionally used for the chronology of the ownership, custody or location of a historical object. The concept of data provenance can also be used to mean recording the origins of any piece of information and its previous states. Wikimedia projects can record statements originating from a number of sources, and the source statements make it possible to evaluate their validity. This could be useful for decolonizing unjust archival descriptions while still maintaining the historical record of the previous states. It could also be used to track algorithmic changes to data, such as automated keywords.
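
One way to picture data provenance on the institution's side is a record that never overwrites a value but appends each new assertion together with its origin. The sketch below is a minimal illustration under that assumption. The class names, field names and source categories are all hypothetical and do not describe any existing collections management system or Wikimedia data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    """One stated value for a metadata field, with its origin."""
    value: str
    source: str          # hypothetical categories: "institution", "wikimedia-user", "algorithm"
    recorded: str        # ISO 8601 date on which the value was recorded
    reference: str = ""  # optional pointer to where the value came from


@dataclass
class FieldHistory:
    """All assertions ever made about one metadata field, oldest first."""
    name: str
    assertions: List[Assertion] = field(default_factory=list)

    def record(self, assertion: Assertion) -> None:
        # Append instead of overwrite, so previous states stay available.
        self.assertions.append(assertion)

    def current(self) -> Optional[Assertion]:
        """Return the most recently recorded assertion, if any."""
        return self.assertions[-1] if self.assertions else None

    def from_source(self, source: str) -> List[Assertion]:
        """Return only the assertions that came from the given source."""
        return [a for a in self.assertions if a.source == source]


# Hypothetical example: a description revised over time.
description = FieldHistory("description")
description.record(Assertion("Original catalogue description", "institution", "1950-01-01"))
description.record(Assertion("Revised description", "wikimedia-user", "2021-05-27"))

print(description.current().value)
print(len(description.from_source("institution")))
```

Keeping the earlier assertion alongside the revised one is what allows a description to be corrected while the historical record of its previous state is maintained.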

Add examples!

...

Ecosystem of connected tools and platforms

Not all contributions will take place on Wikimedia platforms. Many of the tools the Wikimedia ecosystem relies on are located outside the projects. The list is long and includes Magnus' tools and games, OpenRefine, Pattypan and Monumental. Entire projects can be built around displaying and interacting with content from Wikimedia projects, such as Reasonator, Science Stories, Crotos, Inventaire.io or Wikidocumentaries.

Add other emerging workflows!

...

Tutorials

Wikidata Lab XXIX, a live technical training on the process of data roundtripping.

Case studies

Wikimedia Commons Data Roundtripping final report - Swedish National Heritage Board
Structured data for GLAM-Wiki - Swedish Runestones pictures
Metropolitan Museum of Art - Data Roundtripping presentation - Wikidata Lab XXIX, May 27, 2021
Presentation of findings in the Wikimedia Commons Data Roundtripping project at Wikidata Lab XXIX
Data Roundtripping: a new frontier for GLAM-Wiki collaborations (blog post by Sandra Fauconnier, in English)