Research:Supporting Commons contribution by GLAM institutions

From Meta, a Wikimedia project coordination wiki
22:59, 5 July 2017 (UTC)
Duration:  2017-July — 2017-October

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.

Wikimedia Commons is the world’s largest free-licensed media repository, with over 40 million image, audio, and video files. Commons media are contributed by anyone, and curated by volunteers. The MediaWiki software platform that Commons is built on was designed to host text, not rich media. The lack of consistent structures for capturing the metadata associated with media files on Commons impedes uploading, search, curation, re-use, and tracking. This creates challenges for everyone who uses Wikimedia Commons: media contributors, curators, and those who use media hosted on Commons—on Wikimedia projects and beyond.

The Wikimedia Foundation has launched a multi-year program to build a structured data layer into Commons, based on the Wikibase platform, that will allow file metadata to be captured in a standard, machine-readable format and make it easier to associate Commons media with Wikidata items.

This page describes one of the research components of that program, which seeks to understand the impact of structured data integration on one set of important Commons stakeholders: GLAM institutions.


GLAM (“Galleries, Libraries, Archives, and Museums”) is an established Wikimedia program that seeks to elicit contributions of free-licensed content from public and private institutions around the world that curate cultural knowledge resources. One particularly valuable contribution that GLAM institutions make to the Wikimedia movement is uploading media from their collections to Wikimedia Commons. These media include historical photography, video, and audio recordings, as well as images of artifacts, artwork, and other cultural goods.

GLAM institutions have communicated that the current tools available for uploading media (often in batches of dozens or hundreds of files) are not well suited for their needs. For example, these media often have associated metadata—e.g. date of production, authorship, institutional source, and license information—that may be lost or mangled during the upload process. Monitoring is also a challenge: GLAM institutions often have a desire or an institutional mandate to track the usage of the files they donate. The lack of structure within the Commons repository and the lack of tools for tracking the usage of GLAM content within Commons make this difficult.

Why focus on GLAM uploads?

Many people who are not affiliated with GLAM institutions or working on GLAM content donation projects also upload content to Wikimedia Commons. GLAM projects, however, are very diverse in terms of their goals, the amount of media content they upload, the types of media content and metadata they have, their level of familiarity with Wikimedia projects generally and Commons in particular, and the kinds of support resources they have available (e.g. staff time, volunteer time, technical resources). Because of this diversity, understanding the unmet needs and current workflows of GLAM media contributors gives us a better sense of the challenges that others who upload media to Commons may face.

Next steps

In subsequent phases of this research project, we will focus our investigation on another key stakeholder group for the Commons structured data program: volunteers who curate media on Wikimedia Commons. That work will follow up on existing research on heavy Commons users conducted by Wikimedia Deutschland.


This research project uses semi-structured interviews and surveys of members of GLAM institutions and Wikimedia community members involved in the GLAM program to understand goals, motivations, and current workflows related to the contribution and curation of GLAM content. Interview and survey data will be used to develop secondary research artifacts (personas, scenarios, and user requirements) that will inform the decision-making of the Structured Data Commons program, as well as the design of tools and features to support the contribution and curation of structured data on Wikimedia Commons.

The interview protocol and the survey questions are available.

Interview participants

Participant | GLAM institution | Institution type | Participant role | Region | Interview video
p1 | University of São Paulo | Museum | Wikimedian | Latin America | via YouTube
p2 | Auckland Museum | Museum | Wikimedian in Residence | Oceania | via YouTube
p3 | National Archive of Norway | Public archive | Wikimedian/staff technologist | Europe | not available
p4 | multiple | multiple | Wikipedian in Residence | Latin America | via YouTube
p5 | National Library of the Netherlands | Library | staff technologist | Europe | via YouTube
p6 | Fundació Joan Miró | Art museum | Wikimedian in Residence | Europe | via YouTube
p7 | Netherlands Institute of Sound and Vision | Private archive | staff technologist | Europe | not available
p8 | Government Medical College, Kozhikode | Hospital | Wikimedian | South Asia | via YouTube
p9 | Netherlands Institute of Sound and Vision | Private archive | Wikimedian in Residence & staff researcher | Europe | via YouTube
p10 | Ghana National Archive | Public archive | Wikipedian in Residence | Africa | via YouTube
p11 | Bulgarian Archives State Agency | Public archive | Wikimedian | Europe | via YouTube


  • July-August 2017: develop interview protocol; conduct first set of interviews; present initial findings at Structured Data offsite (Montreal, August 15-16)
  • August-September 2017: conduct second set of interviews, develop survey
  • October 2017: release survey
  • November 2017: analyze data, develop personas, report results

Policy, Ethics and Human Subjects Research

Interviews were conducted, and data were stored and shared, in accordance with Wikimedia's guidelines for research consent, data access, and data retention. Survey data were collected and stored in accordance with the terms of a survey privacy statement developed with the Wikimedia Foundation Legal Team.


Analysis of the interview data led to the identification of several themes. Each theme represents a set of challenges and opportunities related to the way GLAM projects currently interact with Wikimedia Commons.

Preserving important metadata about media items

Functionality and usability of batch upload tools

Monitoring activity and tracking impact after upload

Preparing media items for upload

Working with Wikimedia and Wikimedians
