Research:Supporting Commons contribution by GLAM institutions

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator: Task T159495
This page documents a completed research project.

Wikimedia Commons is the world’s largest free-licensed media repository, with over 40 million image, audio, and video files. Commons media are contributed by anyone and curated by volunteers. But the MediaWiki software platform that Commons is built on was designed to host text, not rich media. This creates challenges for everyone who uses Wikimedia Commons: media contributors, curators, and those who use media hosted on Commons—on Wikimedia projects and beyond.

The Wikimedia Foundation has launched a multi-year program to build a structured data layer into Commons, based on the Wikibase platform, that will allow file metadata to be captured in a standard, machine-readable format and make it easier to associate Commons media with Wikidata items.

This page describes one of the research components of that program, which seeks to understand the need for and potential impact of structured data integration on one set of important Commons stakeholders: GLAM institutions (Galleries, Libraries, Archives, Museums).

Key findings

Slides from a presentation of findings in January 2018 (video)
GLAM contributor personas based on the interviews and survey.


GLAM projects are interested in impact
Donating media to Commons is a means to an end. GLAM organizations and the volunteers who work with them want to know the media they upload is being used, and to be able to evaluate the impact of their donations against institutional goals.
Metadata types are (at least) as diverse as media types
A few basic types of metadata are available for most collections (e.g. GLAM institution, license deed) and media items (e.g. creator, date of creation). Beyond that, there is a huge diversity of different kinds of metadata available for different kinds of media, and much of this metadata is considered vitally important. One size does not fit all.
Different degrees of technical and Wikimedia literacy
The project members involved in uploading and curating GLAM media on Commons vary widely in their level of familiarity with the Wikimedia Movement (its rules, norms, and community resources), and in their level of proficiency with technical tools and techniques for metadata preparation, batch upload, post-upload curation, and content monitoring.


Tracking the impact of a donation
Although a variety of tools exist for keeping track of what happens to media after upload, these tools do not always capture the right metrics, at sufficient granularity, to demonstrate the impact of a donation or monitor the current status of an uploaded collection. These tools also suffer from a lack of visibility and, to some extent, a lack of integration into the Commons platform.
Capturing metadata in (semi)structured ways
Most GLAM projects use categories and templates to some extent to capture item- and collection-level metadata. However, their efforts are hampered by the complexity of the category system and its lack of standardization. In contrast, existing metadata templates like Book and Institution are sometimes too standardized to capture the rich, diverse metadata associated with different media. Participants resort to clever workarounds to fit their important metadata into these existing structures as best they can.
Demonstrating and preserving media provenance
Many GLAM participants expressed confusion and frustration with the process of demonstrating to OTRS that the GLAM institution could legally donate the uploaded media. Once a donation was approved and uploaded, some GLAM participants expressed a desire for more flexibility in the mechanisms available for linking the collections on Commons back to the GLAM's website or institutional repository.


The following list of recommendations is not intended to be exhaustive. It is focused on addressing the kinds of issues over which the Wikimedia structured data program staff have some degree of direct control.

Renovate Upload Wizard
There are many well-reviewed upload tools in active use by GLAM organizations. However, Upload Wizard is considered relatively easy to use and remains one of the most popular, and certainly the most visible, upload tool. Its primary limitations are a lack of functionality for capturing rich item- and collection-level metadata, and relatively poor support for media types other than raster images. In renovating Upload Wizard, Wikimedia should consider both the features of current community-maintained upload tools that GLAM projects find especially useful and the areas where those tools fall short, in order to improve the default upload experience.
Develop a streamlined process for creating new metadata properties in MediaInfo pages
There are so many different types of useful metadata that it probably isn't possible to anticipate all of them ahead of time. GLAM projects (and anyone else who is a high-volume contributor of metadata-rich media) are the best sources of information about what kinds of metadata are most important, and how those metadata should be stored. Make it easy for them to create (or at least propose) new properties, and map metadata to existing properties, at the point of upload.
Create and maintain a batch upload help portal
Currently, help resources for the various steps in the batch upload workflow are scattered across namespaces and wikis, out of date, incomplete, or non-existent. Every uploader's workflow is different, but a single well-designed and well-maintained entry point that links to relevant information about process steps and common pitfalls will benefit everyone.

Background and methods

GLAM (“Galleries, Libraries, Archives, and Museums”) is an established Wikimedia program that seeks to elicit contributions of free-licensed content from public and private institutions around the world that curate cultural knowledge resources. One particularly valuable form of contribution that GLAM institutions make to the Wikimedia Movement is uploading media from their collections to Wikimedia Commons.[1] Media include historical photography, video, and audio recordings, as well as images of artifacts, artwork, architecture, and other cultural goods.

GLAM institutions have communicated that the current tools available for uploading media (often in batches of dozens or hundreds of files) are not well suited for their needs. These media often have associated metadata—e.g. dates of production, authorship, institutional source, license information, and information about the people, places, and things depicted—that may be left out or mangled during the upload process.

Monitoring is also a challenge. GLAM institutions often have a desire or an institutional mandate to track the usage of the files they donate. The lack of structure within the Commons repository and the lack of tools for tracking the usage of GLAM content within Commons make this difficult.

Why focus on GLAM upload projects?

By understanding the unmet needs and current workflows of GLAM media contributors, we can get a better sense of the challenges that others who upload media to Commons may face.

Obviously, many people who are not affiliated with GLAM institutions or working on GLAM content donation projects also upload valuable media content to Wikimedia Commons. However, GLAM projects are very diverse in terms of their goals, the amount of media content they upload, the types of media content and metadata they have, their level of familiarity with Wikimedia projects generally and Commons in particular, and the kinds of support resources they have available (e.g. staff time, volunteer time, technical resources). This suggests that many of the issues faced by GLAM projects will reflect common issues faced by other, non-GLAM-affiliated contributors.

Additionally, GLAM projects often involve uploading large batches of items (collections including thousands of items, uploaded dozens or hundreds at a time), which further complicates the uploading process. Batch uploading is poorly supported by default upload tools, and a variety of third-party tools have been developed to support these workflows. Understanding how these 'power uploaders' select which tool to use, and the issues they face when performing their work, can inform the development of new tools, improvements to existing tools and documentation, and other types of technical and non-technical support for high-volume Commons media contributors.

This research project uses semi-structured interviews and surveys of members of GLAM institutions and Wikimedia community members involved in the GLAM program to understand goals, motivations, and current workflows related to the contribution and curation of GLAM content. Interview and survey data will inform the development of secondary research artifacts (personas, scenarios, and user requirements) that will guide the decision-making of the Structured Data on Wikimedia Commons program, as well as the design of tools and features to support the contribution and curation of structured data on Wikimedia Commons.


For this research, we interviewed 11 people who had been involved in one or more GLAM content donation projects over the past few years. Participants were recruited through purposeful sampling, with the aim of selecting a set of interviewees who represented diversity in terms of their geographic location, the size of the GLAM(s) they worked with, the type of GLAM, and the type of media being donated.

Participants played a variety of roles in the projects they worked on, often multiple roles simultaneously. All interview participants were directly involved in one or more of the following activities: preparing media items for upload, uploading media, curating uploaded media items, and tracking impact.

The interview protocol is available.

Interview participant information

Participant | GLAM institution | Institution type | Participant role | Region | Interview video
p1 | University of São Paulo | Museum | Wikimedian | Latin America | via YouTube
p2 | Auckland Museum | Museum | Wikimedian in Residence | Oceania | via YouTube
p3 | National Archive of Norway | Public archive | Wikimedian/staff technologist | Europe | not available
p4 | multiple | multiple | Wikipedian in Residence | Latin America | via YouTube
p5 | National Library of the Netherlands | Library | staff technologist | Europe | via YouTube
p6 | Fundació Joan Miró | Art museum | Wikimedian in Residence | Europe | via YouTube
p7 | Netherlands Institute of Sound and Vision | Private archive | staff technologist | Europe | not available
p8 | Government Medical College, Kozhikode | Hospital | Wikimedian | South Asia | via YouTube
p9 | Netherlands Institute of Sound and Vision | Private archive | Wikimedian in Residence & staff researcher | Europe | via YouTube
p10 | Ghana National Archive | Public archive | Wikipedian in Residence | Africa | via YouTube
p11 | Bulgarian Archives State Agency | Public archive | Wikimedian | Europe | via YouTube


Based on recurring observations that emerged from the interviews (challenges, roles, goals, and types of media and metadata involved), we developed a survey, which was released through the GLAM Monthly Newsletter in September 2017.[2] The goal of the survey was to validate the observations drawn from the interviews with a more representative sample of GLAMs, and to collect high-level descriptive statistics about GLAM donation projects.

The survey questions are available.

Project timeline

  • July-August 2017: develop interview protocol; conduct first set of interviews; present initial findings at Structured Data offsite (Montreal, August 15-16)
  • August-September 2017: conduct second set of interviews, develop survey
  • October 2017: release survey
  • November-December 2017: analyze data, report results

Policy, Ethics and Human Subjects Research

Interviews were conducted, and interview data stored and shared, in accordance with Wikimedia's guidelines for research consent, data access, and data retention. Survey data was collected and stored in accordance with the terms of a survey privacy statement developed with the Wikimedia Foundation Legal Team.

Research themes

Analysis of the interview and survey data led to the identification of several themes. Each theme represents a set of challenges and opportunities related to the way GLAM projects currently interact with Wikimedia Commons.

Preserving important metadata about media items

"The categorization system on Wikimedia Commons requires insider knowledge."

Certain kinds of metadata are considered high priority by GLAM contributors, who will go to extra effort, including creating novel hacks and workarounds, to include this metadata. Understanding why certain types of metadata are considered high priority, and how GLAM contributors currently add this metadata, will help identify item properties that are high priority for structuring.


Functionality and usability of upload tools

"Bad and missing instructions meant it took 2 days of work for the GLAM to work out how to do it."

All current upload tools will need to be updated to associate structured metadata with uploaded media. This provides an opportunity to improve the usability and feature offerings of existing upload tools, and to define design requirements for new upload tools.


Monitoring activity and tracking impact after upload

"There are only a few tools to measure and monitor it but still on a very basic level."

Nearly all GLAM projects treat increasing the visibility and use of their media content as a primary goal. Uploading their content to Wikimedia Commons is a step towards that goal, not an end in itself. To understand the impact of a content donation, you must be able to monitor and measure what happens to the content after it is uploaded.
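As an illustration of the kind of measurement this involves, the following minimal sketch builds a MediaWiki Action API query for the GlobalUsage module (`prop=globalusage`), which reports the pages on Wikimedia wikis that use a given Commons file, and tallies those pages per wiki. It is not one of the tools discussed by participants. The function names are hypothetical, it assumes the JSON shape returned with `formatversion=2`, and it ignores result continuation, so a file used on more than `gulimit` pages would need follow-up requests.

```python
from collections import Counter
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def global_usage_url(file_title: str, limit: int = 500) -> str:
    """Build a query URL asking Commons where a file is used across Wikimedia wikis."""
    params = {
        "action": "query",
        "prop": "globalusage",
        "titles": file_title,
        "gulimit": limit,
        "format": "json",
        "formatversion": 2,  # assumption: pages are returned as a JSON list
    }
    return COMMONS_API + "?" + urlencode(params)


def usage_by_wiki(response: dict) -> Counter:
    """Count the pages using each file, grouped by wiki (no continuation handling)."""
    counts = Counter()
    for page in response.get("query", {}).get("pages", []):
        for use in page.get("globalusage", []):
            counts[use["wiki"]] += 1
    return counts


# Hand-written, illustrative response; not real usage data.
sample = {"query": {"pages": [{"title": "File:Example.jpg", "globalusage": [
    {"title": "Harbour", "wiki": "en.wikipedia.org"},
    {"title": "Hafen", "wiki": "de.wikipedia.org"},
    {"title": "Port", "wiki": "en.wikipedia.org"},
]}]}}

for wiki, n in usage_by_wiki(sample).most_common():
    print(wiki, n)
# en.wikipedia.org 2
# de.wikipedia.org 1
```

A per-wiki count is only one possible metric; as participants noted, the right granularity depends on the institution's goals.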


Preparing media items for upload

"The hardest part doesn't have anything to do with Pattypan itself. It has to do with... Cleaning up metadata and transforming them into the Commons format"

Metadata about files to upload is structured in all sorts of ways (when it is structured at all). Usually, some kind of transformation, munging, or filtering is necessary to get it into the format expected by a particular upload tool.
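A minimal sketch of such a transformation, assuming a GLAM export in which each item is a simple dictionary: it maps hypothetical source field names (`title`, `created`, `creator`, `source_url`) onto parameters of the standard Commons {{Information}} file description template. Real projects typically also need license templates, categories, and many more fields; the mapping below is illustrative only.

```python
# Hypothetical mapping from an institution's export fields to {{Information}} parameters.
FIELD_MAP = {
    "description": "title",
    "date": "created",
    "author": "creator",
    "source": "source_url",
}


def to_information_template(record: dict) -> str:
    """Render one source metadata record as {{Information}} wikitext."""
    lines = ["{{Information"]
    for param, src_field in FIELD_MAP.items():
        # Missing fields become empty parameters; stray whitespace is trimmed.
        value = str(record.get(src_field, "")).strip()
        # A bare pipe would split the template parameter, so escape it with {{!}}.
        value = value.replace("|", "{{!}}")
        lines.append(f"|{param}={value}")
    lines.append("}}")
    return "\n".join(lines)


record = {
    "title": "Harbour at dawn",
    "created": "1923",
    "creator": "Unknown photographer",
    "source_url": "https://example.org/item/42",
}
print(to_information_template(record))
# {{Information
# |description=Harbour at dawn
# |date=1923
# |author=Unknown photographer
# |source=https://example.org/item/42
# }}
```

Keeping the mapping in a single table makes it easy to adapt when a collection's export uses different field names.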


Working with Wikimedia and Wikimedians

"The images were removed by some volunteers at Wikimedia Commons."

The success of GLAM projects often depends on the support of Wikimedia Movement volunteers and, to a lesser extent, the Wikimedia Foundation. GLAM contributors, even Wikimedians in Residence, cannot be expected to begin their first project with a complete and correct understanding of every technology and policy consideration that bears on their project. They rely on movement contributors to share information about tools and rules (especially copyright policy) and, perhaps most importantly, to curate their contributions and integrate them into Wikipedia and other Wikimedia projects.


Survey results

Between October 4 and 30, 2017, 67 people started the survey and 44 respondents completed it. Quotes and observations gleaned from the free-response questions in the survey are incorporated into the thematic analysis above. Key results from the defined-response questions are summarized below.


  • 40% of respondents worked on GLAM projects led by libraries. Museums were the next most common type of GLAM, at 28%.
  • 40% of respondents were paid staff members of GLAM institutions. 28% were Wikimedia Movement volunteers, and 22% were Wikimedians in Residence (which may be a paid or unpaid position).
  • 77% of respondents were directly involved in uploading media to Commons. 60% were involved in tracking the re-use of uploaded media. 55% were involved in collecting, formatting, or standardizing media metadata before upload.
  • 58% of projects focused on historical photographs and scans of historical documents.
  • 61% of projects focused on uploading raster-formatted images (e.g. .jpeg, .png). Structured document files (PDF and DjVu) were the second most common response, at 18%.
  • 32% of projects involved uploading 1000-5000 files. Tied for second place, at 18% each, were smaller uploads of 100-500 items and very large uploads of over 10,000 items.
  • Unsurprisingly, the vast majority of projects (86%) involved collections for which at least some metadata was available for each item.
  • The most common answer for upload tool used (18%) was "Other tool" (see Functionality and Usability of Upload Tools section for details), followed by Pattypan and Upload Wizard (12%) and GLAMwiki Toolset (8%).
  • General ease of use was the most important feature when selecting an upload tool (45-50%). See Functionality and Usability of Upload Tools section for details.


See also