Wikimedia Commons Data Roundtripping/Research

From Meta, a Wikimedia project coordination wiki

This page describes the research results of the 'Wikimedia Commons Data Roundtripping' project. The full report can be found here.

Research Report – Returning commons community metadata additions and corrections to source

The research was carried out by Maarten Zeinstra (IP Squared) with the help of the project team and is made available under a Creative Commons Attribution 4.0 license.

Research question

"What are the needs and expectations of GLAMs to adopt user contributed information from Wikimedia projects into their collection registration systems?"

Methodology

A questionnaire was developed and distributed to GLAMs across the world. Its outcomes provide a quantitative perspective on the needs and expectations of GLAMs in adopting user-contributed information from Wikimedia projects into their collection registration systems.

The survey did not limit itself to user contributed information on Wikimedia Commons. The quantitative survey makes an additional distinction between other types of data contributions to give an overview of the needs and expectations of the GLAMs. The survey splits user contributions into three categories:

  • Direct user contributions (e.g. emails and phone calls),
  • Sector collaborations (e.g. authority files, and thesauri), and
  • General third party contributions (e.g. crowdsourcing and Wikimedia Commons).

This approach provides research data that indicate what might block general adoption of third party information. These indications help to determine whether the barriers found are specific to Wikimedia Commons or are general sectoral issues for GLAMs. The survey also charts the technical capabilities of GLAMs to adopt metadata from third parties. It gathers data on the technological readiness for adopting third party contributions into the GLAMs' collection management systems. These capabilities include bulk import of metadata and methods for disambiguation of data.

In a second phase, an interview script was developed based on the outcomes of the questionnaire to support local interviews, with the aim of identifying the challenges and opportunities of selected institutions for further collaboration. The focus of these interviews was to provide indications of current practices in maintaining collection metadata and to verify the outcomes of the quantitative research.

Research results

The survey results make it clear that there is an interest in extracting enriched metadata from Wikimedia Commons. It is also clear that most organisations will struggle to ingest this data. Automatic processes to ingest data and sector collaboration using authority data are not common practice among the institutions that responded to the survey. This is evident from the barriers indicated by the respondents. The barriers to ingesting direct user contributions mostly relate to the person contributing the information and the type of information contributed. Here GLAMs mostly cite a lack of trust in, and verifiability of, the source. The barrier for institutions to adopt new data from a person who might not be an expert is higher or lower depending on the content of the contributed metadata. A correction of a simple typographical error is accepted more readily than an addition of substantive information about records.

When looking at data from other institutions and from authority files, there is usually little question about the quality of the information; instead, technical resources are mentioned as a barrier to adopting this type of metadata. An authority file is usually adopted by linking to it rather than by duplicating its metadata.

When asked about third party contributions, our respondents cite a general lack of resources as the highest barrier to adopting third party metadata. This includes technical as well as human resources. Additionally, some indicate that the trustworthiness of this information also comes into play.

The barriers to adopting third party information are stacked. With direct third party contributions there is very little constraint in (technical) resources, although constraints in human resources (e.g. time and expertise) might still occur. However, the central issue for all contributed information is verifiability and trust of the source. As long as that barrier is not lowered, data adoption by GLAMs will prove difficult.

This lack of trust and verifiability is less problematic for sector collaborations (e.g. thesauri and authority files), as these are developed by other trusted parties in the heritage field, usually other GLAMs.

This is supported by interviews with stakeholders, in which the barrier of trust was highlighted. Interviewees indicated that they can spend up to one hour per change validating the source and the suggestion before changing their records. By contrast, they would always link to a trusted authority file, even when it holds no different metadata for a resource.

Interviewees indicated that Wikimedia Commons has a large added value when its contributors:

  • Add translations of existing metadata
  • Add descriptions about the subject matter of contributed content
  • Link to other sources that verify metadata of a media file.

Additionally, institutions indicated that if Wikimedia Commons became more similar to an authority file in use and operation, they would be more likely to adopt its structured information.

Recommendations

It is recommended that this project try to lower the constraints in technical resources and other technical issues by developing a tool that lowers the identified barriers to adoption.

The recommendations are:

  1. Lower technical barriers for adoption by creating simple export functionality
  2. Focus on altered metadata, contextual metadata translations, and authority references
  3. Generate trust by showing user information
  4. Present structured data on Wikimedia Commons as an authority file
  5. Integrate unique identifiers
  6. Integrate other authority files

Lower technical barriers for adoption by creating simple export functionality

It is not necessary to adopt API standards, as most respondents and interviewees indicated that they do not use APIs for ingesting data from other parties, except for authority files as linked data. In practice, this means that the minimum viable product of this project does not need complicated data export functionality. Being able to download information as a Comma Separated Values (CSV) file would be sufficient.
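
As an illustration of how small such an export could be, the following is a minimal sketch in Python using only the standard library. The column names, the shape of the change records and the function name `export_changes_csv` are assumptions made for this example; the report does not prescribe a schema.

```python
import csv
import io

# Hypothetical column names: the report does not define an export schema.
FIELDS = [
    "institution_id",   # the GLAM's own identifier for the object
    "commons_file",     # file name on Wikimedia Commons
    "field",            # which metadata field was changed
    "language",         # language of the value, if any
    "old_value",
    "new_value",
    "contributor",      # user name of the person who made the edit
]


def export_changes_csv(changes):
    """Serialise a list of metadata-change dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for change in changes:
        # Missing keys become empty cells so every row has the same columns.
        writer.writerow({name: change.get(name, "") for name in FIELDS})
    return buffer.getvalue()


# Invented example data, for illustration only.
example_changes = [
    {
        "institution_id": "OBJ-0001",
        "commons_file": "Example painting.jpg",
        "field": "description",
        "language": "nl",
        "old_value": "",
        "new_value": "Landschap met molen, rivier en boten",
        "contributor": "ExampleUser",
    },
    {
        "institution_id": "OBJ-0002",
        "commons_file": "Example map.jpg",
        "field": "title",
        "old_value": "Map of Amsterdm",
        "new_value": "Map of Amsterdam, 1650",
        "contributor": "AnotherUser",
    },
]

csv_text = export_changes_csv(example_changes)
print(csv_text)
```

A file like this can be opened in a spreadsheet and reviewed row by row, which fits the manual validation practice described by the interviewees.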

Focus on altered metadata, contextual metadata translations, and authority references

The survey and interviews have shown that altered metadata, contextual metadata, translations and references to authority files are most valued by the GLAMs. Contextual metadata includes structured data about objects, persons or entities that are depicted in the contributed media files.

Generate trust by showing user information

Whether information from a direct contributor is adopted depends on the level of trust in that contributor. This also applies to data added by Wikimedians. Showing that the edits were made by people whose edits are generally not reverted helps build trust in the added data.
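
One simple way to express such a signal is the share of a contributor's edits that were not reverted. The sketch below is only an assumption about how this could be calculated: the function name, the input counts and the idea of a single ratio are illustrative, and the report does not specify a metric.

```python
def unreverted_share(total_edits, reverted_edits):
    """Return the fraction of a contributor's edits that were not reverted.

    Returns None when the contributor has no edits, because a ratio
    would be meaningless in that case.
    """
    if total_edits < 0 or reverted_edits < 0 or reverted_edits > total_edits:
        raise ValueError("edit counts are inconsistent")
    if total_edits == 0:
        return None
    return (total_edits - reverted_edits) / total_edits


# Hypothetical counts for two contributors, for illustration only.
contributors = {
    "ExampleUser": (200, 4),
    "NewUser": (0, 0),
}

for name, (total, reverted) in contributors.items():
    share = unreverted_share(total, reverted)
    label = "no edits yet" if share is None else f"{share:.0%} of edits not reverted"
    print(f"{name}: {label}")
```

Shown next to each suggested change, a figure like this gives a curator a quick impression of the contributor's track record without replacing their own judgement.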

Present structured data on Wikimedia Commons as an authority file

This project has an opportunity to promote the structured data of Wikimedia Commons as an authority file in itself. This would move the perceived barriers from ‘third party collaborations’ to ‘sector collaborations’.

The research showed that sector collaborations do not suffer from high barriers caused by a lack of trust, so aligning Wikimedia Commons with these authority files lowers that barrier to adoption.

A secondary recommendation related to this is to highlight the linked data functionality of structured data of Wikimedia Commons. GLAMs should be able to link to contributed media on Wikimedia Commons using a URI. This allows further adoption of Wikimedia Commons as an authority file for media files.
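
As a sketch of what linking by URI could look like on the GLAM side, the example below stores a Commons entity URI in a collection record instead of copying the metadata. It assumes the URI pattern `https://commons.wikimedia.org/entity/M<page ID>` for media entities on Wikimedia Commons, which should be checked against the current Commons documentation; the record fields and the function name are hypothetical.

```python
# Assumed URI prefix for media entities on Wikimedia Commons.
COMMONS_ENTITY_PREFIX = "https://commons.wikimedia.org/entity/"


def commons_entity_uri(page_id):
    """Build the entity URI for a Commons media file from its page ID."""
    if not isinstance(page_id, int) or page_id <= 0:
        raise ValueError("page_id must be a positive integer")
    return f"{COMMONS_ENTITY_PREFIX}M{page_id}"


# Hypothetical collection record; the page ID 12345 is invented.
record = {"inventory_number": "OBJ-0001", "title": "Example object"}
record["commons_uri"] = commons_entity_uri(12345)

print(record["commons_uri"])
```

Storing only the link mirrors how the respondents already use authority files: the institution's own record stays unchanged, and the enriched data remains retrievable from the source.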

Integrate unique identifiers

A large percentage of respondents indicated that they have public unique identifiers for objects in their collections. A good step to promote the new capabilities of structured data on Wikimedia Commons is to add these identifiers to contributed media on Wikimedia Commons.

Integrate other authority files

It is also recommended that Wikimedians work on integrating other structured data, such as thesauri and authority files of other GLAMs and heritage institutions. This is expected to increase trust in the structured data.

More information

Download the full report here.