Research:Technical needs of external re-users of Commons media

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator:
Task T190228
Duration:  2018-May – 2018-June
This page documents a completed research project.


Wikimedia Commons is one of the largest repositories of freely licensed media on the internet. This makes Commons potentially one of the most valuable sources of free-to-use images and video for individuals and organizations (including academic researchers, artists and designers, journalists, and other free culture projects) who seek to remix, repurpose, or otherwise incorporate visual media into their own creative works. However, it can be difficult to find the media you want on Commons, and it is not always clear how to correctly attribute Commons media that has been re-published on its own or incorporated into a larger creative work.

Project overview

The Structured Data on Commons program seeks to implement technical changes to Wikimedia Commons that will make it easier to search for relevant media and re-use those media with correct attribution. Other technical improvements that may be within the scope of the program include interfaces for editing media items and tracking re-use outside of Wikimedia websites on an opt-in basis.

The current project will involve soliciting feedback from individuals and organizations that already re-use Commons content outside of the Wikimedia Movement (i.e. outside of Wikimedia projects or other Movement publications). The goal of the project is to understand the current pain points and unmet needs of these individuals and organizations, in order to inform the design of new software features and functionality for searching, browsing, downloading, and re-publishing Commons media.

Methods

A list of candidate orgs can be found on the /Candidates subpage.

We'll be soliciting input from external re-users via surveys, email interviews, or semi-structured interviews—depending on the number of orgs we identify and our ability to secure the time and attention of the relevant subject matter experts within those orgs.

Timeline

  • May 2018: Identify and recruit participants
  • June 2018: Conduct research and write up report

Findings

Criteria for selecting media to re-use

  • Re-users reported that they turned to Commons for historical media, and turned to other media repositories (often commercial repositories like Reuters and the Associated Press) for media related to current or recent events. Common types of 'historical' media that participants referred to included paintings, historical photographs, and images of artifacts and famous landmarks. They didn't look for current event-related media on Commons because they did not believe they would find it there, or didn't know where to look for it. In only one case did a participant report using Commons for current media, because they knew of a particular GLAM institution category that contained many recent images related to the content their organization produced.
  • Re-users need high-quality media. 'Quality' means several different things. One important dimension is artistic quality: participants agreed that they were looking for media that were engaging and informative, or in the words of one participant, "both didactic and beautiful." Composition, framing, and lighting were major factors that dictated whether they used an image or not. Color correctness was a factor that several participants highlighted: if Commons contained several versions of a famous painting, they would compare them and choose the one that most accurately represented the color profile of the painting. Another dimension of quality is image size and resolution. Most re-users have a minimum set of dimensions (height and width) that the images they use need to conform to; these minimums ranged from a few hundred pixels to a few thousand.
  • Re-users need media that can be re-published. All participants understood the basic legal restrictions associated with Commons media. The specific licenses they looked for varied, based on their organization's own licensing policies and whether or not they syndicated their content. Several participants independently emphasized that one of the primary reasons they used Commons media was that their own textual content was licensed for re-use and widely syndicated by design, so they needed the images associated with those articles, blogs, videos, etc. to be transportable, and therefore licensed under the same (or less restrictive) terms.
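
The two mechanical criteria described above (a minimum height and width, and a set of licenses the organization can accept) could be expressed as a simple filter. The following is a minimal sketch only: the `Candidate` record, the function names, and the license labels are hypothetical and not part of any Commons tool, and the artistic and color-accuracy judgments participants described are not something a filter like this can capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one media file under consideration."""
    title: str
    width: int      # pixels
    height: int     # pixels
    license: str    # short license label, e.g. "CC BY-SA 4.0"


def meets_criteria(c, min_width, min_height, accepted_licenses):
    """True if the file is large enough and its license is on the allow-list."""
    return (
        c.width >= min_width
        and c.height >= min_height
        and c.license in accepted_licenses
    )


def shortlist(candidates, min_width, min_height, accepted_licenses):
    """Keep qualifying files, ordered by pixel area with the largest first."""
    keep = [
        c for c in candidates
        if meets_criteria(c, min_width, min_height, accepted_licenses)
    ]
    return sorted(keep, key=lambda c: c.width * c.height, reverse=True)
```

A re-user with a house minimum of 800 by 600 pixels would call `shortlist(files, 800, 600, {"CC BY-SA 4.0", "CC0"})` and then review the remaining images by eye.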

Search and discovery workflows

  • Re-users seldom start their search on Commons itself. Most participants reported starting their search on Google image search, filtered by "labelled for re-use". These queries often led the re-user to Commons, and less often to Flickr. The only other free media repositories that the participants reported using were the US Library of Congress and Unsplash, but these were generally viewed as special-purpose repositories, whereas Commons and to some extent Flickr were used for general purposes. Several participants also reported starting their search on Wikipedia articles (sometimes across multiple languages) related to their need and then either selecting images from among those displayed in the article, looking for a "Wikimedia Commons has related media" template on the article (which was often absent, and inconsistently placed and presented when present), or clicking through the displayed image to the file page, and then using categories to find related images.
  • Once on Commons, re-users are more likely to use categories than the search box, but neither option suits their needs well. Most, but not all, participants were aware of Commons categories, and reported that they found them useful in specific cases: for example, if a re-user has found a Commons image of a church interior via Google, and sees that the image is categorized under Interiors of churches, they may explore that category to browse for other options. Two very common frustrations among people who used this workflow were that the category pages did not show thumbnails of the images in the category, and that categories were often too deeply nested. Both of these features of categories made it difficult to browse related images. Participants also reported frustration at finding useful categories again the next time they needed a similar image: one participant kept a browser bookmark for categories that she needed to search often; another described trying, and failing, to re-discover a useful category via the search box. The search box (and the search page) are not widely used. Several participants stated that they never used Commons search under any circumstances. Another reported frustration that when she did use search, she received far too many irrelevant results, and that the process of paging through these irrelevant results to find useful media was far too time-consuming.

Re-using and attributing Commons media

  • Re-users take attribution seriously. The re-users interviewed for this study are probably much more likely than 'average' Commons media re-users to understand and care about copyright—after all, most of them work for mission-driven organizations that also utilize copyleft licenses for the content they generate in-house. These participants all made an effort to attribute the content they got from Commons—out of respect for content creators, institutional curators, and Commons itself, and for ideological reasons. Several expressed frustration that other people didn't attribute content they re-used from Commons, and suggested that Wikimedia should provide more prescriptive guidance on File pages to make sure that more people respected our attribution policy.
  • Re-users often fail to attribute Commons media correctly. Even among this set of Commons-savvy searchers from Commons-friendly institutions, few regularly adhered to the standards for credit lines by license when attributing Commons media. Almost all of them consistently provided some kind of attribution, but how much information they added to the credit line, and what kind, varied widely. Several participants reported that they knew that they weren't crediting Commons media in precisely the correct way, but did not know what that way was, or how to find out. Others noted that organizational preferences, style guides, or technical requirements dictated the mode of attribution they used: in some cases, they included less information because they felt a full attribution would be too long and therefore distracting; in others their organization's house style dictated that all media be credited in a similar way, regardless of license; in one case, the participant noted that their platform's CMS only allowed them to include certain fields in the credit line.
  • Re-users consider the displayed license when deciding whether to use a media file. Participants did not assume that all Commons media was equally re-usable, or re-usable under the same terms. They assessed the license that was listed on the page, and decided whether to re-use (and how to attribute) based on that. Several expressed frustration when the terms of the license were not clear. For example, one participant noted that they hated GNU-licensed images because they were unsure of what kind of re-use and attribution that license specified. Another participant noted that in cases where there were multiple versions of the same image, they generally tried to choose one where the license was clear and looked "correct." In this case, correct meant that they believed that the attributed author actually had the right to release the image; they didn't automatically trust that it was re-usable just because it was on Commons. One participant described using Google to search for a version of a picture in an authoritative institutional repository (like a GLAM) to confirm that the image was licensed for re-use before they trusted the Commons license.
  • Re-users who programmatically access Commons media struggle with license templates. Two participants who worked for separate organizations that extracted Commons media (and metadata) via the API expressed frustration that the licenses were inconsistently formatted, which made it difficult to parse the template and extract the relevant metadata completely, consistently, and correctly.

Other image repositories used by interviewees

For freely licensed images, interviewees reported using Flickr Commons, Europeana, Google Art, and the US Library of Congress. For copyrighted images, interviewees reported using the Associated Press, Reuters, and Shutterstock. Several interviewees also mentioned commissioning photographs and drawn images from individual artists when they needed something specific; it is not clear what the copyright status of the resulting media was.

Recommendations

See also

References