FindingGLAMs/White Paper/SANG

Expanding what is possible around GLAMs on the Wikimedia projects
A White Paper as Guidance for Future Work
developed as part of the FindingGLAMs project

Case Study 2: SANG – Sharing Audio files and Note sheets Globally

Key facts

Time: April – May 2019

Organizations involved: Swedish Performing Arts Agency

Wikimedia/free knowledge communities involved: Wikimedia Sverige

Keywords: music, Wikimedia Commons, sheet music, Optical Music Recognition

Key conclusions

  • We can, to a certain degree, meet the needs of GLAM institutions that want to share multimodal material on Wikimedia Commons.
  • Audio files are underrepresented on Wikimedia Commons in terms of both numbers and infrastructure, making it hard to find good examples to follow. More good examples are essential to encourage editor activity.
  • There is potential to make working with sheet music on Wikimedia Commons more rewarding, but it needs both documentation and good examples.
  • GLAMs hold a treasure trove of unique heritage in the form of audio files that should be included on the Wikimedia platforms to enrich Wikipedia articles.
  • By connecting different types of material from GLAM institutions through structured data, added value can be created for both the partner organizations and the end users.


The Swedish Performing Arts Agency (Musikverket) is a government agency promoting music and related art genres (theatre, opera, dance, etc.) in Sweden. They have a specialized library and archive documenting the history of music in Sweden.

We had worked with Musikverket previously, uploading digitized archival photographs to Wikimedia Commons and organizing Wikipedia writing workshops for the staff. Because of that, we already had established contact with staff members who were knowledgeable about the Wikimedia projects and the practicalities of preparing material – most importantly for this project, the copyright requirements. We reached out to Musikverket with information about FindingGLAMs, indicating we would like to work with material that is more complex than the photographs we had worked with previously, especially something involving several different types of media.

The provided material consisted of audio files (MP3), images (TIFF) and digitized sheet music (PDF). The theme was Swedish music history and the selection of material was done entirely by Musikverket staff. They selected resources that highlighted the variety of material the organization worked with, with an emphasis on unique collections, such as the joik collection.


GLAM institutions often have different types of media (audio, visual, text) related to the same subject. In a physical setting, such as a museum exhibition, ties between the different media types can be created in order to place the topic in a wider context, to deepen the viewer's understanding and to increase the emotional impact of the material.

On Wikimedia Commons, visual material, such as photographs and artworks, is the main focus. It is reasonable to claim that the majority of users – both directly on Commons and on the other Wikimedia platforms, such as Wikipedia – also focus on static visual content, and are not as familiar with other media formats such as audio or video. It is a challenge to examine how other types of media can be organized on Wikimedia Commons in order to offer users a more interesting experience while at the same time creating a fairer representation of GLAM collections. Indirectly, this may lead to users becoming more interested in what the particular GLAM has to offer.

In this project, we therefore focused on examining how multimodal material from a GLAM collection, related to the same topic, can best be organized on Wikimedia Commons. Our central idea was to both maximize the benefit for the user and create a richer, more nuanced representation of the collection.


The upload consisted of three different media formats: audio, images and sheet music.

In the audio upload, a major problem that we encountered was that, due to the comparatively low number of audio files on Wikimedia Commons, the existing infrastructure felt severely underdeveloped compared with the infrastructure for images. For example, there is only one information template for audio files,[1] while there are many templates derived from the Artwork or Photograph templates. This suggests that Commons editors have not felt the need to create more specific templates for audio, which might reflect a low level of interest in this type of media. Consequently, editors who want to create and edit audio-related information templates do not have many examples or much documentation to learn from. In order to present information about the audio files, we created the template Musikverket-audio,[2] which can be used with files provided by the GLAM institution.

The reason why so many information templates exist is to make things easier both for editors and for readers. A template designed specifically for a certain collection or media genre provides fields that are especially useful for that subset of files, and can display some information without the editor having to input it themselves. For instance, the BASA-image template,[3] designed for use with files shared by the Bulgarian Archives State Agency, automatically adds the institution's informational banner and prompts the editor to input some accession details specific to this institution. The editor does not risk forgetting to include this information, nor do they have to worry about making it look good to the reader; the template takes care of the formatting, ensuring a consistent look across the collection.[4] Thus, information templates are a helpful tool that both empowers editors and makes the data more legible.

Another aspect that heavily influenced our work was the quality of the metadata we received from the GLAM. The metadata contained basic information about each soundtrack, such as the creation date, the title, and where and by whom it was performed. However, the information was not very well structured. For example, there were no links to authority records for the people involved, despite the fact that Musikverket maintains its own authority database. We were informed that the audio database is not connected to the authority database, which made it difficult for us to place the files in relevant categories, as those had to be identified manually. In addition, the metadata lacked structured information about the type of music, such as classical music, piano music, etc. Again, this made it impossible to identify relevant categories automatically, as the only way to do so was by reading the title manually and making assumptions – something that requires good knowledge of music and introduces the risk of errors due to the uploader’s bias. As a Wikimedia organization, we believe our role is to serve as a neutral middleperson between the GLAM and the Wikimedia platforms; if we make guesses on e.g. what music genre a file belongs to that are not based on the metadata (or other information provided by the GLAM), the community may believe that the information originated from the GLAM and is thus correct and reliable.

The audio files were uploaded using a Python script developed specifically for this collection,[5] which in turn leans heavily on a reusable library[6] we had created previously. The library contains functions for building description templates for Wikimedia Commons files and uploading the files together with their descriptions. We have used this library multiple times for Commons uploads. The script itself is custom to this collection and could only be re-used for a collection with exactly the same metadata format, i.e. coming from the same institution/database. Building universal tools that can be used directly with any institution's metadata, with as little adjustment as possible, is a big challenge that affects content partnerships – both the speed at which data can be processed and materials uploaded, and the ease and accessibility of the process. The reason we used custom scripts for this upload is that we had previously worked with metadata from Musikverket’s database and thus could perform the upload by modifying older code. Otherwise we would have used the more flexible tools available today, such as OpenRefine for processing the metadata and Pattypan for uploading the files to Wikimedia Commons.
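The division of labor described above – a reusable library that builds description wikitext, plus a thin collection-specific layer adapting the institution's metadata – can be sketched roughly as follows. The function names and Swedish field names are hypothetical illustrations, not the project's actual code:

```python
# Sketch of the library/script split described above. build_description() is
# the reusable part; musikverket_audio_description() is the collection-specific
# adapter. All names and metadata fields here are illustrative assumptions.

def build_description(template, fields):
    """Render a Wikimedia Commons information template from a field mapping,
    skipping empty values."""
    lines = ["{{%s" % template]
    lines += ["| %s = %s" % (key, value) for key, value in fields.items() if value]
    lines.append("}}")
    return "\n".join(lines)

def musikverket_audio_description(record):
    """Map one record from the institution's metadata (hypothetical Swedish
    field names) onto the Musikverket-audio template's parameters."""
    return build_description("Musikverket-audio", {
        "title": record.get("titel", ""),
        "performer": record.get("utforare", ""),
        "date": record.get("datum", ""),
        "accession number": record.get("inventarienummer", ""),
    })
```

In the real workflow, the rendered wikitext would then be passed, together with the media file, to an upload call (e.g. via Pywikibot).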

The image upload consisted of a dozen files which, according to the GLAM staff who selected them, were "related to the audio files in some way". These were not accompanied by any machine-readable metadata; all the information we had was the filename (e.g. Etikett, Swedish for "label") and the inventory number of the related audio file. Because of that, it was impossible to categorize the files, as we simply did not know what they depicted.

The image files were uploaded manually, due to the small number of files as well as the poor metadata accompanying them, which made it inefficient to create a special script to process such a small amount of data.

The advantage of there being only a handful of image files is that it was easy to link them to the corresponding audio files. We did this by adding <gallery> tags in the Related files section of the infobox template, which is a common way of linking files that have strong ties to each other (see, for example, how it can be used to direct the user’s attention to the other side of a photograph[7]). This was done in both the audio and the image files. That way, when looking at an image file, the user can click the play button and listen to the music track; conversely, on the audio file’s page, the user can easily see the related image files. The scanned sheet music files were processed in the same way, i.e. linked to the music files just like the other image files. However, due to their nature they presented an interesting conceptual challenge, which is why they deserve separate treatment.
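In a larger upload, such reciprocal links could be generated programmatically rather than by hand. As a minimal sketch (a hypothetical helper, not part of this project's workflow), building the <gallery> markup for a file's related-files field could look like this:

```python
# Hypothetical helper: build the <gallery> wikitext used to cross-link
# related Commons files, as described above. The caption text is an assumption.

def related_files_gallery(filenames, caption="Related files"):
    """Return a <gallery> block listing the given Commons file names."""
    lines = ['<gallery caption="%s">' % caption]
    lines += ["File:%s" % name for name in filenames]
    lines.append("</gallery>")
    return "\n".join(lines)
```

The resulting block can be placed in the audio file's description and, with the file names swapped, in each related image file's description, producing the two-way linking described above.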

Sheet music[8] is a way of representing sounds using special notation, much like text is a way of representing language using the alphabet. Musicians can read the notation and reproduce the music based on it. Since the purpose of the notation is to convey the same information to different people, it must be standardized and objective – again, just like text. This means there should be a way for computers to interpret the notation and play music based on it – and that such a solution should exist on Wikimedia Commons, so that users can listen to all the openly licensed sheet music that has been uploaded there.[9] Ideally, it would be implemented in a user-friendly way, such as a button next to the sheet music image. We did not find such a solution, so we investigated whether it would be possible to create it. This required exploring the technical aspects of the stages necessary to make it possible.

First of all, the sheet music image would have to be converted to meaningful musical characters, just like a scanned text has to be OCR’ed[10] in order to be read out loud by a computer. We found an open source tool that did exactly that.[11] The tool can produce output in the MusicXML format, which in turn can be converted[12] into LilyPond[13] – a format that to some degree is already used on Wikimedia Commons and Wikidata.
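As a rough sketch, the two conversion steps form a small pipeline: scanned image → MusicXML (via an OMR tool such as Audiveris) → LilyPond (via musicxml2ly, which ships with LilyPond). The command-line flags below are assumptions and should be checked against the installed versions of the tools:

```python
# Build the argv lists for the two conversion steps described above.
# The Audiveris flags (-batch, -export, -output) and the musicxml2ly
# invocation are assumptions; verify them against the tools' documentation.

def omr_pipeline_commands(scan, workdir="out"):
    """Return the two commands converting a scan to LilyPond source."""
    stem = scan.rsplit(".", 1)[0]
    musicxml = "%s/%s.xml" % (workdir, stem)
    lilypond = "%s/%s.ly" % (workdir, stem)
    return [
        ["Audiveris", "-batch", "-export", "-output", workdir, scan],
        ["musicxml2ly", musicxml, "-o", lilypond],
    ]
```

Each command could then be executed in turn, e.g. with subprocess.run(cmd, check=True).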

We then studied the material on implementing LilyPond and musical scores on Wikimedia Commons and Wikidata. We found that intensive discussions have been held on the topic of musical notation files on Commons, with arguments for and against different formats.[14] The advantage of LilyPond is that it is supported in MediaWiki through the Score extension; musical code in the LilyPond format can be displayed as a PNG image and played as an audio file, which is close to what we imagined as a good solution.[15] This extension has documentation on Commons,[16] and a template exists to make it easier for the user to enter LilyPond data.[17] At the time of writing, the category Images with LilyPond source code contains around 100 files.[18]
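For illustration, LilyPond code is embedded on a wiki page by wrapping it in the Score extension's <score> tag; when the sound attribute is set, MediaWiki renders both an image of the notation and a playable audio file. A small helper producing that wikitext might look like this (the helper itself is a hypothetical sketch):

```python
# Hypothetical helper: wrap LilyPond source in the Score extension's
# <score> tag. With sound="1", MediaWiki renders the notation as an image
# and also generates a playable audio file.

def score_wikitext(lilypond_code, sound=True):
    """Return <score> wikitext embedding the given LilyPond code."""
    attr = ' sound="1"' if sound else ""
    return "<score%s>\n%s\n</score>" % (attr, lilypond_code)
```

For example, score_wikitext(r"\relative c' { c4 d e f }") produces a four-note fragment that a reader can both see and play.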

These findings indicate that processing sheet music files would encounter several challenges. First of all, it is unclear how accurate music transcription software is, especially when dealing with scans of less than perfect quality (as in our case). To make a comparison with OCR software: OCR’ed text, e.g. on Wikisource, undergoes proofreading and usually requires significant correction. It is reasonable to assume the same would have to be done with musical notation. The crucial difference is that the pool of potential proofreaders of text is comparatively large, as this task can be done by laypeople; no special knowledge other than familiarity with the language is required. In order to both detect and correct mistakes in musical notation, one needs to be able to read and interpret the special characters – a specialist skill.

Furthermore, between the discussions on Wikimedia Commons and the small number of files using LilyPond code, it is our impression that there is no well established process on how to implement an idea such as ours. In fact, we did not succeed in locating LilyPond files on Commons that used the Score extension to actually play the sound. A typical file that we examined[19] contained both a visual representation of the music (a picture of musical notes) and a LilyPond source code, but no functionality to hear the sounds.


The project resulted in 145 audio files[20] and 33 image files[21] uploaded to Wikimedia Commons. Due to the previously mentioned difficulties in categorizing the files automatically, we asked the Swedish Wikipedia community for help. We posted information about the upload[22] on the page Månadens uppdrag (the month’s tasks), where Wikipedians can share tasks they would like to have help with. We specifically asked for help with categorizing the newly uploaded materials on Wikimedia Commons.[15]

At the time of posting the request, there were 118 files needing categorization. At the time of writing, this number has decreased to 100. Objectively, this is a small decrease; based on our experience, the volunteer involvement has been smaller than it tends to be with more “typical” uploads, such as photographs or artworks. On the other hand, it might be that the material in this particular upload is extremely niche and by its nature only interesting to a small number of editors. This is also reflected in the low usage of the files: none of the audio files are used in Wikipedia articles,[23] while two of the sheet music files are used to illustrate relevant articles on Swedish Wikipedia.[24]

Apart from adding the categories, a small number of files have been edited by adding information, such as descriptions.[25] This provided added value, as the information could not have been added based solely on the metadata provided by the GLAM institution.


This project demonstrates that there are significant barriers to realizing the full potential of multimodal resources on the Wikimedia platforms. The following observations should be kept in mind both when working with audio material from GLAM institutions and when developing new tools for GLAM uploads.

Firstly, and most importantly, a lot could be done on Wikimedia Commons to make it more suitable for audio and other non-visual content, from the point of view of contributors and users alike. After all, Wikimedia Commons has an educational mission: all the content should be “instructional or informative”.[26] As the majority of Commons users are not musicologists, we argue that contextless musical sounds are hardly informative. When one looks at a typical audio file uploaded as part of our project, the screen is dominated by the file’s metadata, and the actual playing interface is hard to locate.[27] The “call to action” – the “play” button – is small and unobtrusive. This makes it obvious that the interface was designed for visual files, which take a central position on the screen.

On the one hand, the priority given to images is not surprising. Wikipedia is a text-based encyclopedia, and images can enhance the understanding of pretty much any article, regardless of topic, while audio arguably is not as applicable across the spectrum. On the other hand, today’s Wikimedia Commons is much more than that: a repository of free media, available for anyone to re-use and re-mix, holding videos, music, 3D models and geographic data files. This variety should be reflected in the direction of its development. What is a good interface for music files? Maybe it should include the possibility for users and uploaders to curate collections of audio files played one after another, like a music tape, that could contextualize songs with a common theme. Or a tool to build slideshows of related images and texts that the user could look at while the music is playing; looking at a static page of metadata for several minutes might feel unsatisfying for anyone apart from the most hardened sound fans.

A large part of this case study was researching the available technologies for encoding and decoding music. Above, we identified software such as Audiveris, LilyPond etc. that has been developed to deal with different aspects of this problem. This indicates that open source developers have been actively working to make dealing with music on the computer easier. At the same time, understanding what possibilities there are and the functions and limitations of the different tools requires both research and specialist knowledge.

An ambitious idea would be to implement all these functionalities directly in Wikimedia Commons. A Commons user could upload digitized sheet music, for example shared by a museum, and have it automatically converted to sound using built-in Optical Music Recognition. Any errors generated in the automatic transcription process could be corrected by Wikimedians using a proofreading tool, akin to how Wikisource works as a platform for transcribing and proofreading text. Such a proofreading tool could either be available directly in Wikimedia Commons or as an extension of Wikisource. Furthermore, once the sheet music is converted to machine-readable code, it could be connected to the Wikidata items of the music pieces. Applying the power of structured data to music would enable new research applications, empowering Wikimedia users to answer questions such as which types of sounds are most common in the works produced in a certain century, etc.

Furthermore, an uploaded sound file could be converted to musical notation automatically, using the same technology in reverse – something that would be especially appreciated by people interested in music but affected by hearing loss, making the Wikimedia platforms more accessible. Music educators could also find such a solution useful, being able to generate sheet music from public domain recordings to share with their students.

An interesting aspect of this project for further development opportunities was working with poor metadata. On the one hand, rich and detailed metadata makes it easier for users to find files and benefit from them educationally. On the other hand, poorly described files can be improved by Wikimedians: more specific categories can be added, and descriptions of what the files depict can be made more detailed and easier to understand. Experts in any field can be found contributing to the Wikimedia platforms. The improvements they make benefit primarily other users of the Wikimedia platforms, but they could also provide added value to the GLAMs who shared the files in the first place. This is called data roundtripping,[28] and we explore it in depth in Case Study 7 on the basis of research and pilot projects done in collaboration with several Swedish cultural heritage institutions.

The potential of using Structured Data on Commons (SDC) with audio files is something that we did not research at the time of the upload, but it might deserve further exploration as well. The work done on SDC so far, as well as the editor engagement, have been focused on image files, for obvious reasons (them being both prevalent on Wikimedia Commons and accessible to the vast majority of editors). As development is progressing and community standards around SDC for images are taking shape, investigating non-visual media is a logical next step. It might even increase the currently low interest in other media types and indirectly cause a surge of activity around them.

Finally, something that this project made very clear to us is that the active participation of domain experts in content partnerships is crucial. While we have a lot of experience with image uploads, we encountered difficulties in our work with the audio files, due both to insufficient metadata and to our unfamiliarity with how music is described and categorized and with what technical possibilities exist for those who work with it. We believe that our role in content partnerships is to serve as a neutral middleperson between the Wikimedia platforms and the GLAM. If we make decisions on e.g. categorizing the files based on our best guess rather than on the metadata, we risk introducing bias due to our lack of domain knowledge. This is something that Wikimedians who upload GLAM content should be aware of; sometimes it cannot be avoided, in which case it is important to know one’s own biases and limitations. The best solution to this problem is to encourage active participation of GLAM staff in the upload process – which requires documentation, training and empowerment to help them gain confidence as Wikimedians, and developing the technosocial infrastructure of the Wikimedia platforms to be more user-friendly, robust and understandable.


References

  15. a b
  23. Statistics
  24. Statistics
  25. For example, here volunteers added both relevant categories and information: