Research:Supporting Commons contribution by GLAM institutions/Functionality and usability of batch upload tools

"Bad and missing instructions meant it took 2 days of work for the GLAM to work out how to do it."

All current upload tools will need to be updated to associate structured metadata with uploaded media. This provides an opportunity to improve the usability and feature offerings of existing upload tools, and to define design requirements for new upload tools.

Figure: Bar chart of upload tools used by survey respondents.

Interview and survey participants used a variety of tools to upload media content to Wikimedia Commons. The most popular were PattyPan, GLAMwiki Toolset, and the Upload Wizard. Participants also reported using other tools, such as Commonist, ComeOn!, Vicuña, GLAMpipe, and Flickr2Commons. Some participants with more technical expertise, or who worked on GLAM projects with programmers, developed programmatic upload workflows using custom scripts or established frameworks such as Pywikibot. These workflows are captured under "Other tools" in the chart.

Limitations of current upload tools

Participant Tool What was the most frustrating thing about the tool you used?
R_1g5XR3OskQceDHu GLAMwiki Toolset "The metadata had to be prepared very precisely in order to map properly. This was time consuming and limited the flexibility of the tool."
R_vHOyezGy1iLq2xH Upload Wizard "Max allowed number of files a batch."
R_2DOFxMVl3pLOOv1 GLAMwiki Toolset "The process of setting it up (beta-version, etc.) is tedious."
R_31Y03TqvZTgtOaY Upload Wizard "Not being able to change the file name after something was uploaded."
R_2Cfx3PSRNZLpX8i GLAMwiki Toolset "The initial learning curve was rather steep. The process to get permission/accreditation to use the tool and the beta environments of Commons and WMF-labs is rather cumbersome. You can only preview the first 3 items. You can't cancel the uploading process once it has started"
R_sO67qlEBjWLViN3 GLAMwiki Toolset "Correcting errors with the media files which hadn't been flagged up in the preview/test viewer. I recall there was an issue with how the images displayed (it was something like the size of them made the viewer not work, or maybe it was that they were TIFFs I can't recall). The other frustrating thing was that when there was an error in the batch I would get like a million notifications that my images were going to be removed, so for something like 500+ images that was a little intense! In this case it was a rights thing that I'd gotten wrong and had caught straight away and was in the midst of fixing."
R_24GVMFa87idKwHP Upload Wizard "You can only use the default template, not artwork, or books, etc."
R_6s4jWKOBeeKv5xn GLAMwiki Toolset "Waiting to be whitelisted"
R_yF6z0rpbqBtTKh3 Upload Wizard "Sometimes it's difficult to vary descriptions."
R_bQUpK6LNMMSUhot PattyPan "Mapping data to categories and merging our spreadsheets to work with existing templates."
R_SOxTwuflXqfrSql PattyPan "poorly integrated to Wikidata -- i.e., quick statements -- which in this case will lead to double work which could have done at once"
R_3Gv8PUHcHHzH3ki PattyPan "When there are connection issues and a file is not uploaded you have to reload the csv and restart the procedure. Something to improve is the documentation."

Participants noted a variety of important limitations to the upload tools they used. While some issues were tool-specific, there were some common themes.

  • Previewing and error handling. Participants noted that it was often difficult to verify the correctness of the output of particular upload tools; previewing functionality was limited or non-existent. They also noted that error and confirmation messages provided by the tools were sometimes ambiguous, confusing, repetitive, or entirely absent, and that it was not always possible to halt a batch upload in progress even after an error had been discovered. Several participants expressed specific frustration that they could not use the upload tool to correct errors discovered after uploading. For example, a participant who made the same spelling or formatting error in every item in a batch had to correct it manually, item by item; no "batch update" functionality was available.
  • Preparing for upload. The setup process can be complex, especially for large batches with complex metadata. Tools generally required metadata in a specific format, which often differed significantly from the way the metadata was represented in the institution's repository, requiring significant (manual or programmatic) data-munging. Beyond limitations particular to the upload tools themselves, challenges related to mapping existing metadata to the current metadata structures on Commons (templates and categories) are discussed under "Preparing Media Items for Upload" and "Preserving Important Metadata about Media Items".
  • Size and time constraints. A number of participants reported frustration with tools that could not accommodate large batches or large files. These limitations were sometimes not clearly explained in the tool documentation, resulting in wasted effort and errors. One interview participant expressed frustration that the videos they uploaded were automatically down-sampled during upload, to such an extent that they felt the resulting videos were not of high enough quality to be useful. Others reported that uploading large batches took considerable time (hours or, according to one participant, days). Another time constraint mentioned by several participants was the learning curve required for more complex, or less well-documented, tools.
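
The data-munging and previewing problems described above can be illustrated with a minimal sketch. It assumes a hypothetical institutional export whose field names (`obj_title`, `maker`, `rights`) differ from the columns a hypothetical upload tool expects (`title`, `author`, `license`); none of these names come from a real tool, and the sample records are made up. The sketch renames the fields and lists every row with a missing required value before anything is uploaded, instead of previewing only the first few items.

```python
import csv
import io

# Hypothetical mapping from an institution's field names to the column
# names an upload tool might require. Both sides are assumptions.
FIELD_MAP = {"obj_title": "title", "maker": "author", "rights": "license"}
REQUIRED = ("title", "author", "license")


def map_record(record, field_map=FIELD_MAP):
    """Rename the keys of one repository record to the tool's column names."""
    return {target: record.get(source, "").strip()
            for source, target in field_map.items()}


def find_problems(rows, required=REQUIRED):
    """Return (row_number, missing_columns) for every incomplete row."""
    problems = []
    for number, row in enumerate(rows, start=1):
        missing = [column for column in required if not row.get(column)]
        if missing:
            problems.append((number, missing))
    return problems


def to_csv(rows, columns=REQUIRED):
    """Serialize the mapped rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records, for illustration only.
records = [
    {"obj_title": "Harbour view", "maker": "Unknown", "rights": "CC0"},
    {"obj_title": "Portrait study", "maker": "A. Example", "rights": ""},
]
mapped = [map_record(record) for record in records]
problems = find_problems(mapped)  # rows that need attention before upload
csv_text = to_csv(mapped)         # spreadsheet-ready output
```

Checking the whole batch up front is one way a workflow could surface errors, such as the missing rights statement in the second record, before they trigger notifications on hundreds of uploaded files.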

Valuable features of current upload tools

Participant Tool What was the most useful thing about the tool you used?
R_eLpYKWSXdeawV7r PattyPan "Tool I know best, and it allows manipulation of metadata in excel instead of having to use more complex programs. Allows batch-uploading. Allows manipulation of metadata templates (i.e. tweak Photography template etc.)"
R_1IoSpLpNexGrn89 PattyPan "It is the only tool to combine a spreadsheet with files and the tool works very well. It works with spreadsheets and modifications are possible per file"
R_2DOFxMVl3pLOOv1 GLAMwiki Toolset "The overall number of files to be uploaded by the same institution is rather large (150'000+). There was know-how within the local Wikimedia community how to use the GLAM-Wiki Toolset."
R_BxBP7APu2p1wKhb Upload Wizard "Easy to use and supports batch fill information."
R_2t9watKeP17jk6A PattyPan "Easy to use "out-of-the-box", no special processes on Commons needed."
R_2Cfx3PSRNZLpX8i GLAMwiki Toolset "Once you get to know it, it is a powerful tool that can upload 1000s of images without too many problems and good stability. It takes XML as the ingestion format, which meets the most commonly available data format with the GLAM-institution I'm working for."
R_sO67qlEBjWLViN3 GLAMwiki Toolset "It's the only one I was aware of at the time! I liked that I could do the thing in one big batch as long as I had my flat xml file."
R_2TT17URVj5l0EOD GLAMwiki Toolset "it was recommended by the community volunteer who helped our institution to set up the upload process. Flexible metadata mapping; feature to save specific mappings for further use."
R_WibxEOzi7tLl1st PattyPan "Easy to use without any special processes on Commons. Working with spreadsheets that are easy to edit."
R_Q0a5JFhF5nAx4Wt Vicuña "easy to use and possibility to import and fix metadata off line. Geotagging tool."
R_6s4jWKOBeeKv5xn GLAMwiki Toolset "I was adding large numbers of files at a time, and it was manageable to provide the metadata as one XML file. Having a test server to do a dummy run."
R_PY9P5XRK95HW8rD Upload Wizard "Provided at starting screen. Easy instructions for understanding process and licensing."

Participants also noted when tools performed well. PattyPan was called out frequently for its ease of use. Participants particularly liked that structuring content and metadata for upload could be done in spreadsheets, a format with which many less-technical participants seemed more comfortable. GLAMwiki Toolset was praised for its power and flexibility.

  • Familiar data formats and tools for data preparation and input. Participants appreciated that PattyPan used CSV-formatted files as input, allowing users to prepare their files and metadata using familiar spreadsheet programs such as Microsoft Excel and LibreOffice.
  • Good-quality end-user documentation. Participants appreciated the step-by-step documentation that Upload Wizard provided, particularly the way the documentation described basic copyright and licensing considerations.
  • Flexible and powerful metadata mapping. Some participants appreciated the ability to specify templates and categories for their media items before upload (reducing the amount of manual curation necessary after upload). This feature was noted primarily by participants with a deeper understanding of how Commons categories and templates worked, and/or participants working on GLAM projects where there was relatively extensive metadata available about each media item.
  • Batch metadata processing. Participants appreciated the ability to perform batch operations on the same piece of metadata across all media items in a collection. Especially in upload batches of hundreds or thousands of items, this feature (which is present to varying degrees across almost all batch upload tools) was seen as a vital time-saver.
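
A batch operation of this kind, like the "batch update" that participants found missing after upload, amounts to applying one change to the same field of every item. The following is a minimal sketch over in-memory rows, such as rows read from a spreadsheet; the column name `description` and the sample values are hypothetical, and the code does not call any upload tool or Commons API.

```python
def batch_replace(rows, column, old, new):
    """Replace `old` with `new` in one column of every row.

    Returns copies of the rows plus the number of rows that changed, so
    the result can be reviewed before it is applied anywhere.
    """
    updated, changed = [], 0
    for row in rows:
        copy = dict(row)  # leave the caller's rows untouched
        if column in copy:
            fixed = copy[column].replace(old, new)
            if fixed != copy[column]:
                copy[column] = fixed
                changed += 1
        updated.append(copy)
    return updated, changed


# Made-up items sharing the same spelling error, for illustration only.
items = [
    {"title": "Harbour view", "description": "Photgraph of the harbour"},
    {"title": "Portrait study", "description": "Photgraph of a sitter"},
    {"title": "Map sheet", "description": "Engraved map"},
]
fixed_items, changed = batch_replace(
    items, "description", "Photgraph", "Photograph"
)
```

Returning the count of changed rows alongside the corrected copies gives a simple confirmation step, which speaks to the participants' wish for clearer feedback about what a batch operation actually did.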