Research:Supporting Commons contribution by GLAM institutions/Functionality and usability of batch upload tools
"Bad and missing instructions meant it took 2 days of work for the GLAM to work out how to do it."
All current upload tools will need to be updated to associate structured metadata with uploaded media. This provides an opportunity to improve the usability and feature offerings of existing upload tools, and to define design requirements for new upload tools.
Interview and survey participants used a variety of upload tools to upload media content to Wikimedia Commons. The most popular tools were PattyPan, GLAMwiki Toolset, and the Upload Wizard. Participants also reported using a variety of other tools, such as Commonist, ComeOn!, Vicuña, GLAMpipe, and Flickr2Commons. Some participants with more technical expertise, or who worked on GLAM projects with programmers, developed programmatic upload workflows using either custom scripts or established frameworks such as Pywikibot. These workflows are captured under "Other tools" in the chart to the right.
Limitations of current upload tools
|Participant||Tool||What was the most frustrating thing about the tool you used?|
|R_1g5XR3OskQceDHu||GLAMwiki Toolset||"The metadata had to be prepared very precisely in order to map properly. This was time consuming and limited the flexibility of the tool."|
|R_vHOyezGy1iLq2xH||Upload Wizard||"Max allowed number of files a batch."|
|R_2DOFxMVl3pLOOv1||GLAMwiki Toolset||"The process of setting it up (beta-version, etc.) is tedious."|
|R_31Y03TqvZTgtOaY||Upload Wizard||"Not being able to change the file name after something was uploaded."|
|R_2Cfx3PSRNZLpX8i||GLAMwiki Toolset||"The initial learning curve was rather steep. The process to get permission/accreditation to use the tool and the beta environments of Commons and WMF- labs is rather cumbersome. You can only preview the first 3 items. You can't cancel the uploading process once it has started"|
|R_sO67qlEBjWLViN3||GLAMwiki Toolset||"Correcting errors with the media files which hadn't been flagged up in the preview/test viewer. I recall there was an issue with how the images displayed (it was something like the size of them made the viewer not work, or maybe it was that they were TIFFs I can't recall). The other frustrating thing was that when there was an error in the batch I would get like a million notifications that my images were going to be removed, so for something like 500+ images that was a little intense! In this case it was a rights thing that I'd gotten wrong and had caught straight away and was in the midst of fixing. "|
|R_24GVMFa87idKwHP||Upload Wizard||"You can only use the default template, not artwork, or books, etc."|
|R_6s4jWKOBeeKv5xn||GLAMwiki Toolset||"Waiting to be whitelisted"|
|R_yF6z0rpbqBtTKh3||Upload Wizard||"Sometimes it's difficult to vary descriptions."|
|R_bQUpK6LNMMSUhot||PattyPan||"Mapping data to categories and merging our spreadsheets to work with existing templates."|
|R_SOxTwuflXqfrSql||PattyPan||"poorly integrated to Wikidata -- i.e., quick statements -- which in this case will lead to double work which could have done at once"|
|R_3Gv8PUHcHHzH3ki||PattyPan||"When there are connection issues and a file is not uploaded you have to reload the csv and restart the procedure. Something to improve is the documentation."|
Participants noted a variety of important limitations of the upload tools they used. While some issues were tool-specific, several common themes emerged.
- Previewing and error handling. Participants noted that it was often difficult to verify the correctness of an upload tool's output; previewing functionality was limited or non-existent. They also noted that error and confirmation messages were sometimes ambiguous, confusing, repetitive, or entirely absent, and that it was not always possible to halt a batch upload in progress even after an error had been discovered. Several participants were specifically frustrated that they could not use the upload tool to correct errors discovered after uploading. For example, if the same spelling or formatting error appeared in every item in a batch, it had to be corrected manually on each item; no "batch update" functionality was available.
- Preparing for upload. The setup process can be complex, especially for large batches with complex metadata. Tools generally required metadata in a specific format, which often differed significantly from the way the metadata was represented in the institution's repository, so significant (manual or programmatic) data-munging was needed. Challenges beyond the upload tools themselves, related to mapping existing metadata to the current metadata structures on Commons (templates and categories), are discussed under "Preparing Media Items for Upload" and "Preserving Important Metadata about Media Items".
- Size and time constraints. A number of participants reported frustration with tools that could not accommodate large batches or large files. These limitations were sometimes not clearly documented, resulting in wasted effort and errors. One interview participant was frustrated that their videos were automatically down-sampled during upload to such an extent that, in their view, the results were not high-quality enough to be useful. Others reported that uploading large batches took considerable time (hours, or according to one participant, days). Another time constraint mentioned by several participants was the learning curve of more complex or less well-documented tools.
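The data-munging step described under "Preparing for upload" can be illustrated with a minimal Python sketch. It assumes a hypothetical institutional CSV export with the columns `inventory_no`, `title`, `creator`, `year` and `rights`, and maps each row onto the parameters of the Commons `{{Information}}` template (`description`, `date`, `source`, `author`, `permission`). The column names, the mapping, the required-column list and the "Example Museum" institution are all illustrative assumptions; this is not how any particular upload tool works internally. Rows with missing values are reported back rather than silently dropped, echoing the error-handling concerns above.

```python
import csv
import io

# Hypothetical mapping from columns in an institutional CSV export to
# parameters of the Commons {{Information}} template. The column names
# are assumptions for illustration; every institution's export differs.
FIELD_MAP = {
    "description": "title",
    "date": "year",
    "author": "creator",
    "permission": "rights",
}
REQUIRED_COLUMNS = ("inventory_no", "title", "rights")


def cell(row, column):
    """Return a stripped cell value, treating absent or empty cells as ''."""
    return (row.get(column) or "").strip()


def row_to_information(row, institution):
    """Render one export row as {{Information}} wikitext."""
    params = {param: cell(row, column) for param, column in FIELD_MAP.items()}
    # Build a human-readable source line from the inventory number.
    params["source"] = f"{institution}, inventory no. {cell(row, 'inventory_no')}"
    lines = ["{{Information"]
    for name in ("description", "date", "source", "author", "permission"):
        lines.append(f"|{name}={params[name]}")
    lines.append("}}")
    return "\n".join(lines)


def convert_export(csv_text, institution):
    """Map every row; report rows missing required columns instead of skipping them silently."""
    pages, problems = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=2):  # spreadsheet row 1 is the header
        missing = [c for c in REQUIRED_COLUMNS if not cell(row, c)]
        if missing:
            problems.append((row_no, missing))
        else:
            pages.append(row_to_information(row, institution))
    return pages, problems


export = """inventory_no,title,creator,year,rights
A-1,View of the harbour,Unknown,1890,Public domain
A-2,,Unknown,1901,Public domain
"""
pages, problems = convert_export(export, "Example Museum")
print(pages[0])
print(problems)  # -> [(3, ['title'])]
```

Keeping the mapping in a single dictionary means that adapting the sketch to a different export, or to a different Commons template, only requires editing `FIELD_MAP` rather than the conversion logic.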
Valuable features of current upload tools
|Participant||Tool||What was the most useful thing about the tool you used?|
|R_eLpYKWSXdeawV7r||PattyPan||"Tool I know best, and it allows manipulation of metadata in excel instead of having to use more complex programs. Allows batch-uploading. Allows manipulation of metadata templates (i.e. tweak Photography template etc.)"|
|R_1IoSpLpNexGrn89||PattyPan||"It is the only tool to combine a spreadsheet with files and the tool works very well. It works with spreadsheets and modifications are possible per file"|
|R_2DOFxMVl3pLOOv1||GLAMwiki Toolset||"The overall number of files to be uploaded by the same institution is rather large (150'000+). There was know-how within the local Wikimedia community how to use the GLAM-Wiki Toolset."|
|R_BxBP7APu2p1wKhb||Upload Wizard||"Easy to use and supports batch fill information."|
|R_2t9watKeP17jk6A||PattyPan||"Easy to use "out-of-the-box", no special processes on Commons needed."|
|R_2Cfx3PSRNZLpX8i||GLAMwiki Toolset||"Once you get to know it, it is a powerful tool that can upload 1000s of images without too many problems and good stability. It takes XML as the ingestion format, which meets the most commonly available data format with the GLAM-institution I'm working for."|
|R_sO67qlEBjWLViN3||GLAMwiki Toolset||"It's the only one I was aware of at the time! I liked that I could do the thing in one big batch as long as I had my flat xml file."|
|R_2TT17URVj5l0EOD||GLAMwiki Toolset||"it was recommended by the community volunteer who helped our institution to set up the upload process. Flexible metadata mapping; feature to save specific mappings for further use."|
|R_WibxEOzi7tLl1st||PattyPan||"Easy to use without any special processes on Commons. Working with spreadsheets that are easy to edit."|
|R_Q0a5JFhF5nAx4Wt||Vicuña||"easy to use and possibility to import and fix metadata off line. Geotagging tool."|
|R_6s4jWKOBeeKv5xn||GLAMwiki Toolset||"I was adding large numbers of files at a time, and it was manageable to provide the metadata as one XML file. Having a test server to do a dummy run."|
|R_PY9P5XRK95HW8rD||Upload Wizard||"Provided at starting screen. Easy instructions for understanding process and licensing."|
Participants also noted when tools performed well. PattyPan was frequently praised for its ease of use. Participants particularly liked that content and metadata could be structured for upload in spreadsheets, a format with which many less-technical participants seemed more comfortable. GLAMwiki Toolset was praised for its power and flexibility.
- Familiar data formats and tools for data preparation and input. Participants appreciated that PattyPan used CSV-formatted files as input, allowing users to prepare their files and metadata using familiar spreadsheet programs such as Microsoft Excel and LibreOffice.
- Good quality end user documentation. Participants appreciated the step-by-step documentation that Upload Wizard provided, particularly the way the documentation described basic copyright and licensing considerations.
- Flexible and powerful metadata mapping. Some participants appreciated the ability to specify templates and categories for their media items before upload (reducing the amount of manual curation necessary after upload). This feature was noted primarily by participants with a deeper understanding of how Commons categories and templates worked, and/or participants working on GLAM projects where there was relatively extensive metadata available about each media item.
- Batch metadata processing. Participants appreciated the ability to perform batch operations on the same piece of metadata across all media items in a collection. Especially in upload batches of hundreds or thousands of items, this feature (which is present to varying degrees across almost all batch upload tools) was seen as a vital time-saver.
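Batch metadata processing, and the "batch update" functionality that participants found missing, can be sketched as a plan-then-apply step: compute every change across the whole batch, let the user review that preview, and only then apply it. The minimal Python sketch below works on locally held metadata only; the `Item` class, the field names, the filenames and the "Example Museum" source line are hypothetical, and pushing the changes to Commons (for example with a bot framework) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Locally held metadata for one media item (field names are hypothetical)."""
    filename: str
    fields: dict = field(default_factory=dict)


def plan_batch_edit(items, set_fields=None, replace=None):
    """Preview changes as (filename, field, old, new) tuples without applying them.

    set_fields: values that every item should share, e.g. a common source line.
    replace: an (old_text, new_text) correction applied to every other string field.
    """
    set_fields = set_fields or {}
    changes = []
    for item in items:
        for name, value in set_fields.items():
            old = item.fields.get(name)
            if old != value:  # skip items that already carry the shared value
                changes.append((item.filename, name, old, value))
        if replace:
            old_text, new_text = replace
            for name, value in item.fields.items():
                if name in set_fields:
                    continue  # this field is overwritten by set_fields
                if isinstance(value, str) and old_text in value:
                    changes.append((item.filename, name, value, value.replace(old_text, new_text)))
    return changes


def apply_changes(items, changes):
    """Apply a previously reviewed list of changes in place."""
    by_filename = {item.filename: item for item in items}
    for filename, name, _old, new in changes:
        by_filename[filename].fields[name] = new


items = [
    Item("Harbour_1890.jpg", {"description": "View of the habour", "source": ""}),
    Item("Harbour_1901.jpg", {"description": "The habour at night", "source": "Example Museum"}),
]
changes = plan_batch_edit(items, set_fields={"source": "Example Museum"},
                          replace=("habour", "harbour"))
for change in changes:  # review the preview before committing anything
    print(change)
apply_changes(items, changes)
```

Separating planning from applying is one way to provide both the preview and the safe "stop before committing" point that participants asked for; items that already carry the shared value generate no change, so the preview stays short even for large batches.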