Community Wishlist Survey 2023/Multimedia and Commons/Tool to copy images from archives to Commons with metadata

  • Problem: Many GLAM institutions make images available on their websites which could be copied to Commons. These currently have to be downloaded and re-uploaded manually, one at a time, with the metadata copied across by hand.
  • Proposed solution: A way for Wikimedia Commons to take a URL and copy the image across with its description and relevant metadata, including a link back to the source (a rough sketch of such an upload step follows this list).
  • Who would benefit: Everyone.
  • More comments: GLAM institutions like Libraries Tasmania / Tasmanian Archives have thousands of public domain images on their website (example). Adding each one manually to Commons would take forever. A tool like this would help users of Wikimedia projects add more media, help GLAM institutions quickly share their own content, and make sharing images more accessible to newcomers during training events.
  • Phabricator tickets: T193526
  • Proposer: Jimmyjrg (talk) 03:59, 24 January 2023 (UTC)
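
A rough sketch of the upload step such a tool would automate, using Pywikibot's UploadRobot (a library commonly used for Commons batch work). The source URL, target filename, date, and licence tag below are hypothetical placeholders; a real tool would fill them in from the institution's catalogue record.

```python
import pywikibot
from pywikibot.specialbots import UploadRobot

# Hypothetical source record; a real tool would scrape or query the
# institution's catalogue to obtain the image URL and these field values.
source_url = "https://example.org/archives/item/12345/image.jpg"
description = """{{Information
|description = {{en|1=Description text copied from the source record}}
|source      = [https://example.org/archives/item/12345 Example Archive, item 12345]
|date        = 1910
|author      = Unknown
}}
{{PD-old-70}}"""

site = pywikibot.Site("commons", "commons")
bot = UploadRobot(
    [source_url],                 # UploadRobot fetches the file from the URL
    description=description,      # wikitext for the new file page
    use_filename="Example Archive item 12345.jpg",
    keep_filename=True,           # don't prompt to change the name
    verify_description=False,     # don't prompt to confirm the description
    target_site=site,
)
bot.run()
```

The hard part, as the discussion below notes, is not the upload itself but reliably extracting the metadata from each institution's website.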

Discussion

  • It looks like the example above is using a catalogue product from SirsiDynix; I've not been able to find any API info. I think one aspect of this proposal is likely to be whether we can build a general-purpose tool that works with many libraries, or a single-purpose tool. For example, many archival catalogue systems support OAI-PMH, so if we built something that worked with that it'd perhaps be more widely used (a minimal harvesting sketch follows this thread). For site-specific scraping requests, there's a register of them at commons:Commons:Batch uploading. SWilson (WMF) (talk) 07:09, 24 January 2023 (UTC)
    Yes, I'd like something that adapts to whichever website/database is being looked at. Some libraries use Spydus (example: Stonnington), which I think has a public API. Ideally there would be some way for it to learn how a website works: the first time you visit, you manually copy and paste all the information, but after that it knows how to do it itself. --Jimmyjrg (talk) 22:05, 24 January 2023 (UTC)
    @Jimmyjrg Double-checking I understand the problem correctly: the proposal is to create a workaround for resources that are available online from institutions that do not have APIs or data dumps that can facilitate sharing data in bulk. Is that correct? --VPoundstone-WMF (talk) 16:55, 26 January 2023 (UTC)
    Yes @VPoundstone-WMF: That’s a good explanation. Basically I’d like something quicker than downloading and uploading everything myself (and copying/inputting metadata) when there’s a few images to move to Commons. Jimmyjrg (talk) 08:00, 27 January 2023 (UTC)
    Without commenting on the specific example above, in my experience, creating a generic tool to reliably scrape random websites with the sort of detail required for Commons is probably technically infeasible. c:Commons:Batch uploading exists for a reason. -FASTILY 22:33, 28 January 2023 (UTC)[reply]
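
As a concrete illustration of the OAI-PMH idea raised at the top of this thread, here is a minimal harvesting sketch. The endpoint URL is a hypothetical placeholder (each institution exposes its own), and oai_dc is used because Dublin Core support is mandatory for every OAI-PMH repository; resumption tokens for paging are omitted for brevity.

```python
import xml.etree.ElementTree as ET
import requests

# Hypothetical OAI-PMH endpoint; real endpoints vary per institution.
ENDPOINT = "https://example.org/oai"
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# ListRecords with the mandatory Dublin Core metadata format.
resp = requests.get(
    ENDPOINT,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
resp.raise_for_status()
root = ET.fromstring(resp.content)

for record in root.iter(f"{OAI}record"):
    identifier = record.findtext(f"{OAI}header/{OAI}identifier")
    # Dublin Core fields that could be mapped onto a Commons
    # {{Information}} template.
    titles = [e.text for e in record.iter(f"{DC}title")]
    rights = [e.text for e in record.iter(f"{DC}rights")]
    print(identifier, titles, rights)
```

Records harvested this way would still need a per-institution mapping onto Commons templates, which is where the general-purpose versus single-purpose question above really bites.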
  • I think there is one big problem: the source data always come in different formats, so you have to change your program every time. That's why there is a service on Commons which helps with such mass transfers; just now, I cannot find the link. Juandev (talk) 19:14, 9 February 2023 (UTC)
    I was inspired by the Web2Cit project which can learn to add citations using different formats. But you're right, it's likely more difficult for catalogues of images. Jimmyjrg (talk) 23:24, 21 February 2023 (UTC)[reply]
  • en:GLAM (cultural heritage) is an acronym for galleries, libraries, archives, and museums: the cultural heritage institutions. --Error (talk) 15:56, 13 February 2023 (UTC)

Voting