Community Wishlist Survey 2022/Bots and gadgets/Tool that reviews new uploads for potential copyright violations
Tool that reviews new uploads for potential copyright violations
- Problem: There are a great many uploads on Wikimedia Commons that are copyright violations. Some of them are photos of a computer screen or of another image, such as a poster. I estimate that about 3-6% of all files on Commons (most of them probably images) are copyright violations. I have come across files from around the time Commons started that were copyright violations and that nobody had ever noticed.
- Proposed solution: A bot or tool that checks uploaded files and flags those that are potential copyright violations
- Who would benefit: Wikimedia Commons as a whole, as well as its admins
- More comments:
- Phabricator tickets: phab:T120453
- Proposer: --D-Kuru (talk) 14:00, 18 January 2022 (UTC)
Discussion
In my opinion, copyright violations share one or more of these features:
- EXIF data: Copyvios usually do not have EXIF info. That is not to say they cannot include it.
- Size: Copyvios are usually small (typically web-sized, around 800 px on the long side). That is not to say they cannot be larger.
- Account edits: Copyvios are often uploaded by what I call hit-and-run accounts: the account is created, uploads one copyvio, and is never used again.
- Can be found on the web: Copyvios can often be found on other websites. That is not to say they cannot have been freely shared by other people on sites like Flickr.
- Author: However implausible it is, the file is tagged as "own work" with the uploader as the author.
- Percentage of user uploads: When a hit-and-run account uploads three images and two of them are speedy-deleted as copyright violations, the third one should probably be checked as well.
- Content: Copyvios usually look better than the average image on Wikimedia Commons. All the other features can probably be checked fairly easily, but this one would need a highly specialised AI that can tell good quality from bad. For a project like Wikimedia, I guess this is next to impossible unless someone like Google steps in to help.
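The metadata-based checks in the list above can be sketched as simple red-flag tests. This is a minimal Python sketch, assuming a hypothetical `Upload` record whose fields a bot would have to fill in from the file and the uploader's history; the thresholds are illustrative guesses, not values from the proposal. "Can be found on the web" and "Content" are left out because they would need reverse image search or an image-quality model, and "own work" is approximated (an assumption) by pairing the claim with missing EXIF data.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    """Hypothetical metadata a bot might collect for one new file."""
    has_exif: bool
    width: int
    height: int
    uploader_edit_count: int
    claims_own_work: bool
    uploader_prior_uploads: int            # files uploaded before this one
    uploader_prior_copyvio_deletions: int  # of those, deleted as copyvios


# Illustrative thresholds only (assumptions, not from the proposal).
WEB_SIZE_LONG_SIDE = 800
HIT_AND_RUN_MAX_EDITS = 5
COPYVIO_RATIO_LIMIT = 0.5


def red_flags(u: Upload) -> list[str]:
    """Return the names of the metadata red flags this upload triggers."""
    flags = []
    if not u.has_exif:
        flags.append("no_exif")
    if max(u.width, u.height) <= WEB_SIZE_LONG_SIDE:
        flags.append("web_size")
    if u.uploader_edit_count <= HIT_AND_RUN_MAX_EDITS:
        flags.append("new_account")
    if u.claims_own_work and not u.has_exif:
        # Assumed proxy: an "own work" claim without camera metadata.
        flags.append("own_work_without_exif")
    if u.uploader_prior_uploads > 0:
        ratio = u.uploader_prior_copyvio_deletions / u.uploader_prior_uploads
        if ratio >= COPYVIO_RATIO_LIMIT:
            flags.append("uploader_history")
    return flags
```

For the hit-and-run case described above (two earlier uploads, both speedy-deleted), the third upload triggers `uploader_history` because the deletion ratio is 2/2. The flags only say what to look at; they are not a verdict.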
There are also hints that indicate a file is legitimate: GPS data (not possible for every file), an upload by a user with many edits over many years, a user with a global account and profiles on several projects, etc.
My suggestion is NOT to have a bot run wild and delete files it deems copyvios, but to have a bot that assigns a (public?) score to each image indicating how likely it is to be a copyvio, so that fewer of them slip by.
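The scoring idea (rank files for human review, never delete them) could be sketched as a weighted sum of red flags minus legitimacy hints, squashed into the range 0 to 1 with a logistic function. The signal names, weights, and bias below are hypothetical placeholders; real values would have to be tuned against files that admins actually deleted or kept.

```python
import math

# Hypothetical weights: positive values push toward "possible copyvio",
# negative values are the legitimacy hints (GPS, long-time user, global account).
WEIGHTS = {
    "no_exif": 1.0,
    "web_size": 1.0,
    "new_account": 1.5,
    "found_on_web": 2.5,
    "own_work_claim_doubtful": 1.0,
    "has_gps": -1.5,
    "long_time_user": -2.0,
    "global_account": -0.5,
}
BIAS = -3.0  # assumed baseline so an upload with no signals scores low


def copyvio_score(signals: set[str]) -> float:
    """Return a 0..1 review-priority score; unknown signal names are ignored."""
    total = BIAS + sum(WEIGHTS.get(name, 0.0) for name in signals)
    return 1.0 / (1.0 + math.exp(-total))


def review_queue(files: dict[str, set[str]], threshold: float = 0.5) -> list[tuple[str, float]]:
    """List files at or above the threshold, highest score first, for humans to check."""
    scored = [(name, copyvio_score(sig)) for name, sig in files.items()]
    flagged = [item for item in scored if item[1] >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

With these placeholder weights, a file with no EXIF, web size, a new account, and a copy found online scores about 0.95, while one with GPS data from a long-time user scores close to 0. The output is only a ranked list, which matches the suggestion that the bot should not delete anything on its own.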
I thought about creating something like a Trusted Author status, for which a user has to meet certain criteria, so that people can be sure they can use that user's uploads. There is a small story behind this: someone asked me by email whether they could use my image in a school book and what they would have to do to use it. I said of course, and told them that all they had to do was credit me and the licence. They ended up not using my image (even though I own the copyright and have licensed it under a free licence) because they could not be 100% sure, and they could get into legal trouble if they used the image without real permission. This might not be a big deal for a website, but it can be a huge deal for a (school) book that is printed and sold over many years. --D-Kuru (talk) 20:21, 20 January 2022 (UTC)
- @D-Kuru: This is a valid proposal, but it's a bit wordy. phab:T120453 is basically what you're asking for (a bot to flag Commons uploads that are possible copyvios). Is that correct? If so, with your permission, may I simplify the wording of your proposal? It will eventually be marked for translation, so putting in fewer words will make it easier on the translators. For instance, the symptoms you mention in "More comments" such as looking at EXIF data, size, etc., are helpful but not really necessary to understanding the wish. We could move that here to the discussion section. Thanks, MusikAnimal (WMF) (talk) 19:06, 20 January 2022 (UTC)
- @MusikAnimal (WMF): I did not have the time to check the ticket, but from your description this sounds about right. I moved the More comments section as suggested. --D-Kuru (talk) 20:21, 20 January 2022 (UTC)
- Ok thanks. I have done some slight rewording of your proposal for better translatability, which I hope is okay :) Best, MusikAnimal (WMF) (talk) 22:07, 20 January 2022 (UTC)
- Files are also legitimate if there is a VRT ticket or a VRT request is pending; please ignore them in this gadget (perhaps obvious, but let's not forget it). --JopkeB (talk) 04:53, 29 January 2022 (UTC)
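Skipping files that already have a VRT ticket or a pending VRT request could be a simple pre-filter on the file description wikitext. This sketch assumes the relevant templates are named `PermissionTicket` and `Permission pending`; those names, and any redirects to them, are assumptions that would need checking on Commons before relying on this.

```python
import re

# Assumed template names (hypothetical for this sketch); verify on Commons,
# including redirects such as older OTRS-era names.
VRT_TEMPLATES = ("PermissionTicket", "Permission pending")

# Matches "{{Name" followed by "|" or "}}". Spaces and underscores are treated
# alike, as in MediaWiki titles; case is ignored entirely for simplicity.
VRT_PATTERN = re.compile(
    r"\{\{\s*(?:"
    + "|".join(name.replace(" ", "[ _]") for name in VRT_TEMPLATES)
    + r")\s*(?:\||\}\})",
    re.IGNORECASE,
)


def has_vrt_permission(wikitext: str) -> bool:
    """True if the description page carries a (possibly pending) VRT template."""
    return VRT_PATTERN.search(wikitext) is not None


def files_to_check(pages: dict[str, str]) -> list[str]:
    """Drop files with a VRT ticket or pending request before any scoring."""
    return [title for title, text in pages.items() if not has_vrt_permission(text)]
```

Requiring `|` or `}}` right after the name keeps a longer, unrelated template name that merely starts with the same letters from matching.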
- Note that such a tool could list images matching the features above in galleries, as was done by c:User:OgreBot/Uploads by new users. Those galleries were very useful: as an administrator, I found a lot of content to be deleted thanks to them. It is a pity they stopped. Christian Ferrer (talk) 12:17, 30 January 2022 (UTC)
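Listing flagged uploads in galleries, as the OgreBot pages did, mostly comes down to writing MediaWiki `<gallery>` markup. A minimal sketch, assuming the bot already has (file name, caption) pairs for the flagged uploads; which page it saves to and how often it refreshes are left open, and the function name is hypothetical.

```python
def gallery_wikitext(entries: list[tuple[str, str]], heading: str) -> str:
    """Build a wikitext section containing one <gallery> of flagged files.

    Each entry is (file_name, caption); file names are given without the
    "File:" prefix, which is added here.
    """
    lines = [f"== {heading} ==", "<gallery>"]
    for file_name, caption in entries:
        # Inside <gallery>, "|" separates the file from its caption and options,
        # so literal pipes in captions are replaced to avoid misparsing.
        safe_caption = caption.replace("|", "/")
        lines.append(f"File:{file_name}|{safe_caption}")
    lines.append("</gallery>")
    return "\n".join(lines)
```

A caption could carry the review score or the uploader name, so admins can scan a page of thumbnails instead of opening each file.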
- This had a lot of support, so I have made an expandable bot solution that should fulfil some of the needs of this wish. You can read the bot request here: https://commons.wikimedia.org/wiki/Commons:Bots/Requests/CommCheck Ed6767 (talk) 19:21, 9 February 2022 (UTC)
I wonder if calculating hashes of uploaded files could help... There's not likely a database we could check against, but we could probably track hashes of uploaded files over time, and if a file with the same hash is uploaded again, that could be one factor in such a score. Alternatively, it could help indicate whether the file already exists under a different name? paul2520 (talk) 19:37, 5 February 2022 (UTC)
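Tracking hashes of uploads over time could look like the sketch below, which uses SHA-1 from Python's `hashlib` (MediaWiki also records a SHA-1 for each uploaded file, which a bot could reuse instead of hashing the bytes itself). An exact hash only matches byte-identical files: a resized or re-compressed copy gets a different hash, so catching those would need a perceptual hash, which this sketch does not attempt. The in-memory `HashIndex` class is hypothetical; a real bot would have to persist it.

```python
import hashlib
from collections import defaultdict


class HashIndex:
    """Hypothetical in-memory record of which file names share a SHA-1."""

    def __init__(self) -> None:
        self._names_by_hash: dict[str, list[str]] = defaultdict(list)

    @staticmethod
    def sha1_of(data: bytes) -> str:
        return hashlib.sha1(data).hexdigest()

    def add(self, file_name: str, data: bytes) -> list[str]:
        """Record an upload and return earlier file names with identical content."""
        digest = self.sha1_of(data)
        earlier = list(self._names_by_hash[digest])
        self._names_by_hash[digest].append(file_name)
        return earlier
```

A non-empty result from `add` means the same bytes were uploaded before, either still present under another name or previously deleted, and could feed into the score as one factor.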
- I would suggest always using the "This file is not my own work" upload form. If it is their own work, uploaders should declare that they are the author. Many people just use the simpler form, which leads to copyvios. If uploaders declare the author and source, copyvios are easier to detect than when images are wrongly tagged as own work. — Johannes Kalliauer - Talk | Contributions 18:41, 16 February 2022 (UTC)
Voting
- Support Not violating copyright is important, not just morally and legally, but also since it's disruptive to have to swap out copyvio images in articles and to wade through copyvios when searching for images to add. {{u|Sdkb}} talk 18:56, 28 January 2022 (UTC)
- Support MaRayneS (talk) 19:12, 28 January 2022 (UTC)
- Support --Arnd (talk) 19:44, 28 January 2022 (UTC)
- Support Corn cheese (talk) 20:34, 28 January 2022 (UTC)
- Support Strainu (talk) 20:41, 28 January 2022 (UTC)
- Support Wostr (talk) 20:48, 28 January 2022 (UTC)
- Support. These are well thought out criteria that match with my experience, especially "own work" combined with "found on the web". That combination is a red flag. Jonesey95 (talk) 22:13, 28 January 2022 (UTC)
- Support a feature similar to a database report, and per Sdkb. I would also suggest this for all wikis, not just Commons; enwiki, for example, also gets many copyvios. EpicPupper (talk) 22:37, 28 January 2022 (UTC)
- Support Daud Iffa (talk) 23:41, 28 January 2022 (UTC)
- Support Klein Muçi (talk) 00:40, 29 January 2022 (UTC)
- Support --𝑇𝑚𝑣 (𝑡𝑎𝑙𝑘) 01:02, 29 January 2022 (UTC)
- Support Eviolite (talk) 01:47, 29 January 2022 (UTC)
- Support Shizhao (talk) 03:41, 29 January 2022 (UTC)
- Support JopkeB (talk) 04:45, 29 January 2022 (UTC)
- Support Gbawden (talk) 05:31, 29 January 2022 (UTC)
- Support Ottawajin (talk) 05:39, 29 January 2022 (UTC)
- Support 3aFW (talk) 05:54, 29 January 2022 (UTC)
- Support --Флаттершай (talk) 06:32, 29 January 2022 (UTC)
- Support -- ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 07:48, 29 January 2022 (UTC)
- Support ♥Ainali talkcontributions 08:48, 29 January 2022 (UTC)
- Support Still expecting volunteers to check thousands upon thousands of uploads to Commons by hand in 2022 is, honestly, just cruel. Some systems should be in place. Meiræ 11:13, 29 January 2022 (UTC)
- Support Aca (talk) 12:27, 29 January 2022 (UTC)
- Support rubin16 (talk) 15:51, 29 January 2022 (UTC)
- Support HLFan (talk) 15:57, 29 January 2022 (UTC)
- Support Glenn984 (talk) 22:43, 29 January 2022 (UTC)
- Support M.nelson (talk) 23:57, 29 January 2022 (UTC)
- Support 𝗩𝗶𝗸𝗶𝗽𝗼𝗹𝗶𝗺𝗲𝗿 ℣ 23:58, 29 January 2022 (UTC)
- Support Thingofme (talk) 02:18, 30 January 2022 (UTC)
- Support Tonnegrande (talk) 05:17, 30 January 2022 (UTC)
- Support Rohalamin (talk) 06:29, 30 January 2022 (UTC)
- Support Lectrician1 (talk) 07:26, 30 January 2022 (UTC)
- Support TheInternetGnome (talk) 07:29, 30 January 2022 (UTC)
- Support —— Eric Liu(Talk) 07:56, 30 January 2022 (UTC)
- Support F. Riedelio (talk) 10:50, 30 January 2022 (UTC)
- Support Yes please, it is very important to assess this at the source. I would also like to see a warning for the uploader just before they press the final "upload" button. Ellywa (talk) 11:25, 30 January 2022 (UTC)
- Support Christian Ferrer (talk) 12:19, 30 January 2022 (UTC)
- Support Geraki TL 14:18, 30 January 2022 (UTC)
- Support HynekJanac (talk) 17:37, 30 January 2022 (UTC)
- Support Jmaxx37 (talk) 18:41, 30 January 2022 (UTC)
- Support Daniuu (talk) 23:53, 30 January 2022 (UTC)
- Support Daniel Case (talk) 05:05, 31 January 2022 (UTC)
- Support Lfstevens (talk) 05:51, 31 January 2022 (UTC)
- Support Trizek from FR 12:00, 31 January 2022 (UTC)
- Support Sadads (talk) 15:51, 31 January 2022 (UTC)
- Support Bencemac (talk) 17:58, 31 January 2022 (UTC)
- Support IOIOI (talk) 20:37, 31 January 2022 (UTC)
- Support Dave Braunschweig (talk) 22:19, 31 January 2022 (UTC)
- Support Trey314159 (talk) 22:29, 31 January 2022 (UTC)
- Support Shooterwalker (talk) 22:34, 31 January 2022 (UTC)
- Support Alain Artivalys (talk) 13:11, 1 February 2022 (UTC)
- Support Ju Mdz (talk) 16:11, 1 February 2022 (UTC)
- Support Hwqaksd (talk) 19:24, 1 February 2022 (UTC)
- Support Alternatively, delete Commons. User:1234qwer1234qwer4 (talk) 22:07, 1 February 2022 (UTC)
- Support — JJMC89 (T·C) 02:15, 2 February 2022 (UTC)
- Support seems reasonable Paradise Chronicle (talk) 04:58, 2 February 2022 (UTC)
- Support Ifrit.eu (talk) 11:11, 2 February 2022 (UTC)
- Support Marco3399 (talk) 15:17, 2 February 2022 (UTC)
- Support Rdrozd (talk) 17:52, 2 February 2022 (UTC)
- Support Lupe (talk) 19:17, 2 February 2022 (UTC)
- Support ~ Amory (u • t • c) 20:38, 2 February 2022 (UTC)
- Support HouseBlaster (talk) 01:05, 3 February 2022 (UTC)
- Support DanCherek (talk) 03:04, 3 February 2022 (UTC)
- Support EN-Jungwon 03:22, 3 February 2022 (UTC)
- Support Paucabot (talk) 06:16, 3 February 2022 (UTC)
- Support WikiAviator (talk) 09:54, 3 February 2022 (UTC)
- Support And maybe the tool could keep a list of pages that are "all rights reserved" (ARR), so it can flag files found there with a link to the source (not counting fair use on enwiki). Leomk0403 (talk) 11:43, 3 February 2022 (UTC)
- Support Whisperjanes (talk) 15:28, 4 February 2022 (UTC)
- Support Yeeno (talk) 20:30, 4 February 2022 (UTC)
- Support Pi.1415926535 (talk) 21:26, 4 February 2022 (UTC)
- Support Voice of Clam (talk) 10:47, 5 February 2022 (UTC)
- Support SD hehua (talk) 15:06, 5 February 2022 (UTC)
- Support HHill (talk) 15:10, 5 February 2022 (UTC)
- Support paul2520 (talk) 19:36, 5 February 2022 (UTC)
- Support —Thanks for the fish! talk•contrib (he/him) 21:23, 5 February 2022 (UTC)
- Support Vulp❯❯❯here! 03:48, 6 February 2022 (UTC)
- Support Ayumu Ozaki (talk) 05:06, 6 February 2022 (UTC)
- Support Michael Barera (talk) 06:10, 6 February 2022 (UTC)
- Support Redalert2fan (talk) 14:44, 6 February 2022 (UTC)
- Support Fiver, der Hellseher (talk) 19:41, 6 February 2022 (UTC)
- Support Bas dehaan (talk) 23:02, 6 February 2022 (UTC)
- Support Eric0892 (talk) 02:16, 7 February 2022 (UTC)
- Support LRFtheLion (talk) 02:20, 7 February 2022 (UTC)
- Support //Lollipoplollipoplollipop::talk 11:24, 7 February 2022 (UTC)
- Support Ryse93 (talk) 12:22, 7 February 2022 (UTC)
- Support ~Cybularny Speak? 20:10, 7 February 2022 (UTC)
- Support — Bilorv (talk) 11:55, 9 February 2022 (UTC)
- Support Prawdziwy Mikołajek (talk) 17:37, 9 February 2022 (UTC)
- Support Ecritures (talk) 22:42, 9 February 2022 (UTC)
- Support Miaow (talk) 13:08, 10 February 2022 (UTC)
- Support This is so needed 4nn1l2 (talk) 13:24, 10 February 2022 (UTC)
- Support --Túrelio (talk) 13:30, 10 February 2022 (UTC)
- Support Wouterhagens (talk) 13:58, 10 February 2022 (UTC)
- Support MdsShakil (talk) 14:04, 10 February 2022 (UTC)
- Support Hulged (talk) 15:45, 10 February 2022 (UTC)
- Support - FitIndia Talk (A/CU) on Commons 16:07, 10 February 2022 (UTC)
- Support CptViraj (talk) 16:13, 10 February 2022 (UTC)
- Support --Dyolf77 (talk) 16:14, 10 February 2022 (UTC)
- Support It's important, therefore I support. Aliyu shaba Talk 16:16, 10 February 2022 (UTC)
- Support --A.Savin (talk) 16:23, 10 February 2022 (UTC)
- Support Bluerasberry (talk) 16:52, 10 February 2022 (UTC)
- Support -- Regards, ZI Jony (Talk) 16:53, 10 February 2022 (UTC)
- Support ZellmerLP (talk) 19:56, 10 February 2022 (UTC)
- Support Le Loy 02:19, 11 February 2022 (UTC)
- Support Qwerfjkl (talk) 15:01, 11 February 2022 (UTC)
- Support Forrestkirby (talk) 15:29, 11 February 2022 (UTC)
- Support --evrifaessa ❯❯❯ talk 15:50, 11 February 2022 (UTC)
- Support -BRAINULATOR9 (TALK) 17:14, 11 February 2022 (UTC)
- Support Novak Watchmen (talk) 17:48, 11 February 2022 (UTC)