From Meta, a Wikimedia project coordination wiki
NOTE: This page may not be regularly checked. If you need prompt attention from the maintainers please ping a member of Community Tech.

Is that list still working? Cause this came in. Nobody (talk) 09:52, 22 December 2023 (UTC)

Sometimes things slip through; I don't know why. Diannaa (talk) 15:35, 26 December 2023 (UTC)

Is copy patrol down?

Only 4 cases going back quite a few hours. enL3X1 ¡‹delayed reaction›¡ 21:34, 25 December 2023 (UTC)

I'm not seeing any significant gaps, just a general slowdown. I guess people had something else to do on Christmas Day. Diannaa (talk) 15:34, 26 December 2023 (UTC)

New CopyPatrol is live

I'm thrilled to announce that the new version of CopyPatrol is now live. All existing links should redirect to the right place. Please join me in thanking @JJMC89 for his tremendous help in this effort. He probably deserves most of the credit here, and certainly all of it for the backend, which he completely rewrote from scratch. The new backend should be much more resilient, and the sporadic downtime we occasionally saw will hopefully be a thing of the past. In addition, the new frontend offers a number of new features:

  • Significant performance improvements
  • Edit summaries, change tags, and diff sizes
  • "Undo" or "revdel" links for users who have the requisite permissions

One notable change you might see is that the iThenticate reports no longer include the crawl date. Unfortunately this is outside our control. The Turnitin product team has been made aware of this feature request, so we hope it will eventually be reinstated.

Please let me or JJMC89 know of any issues you see. At the time of writing, the backfill script is still running, so many older reports are missing. They should all be restored in due time. Additionally, we're still ironing out integration with mw:Extension:PageTriage. We'll mark phab:T333724 as resolved once all of the above has been completed.

This release also marks the conclusion of a formal agreement with Turnitin, which has been in the works since at least May 2022. Turnitin has been kind enough to give us free credits when we needed them, but from a legal standpoint nothing formalized our relationship in the past. Now it is set in stone, and we have the reassurance that CopyPatrol is here to thrive for years to come. They were gracious enough to give us considerably more credits than we currently consume, so we will soon be exploring adding more languages to CopyPatrol. For the negotiations with Turnitin, I'd like to thank @Ocaasi, who started the conversations, and more recently my colleagues @SSpalding (WMF) from Legal, @JVargas (WMF) from Partnerships, my manager @KSiebert (WMF), and our new Lead Community Tech Manager @JWheeler-WMF.

Above all, allow me to thank all of you – our users – who are doing the actual work of helping cleanse the wikis of copyright violations. Your tireless efforts are what drove us to reach this milestone.

Warm regards, MusikAnimal (WMF) (talk) 21:42, 9 April 2024 (UTC)


Fixed = code updated and confirmed that the report would not show up if rechecked
Wow, I can actually feel everything loading faster (imagine my shock on discovering that marking the status of reports is now near-instant). The new features are great; could I share a little bit of feedback?
  • The undo button is really useful, but its location next to the diff button has led to me clicking it unintentionally several times (maybe it could be moved down)
Other than that, everything looks good. The leaderboard seems a bit funky, but I imagine that will be fixed by the backfill script. Isochrone (talk) 22:06, 9 April 2024 (UTC)
It's so awesome to see how this technology and this partnership has evolved and matured. Congrats to everyone who has pushed it so much further!! Ocaasi (talk) 00:13, 10 April 2024 (UTC)
The new version has many positive changes, such as the quick loading time and the expected reduction in outages. On the down side, however, I see that there are already 212 cases posted for April 10 and there are still three hours to go, so a projected 240 cases to assess in the 24-hour period. Given that most days we only have two people working the queue, this needs to be cut in half if that's possible. Otherwise it's unrealistic and unsustainable to expect our tiny crew to keep up with the volume. (I can typically only clear about 20 cases per hour and can only commit to working on this for 3-4 hours per day.) Diannaa (talk) 21:20, 10 April 2024 (UTC)
Yes, many thanks for the improvements! Very grateful. I agree with Diannaa that we may need some tweaks in terms of what the bot flags as a potential copyright violation as the threshold seems to have been lowered compared to before (one example I mentioned on her talk page was that it now flags cases where someone changes one or two words in a paragraph because it detects a match for the remaining text in the paragraph). Not sure we'll be able to handle the reports otherwise. DanCherek (talk) 22:34, 10 April 2024 (UTC)
@Diannaa @DanCherek Thanks for all of the feedback! Can you link to specific example(s)? Regarding "someone changes one or two words in a paragraph because it detects a match for the remaining text in the paragraph": wouldn't that still usually be a copyright violation, or do you mean the source is a backwards copy (in which case it's not a copyvio at all)?
Assuming the cases are still valid, my opinion is that it's perfectly fine to have a backlog. While it's admirable to aim for completeness, you can only volunteer so much time. If, however, you're seeing a lot of noise, such as backwards copies or too many cases that are right on the "borderline", we can certainly work to improve that. MusikAnimal (WMF) (talk) 22:45, 10 April 2024 (UTC)
I'm seeing a lot of cases like [1], where someone copyedits a paragraph and the bot then matches the rest of the unchanged text to a backwards copy. We still had to deal with backwards copies in the old CopyPatrol, of course, but so far it feels like there are a lot more since the update. DanCherek (talk) 22:50, 10 April 2024 (UTC)
Fixed — JJMC89(T·C) 17:25, 11 April 2024 (UTC)
This report flags an edit that just cleaned up references with no real new text added. -- Whpq (talk) 22:56, 10 April 2024 (UTC)
Fixed — JJMC89(T·C) 06:44, 11 April 2024 (UTC) modified 16:19, 11 April 2024 (UTC)
Due to the large number of Wikipedia mirrors, we will always have false positives. We can waste a lot of valuable time on those cases, attempting to determine who had the text first. We do have a whitelist of Wikipedia mirrors, but people who don't know regex are warned not to edit it. Here are a few more false positives of various kinds. I don't know if these are useful examples or not:
  • Here's one where an editor removed multiple occurrences of the word "current" from a list. The list itself is public domain of course.
  • Here's one where an editor moved a paragraph that was reflected in a Wikipedia mirror. The material they added in the same edit is okay to keep.
  • In this one, an editor actually removes text but since IMDb has copied our plot summary at some point, the item gets listed.
  • Here's one that illustrates DanCherek's point: only a few words are added. Purported source: an obvious Wikipedia mirror.
Another suggestion: Perhaps we can somehow teach the system to only show us the most likely cases? Maybe there's a way to reduce the threshold for inclusion, regarding the size of the edit or the amount of the overlap? It's not a question of having a backlog; if we don't reduce the fire hose of incoming cases there will be many that never get assessed at all. Diannaa (talk) 23:25, 10 April 2024 (UTC)
Fixed all (initially only the first and fourth). The second link is the same as the first. — JJMC89(T·C) 06:44, 11 April 2024 (UTC) modified 17:25, 11 April 2024 (UTC)
Sorry about the duplicate link; I am not going to bother to look for the missing example. New comments:
  • Community Tech bot used to remove listings of pages that were already deleted. This doesn't seem to be happening so far: deleted article, deleted draft
  • Cases so far at the halfway point of April 11 are a much more manageable 40, so if tweaks are underway, they're working.
Diannaa (talk) 12:11, 11 April 2024 (UTC)
Unfortunately I had to revert one of the fixes due to poor performance, which caused the bot to build up a large backlog that hasn't been processed yet. — JJMC89(T·C) 16:19, 11 April 2024 (UTC)
One thing I've noticed is that I keep getting logged out every time I close my browser. Is there a cookie persistence issue? I had no such issues with the old backend. Isochrone (talk) 13:38, 11 April 2024 (UTC)
I will look into this. This seems to happen with every new Symfony app that I create (phab:T224382). I managed to fix it before, so I'll attempt it again for CopyPatrol (the old CopyPatrol did not run on Symfony, FYI). MusikAnimal (WMF) (talk) 19:32, 11 April 2024 (UTC)
I just noticed that I can't view the iThenticate reports unless I am logged in to CopyPatrol. So that might be a feature rather than a bug. Diannaa (talk) 23:14, 11 April 2024 (UTC)
Logging in is required since each user must agree to the EULA to see the reports. The short login session should get worked on. — JJMC89(T·C) 22:25, 12 April 2024 (UTC)
Tracked in Phabricator:
Task T362457 resolved

New feedback: Some users are incorrectly being shown with redlinked user talk pages. Here, here, here, for example. It appears this might be because they don't have a talk page on Meta, but that's immaterial; I would prefer to be able to see at a glance whether or not a user talk page exists for that username on the wiki in question. Diannaa (talk) 21:44, 12 April 2024 (UTC)

Fixed MusikAnimal (WMF) (talk) 19:17, 14 April 2024 (UTC)

Moving ignore lists to the CopyPatrol UI

In the above discussion, it was noted how tedious it is to maintain User:CopyPatrolBot/UrlIgnoreList as it requires knowledge of regular expressions. I had an idea that we could get rid of the on-wiki lists and instead have a button "Ignore URLs like this" directly in the CopyPatrol UI. We could do the same for users, too, so you don't have to edit User:CopyPatrolBot/UserIgnoreList. This is also nice because the new system has the ignore lists centralized on Meta, where not everyone is necessarily able to edit (the page could be semi-protected).

The only issue I foresee with this idea is the potential for abuse. For that, I was thinking we'd restrict the ability to ignore URLs and users to "privileged" users – say, those with at least 1,000 edits – or even to sysops? Another option is to go ahead and shield all of CopyPatrol from newbies, as proposed at phab:T178700.

Thoughts? MusikAnimal (WMF) (talk) 19:43, 11 April 2024 (UTC)

I can't imagine any issues with this for URLs. With users, making it too easy, even for admins (who are humans), to exclude users may lead to unintentional removals of users who should be flagged, or people being too liberal with the ignore button.
When there are errors on the wikitext list, this can just be rectified by another user: would there be a way to "un-ignore" users in case of errors? Isochrone (talk) 20:28, 11 April 2024 (UTC)
I think it makes sense to have an interface to manage the ignored URLs and users. MusikAnimal (WMF) (talk) 22:43, 14 April 2024 (UTC)

CopyPatrol has stopped, but...

CopyPatrol has stopped, because Turnitin is down for maintenance. Check for updates. Diannaa (talk) 19:36, 20 April 2024 (UTC)