Talk:CopyPatrol

From Meta, a Wikimedia project coordination wiki
This page is for discussions related to the CopyPatrol page.

NOTE: This page may not be regularly checked. If you need prompt attention from the maintainers, please ping a member of Community Tech.


Bot down?

Hasn't refreshed in over 90 minutes, only displays one edit. enL3X1 ¡‹delayed reaction›¡ 02:15, 16 March 2018 (UTC)

NVM enL3X1 ¡‹delayed reaction›¡ 12:48, 16 March 2018 (UTC)

An issue triggered by copying within Wikipedia (but failing to note the copy in the edit summary)

Many editors, even experienced ones, are often unaware that our attribution requirements create a need, in the case an editor copies material from another Wikipedia page, to identify the source of the material. This is typically done with an edit summary and we have outlined best practices at: https://en.wikipedia.org/wiki/Wikipedia:Copying_within_Wikipedia

Not surprisingly, an editor copying from one article to another will trigger an entry at CopyPatrol, not because CopyPatrol is looking for edits which match (or closely resemble) material within Wikipedia, but because that material may be picked up by another site. In theory, that other site should note the source of the material and provide attribution, but we all know that many sites copy Wikipedia material (acceptable), fail to attribute it (not acceptable) and may even assert full copyright over it (also unacceptable).

Obviously, in the cases where the copied material is properly attributed, volunteers working on CopyPatrol are likely to note the attribution, and not revert as a copyvio. However, it is much better if editors follow the best practices, as I am certain that all editors working in CopyPatrol will look at the edit summary, and the notation that it is an internal copy will forestall the need for a revdel.

So what's the point of this missive?

Would it be possible for the CopyPatrol software, when identifying an edit to article X, to check Wikipedia to see if there is similar language at articles OTHER than article X? Including that information may help editors identify the edit as an improper copy-within-Wikipedia edit. It is still a copyright problem, but we handle such situations differently.

And, as this exchange demonstrates, editors who are unaware of the (admittedly not well-known) requirements sometimes take umbrage. I don't enjoy such interactions, and I am sure the other editor is less than a happy camper. Identifying such situations in advance would be much better for everyone.--Sphilbrick (talk) 22:58, 8 May 2018 (UTC)

I'd like to try again to see if I can get some feedback on this proposal. Multiple hours have been wasted because an editor copied material from another article, was unaware of the need to include a note in the edit summary, and the material also showed up in an external site which mentioned Wikipedia but then went on to claim full copyright over the content, resulting in a reversion by me, and an understandably angry editor (incident here). I think things have been smoothed over, but it has necessitated hours of work by too many people. If CopyPatrol had identified that the material also existed in a Wikipedia article other than the one in which the edit was being made, all of this would've been avoided. It sounds easy to me to do this check. Am I missing something?--Sphilbrick (talk) 17:50, 12 June 2018 (UTC)
Hi Sphilbrick. Sorry for the late reply to this. I missed your post earlier. As you know, the data in CopyPatrol comes from EranBot, which (if memory serves me right) maintains a blacklist of sites and mirrors that it filters out when detecting copyright violations. It's also possible that Turnitin itself does this too. I'd like to ask for User:ערן's input on this. -- NKohli (WMF) (talk) 18:23, 12 June 2018 (UTC)
This is correct - whenever we encounter a new Wikipedia mirror it is good practice to add it to User:EranBot/Copyright/Blacklist so we can skip text from those sites the next time. The bot doesn't do an internal search in Wikipedia itself (yet) for the added text, as it is somewhat "heavy" (it would probably need a few search queries for sample sentences from the diff). eranroz (talk) 21:20, 12 June 2018 (UTC)
I understand the need to add true mirrors to the list so they don't generate false positives. I trust it is clear that a single article, or portion of an article copied into a blog post does not belong in the mirror list. I think it is such examples that are causing the problem. I'm motivated to post today because two such examples occurred today, one of which was rather contentious, and the other not, perhaps because it was noticed before too much reaction.
In both cases, it looks like a legitimate copyright issue to the reviewer (me) and to the deleting admin @RHaworth:, and the editor is perplexed because they know nothing about the site for which there is matching text. Obviously, this is a nonproblem if editors know to mention the copy in the edit summary, but many editors are unaware of this requirement, and it isn't obvious how to identify such editors; it could be anyone. That's why I would be happy if CopyPatrol did a search of Wikipedia, and gave us a heads up if there is a close match.--Sphilbrick (talk) 21:48, 9 August 2018 (UTC)
  • @NKohli (WMF): In case you missed it: "Would it be possible for the CopyPatrol software, when identifying an edit to article X, to check Wikipedia to see if there is similar language at articles OTHER than article X?" Winged Blades of Godric (talk) 09:48, 1 October 2018 (UTC)
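
For illustration only, the internal check discussed in this thread (a few search queries for sample sentences from the diff, ignoring hits on article X itself) might be sketched roughly as follows. This is not how EranBot or CopyPatrol works; the function names, the sampling heuristic, and the choice of English Wikipedia are all assumptions made for the example. The only real interface used is the MediaWiki search API (action=query, list=search), and the sketch only builds the query URLs and filters a response that has already been fetched.

```python
import re
from urllib.parse import urlencode

# Assumption: the edit was made on English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"


def sample_sentences(added_text, max_samples=3, min_words=8):
    """Pick a few longer sentences from the added text (hypothetical heuristic)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", added_text) if s.strip()]
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    # Longer sentences are less likely to match another article by chance.
    candidates.sort(key=lambda s: len(s.split()), reverse=True)
    return candidates[:max_samples]


def build_search_url(sentence, limit=5):
    """Build a MediaWiki search API URL for an exact-phrase search in article space."""
    phrase = sentence.rstrip(".!?").replace('"', "")
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'"{phrase}"',  # quoted, so the search looks for the phrase
        "srnamespace": 0,           # articles only
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def other_articles(search_response, edited_title):
    """Return titles from a parsed search response, excluding the edited article."""
    hits = search_response.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits if hit["title"] != edited_title]
```

A caller would fetch each URL, parse the JSON, and pass it to other_articles together with the title of article X; any titles returned would be candidates for an unattributed copy within Wikipedia. Capping the number of sample sentences keeps the number of queries per diff small, which is the "heavy" cost mentioned above.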

Welcome!

Nice to see some new faces in CopyPatrol! Welcome! enL3X1 ¡‹delayed reaction›¡ 02:44, 10 July 2018 (UTC) enL3X1 ¡‹delayed reaction›¡ 01:07, 16 September 2018 (UTC)

Diffs not directing properly

Noticed this today: many, but not all, of the diffs on the CopyPatrol page go to invalid locations. Compare the actual edit [1] with the reported one [2]. Crow (talk) 18:42, 12 August 2018 (UTC)

Chiming in with the same observation. I saw this popping up a couple of weeks ago and hoped it was a momentary glitch, but it doesn't seem to be going away. On the one hand, it isn't a showstopper: one can click on the view history, go back and check the time and date of the edit, find the edit, and then track down the diff. But when one is trying to handle dozens each day, adding several extra steps is annoying. Here are three examples I came across in the last hour:
[3]
[4]
[5]--Sphilbrick (talk) 14:22, 20 August 2018 (UTC)
This is phab:T201218. It should at least be showing the content of the right revision, but the "change visibility" link (if you are an admin) is not available because MediaWiki thinks it's an invalid revision. I'm not sure if we should wait for a fix or change our links to point to oldid=123 instead of diff=123. For revisions that aren't the first to the page, using diff=123 is preferred so you can easily see the content added with the edit in question. MusikAnimal (WMF) (talk) 15:17, 20 August 2018 (UTC)
Thanks for the prompt response. Happy to see that it is being tracked and investigated. I do agree that it seems to be showing the right content.--Sphilbrick (talk) 17:33, 20 August 2018 (UTC)
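
As a rough sketch of the link choice described above (oldid=123 versus diff=123), one possible reading is to use oldid only for the first revision of a page and diff everywhere else. The function name, the is_first_revision flag, and the English Wikipedia base URL are assumptions for illustration; this is not CopyPatrol's actual code, and whether the invalid links are limited to first revisions is not confirmed in this thread.

```python
from urllib.parse import urlencode

# Assumption: links point at English Wikipedia.
INDEX_URL = "https://en.wikipedia.org/w/index.php"


def revision_link(rev_id, is_first_revision):
    """Build a link to a revision (hypothetical helper).

    Uses oldid= for the first revision of a page, where there is no
    earlier revision to compare against, and diff= otherwise so the
    added content is easy to see.
    """
    key = "oldid" if is_first_revision else "diff"
    return INDEX_URL + "?" + urlencode({key: rev_id})
```

For example, revision_link(123, True) gives a link ending in ?oldid=123, while revision_link(123, False) gives one ending in ?diff=123.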

Small queue today

Fewer than a dozen tickets the 4 or 5 times I checked today. Is something broken? enL3X1 ¡‹delayed reaction›¡ 01:49, 2 October 2018 (UTC)

I haven't heard anything. I noticed the same thing, but it seems to be back in action.--Sphilbrick (talk) 13:38, 2 October 2018 (UTC)
Yup, back to full capacity on my end again, thanks all. enL3X1 ¡‹delayed reaction›¡ 14:39, 2 October 2018 (UTC)
There was a hiccup with our account on Turnitin expiring but thanks to eranroz and Doc James, it was quickly restored. :) -- NKohli (WMF) (talk) 15:45, 2 October 2018 (UTC)
Yes, we will work to stay on top of this. iThenticate / Turnitin has been an amazing partner and has been very generous in giving us access to a large number of queries on their API. Doc James (talk · contribs · email) 17:36, 2 October 2018 (UTC)