Talk:CopyPatrol

From Meta, a Wikimedia project coordination wiki
This page is for discussions related to the CopyPatrol page.

NOTE: This page may not be regularly checked. If you need prompt attention from the maintainers please ping a member of Community Tech.


CopyPatrol not collecting records currently

Hi all, as you might have noticed, CopyPatrol hasn't had any new records in the last day. CopyPatrol displays records from the data Eranbot collects. Eranbot in turn uses a service called Turnitin to find copyright violations in edits. We have a usage quota with this service, which has apparently been exceeded. We're working on getting this fixed and will keep you posted. You can follow updates on task T185163. -- NKohli (WMF) (talk) 22:59, 18 January 2018 (UTC)

This has been resolved now. Thanks everyone! -- NKohli (WMF) (talk) 19:01, 19 January 2018 (UTC)

Which wiki has the worst amount of copyvios?

I've been taking a look at the Spanish and French CopyPatrols and was wondering if there are any metrics on which of the 4 wikis is afflicted with the most copyright violations. Thanks, enL3X1 ¡‹delayed reaction›¡ 02:27, 14 February 2018 (UTC)

Interesting question! Those metrics are a bit hard to compute because we didn't start patrolling all the wikis at the same time. We started off with English as a pilot and then opened up to more wikis that wanted to use it. I think we can safely say that, of the wikis CopyPatrol currently runs on, English has had the largest number of copyvios come up. -- NKohli (WMF) (talk) 02:49, 14 February 2018 (UTC)

Bot down?

Hasn't refreshed in over 90 minutes, only displays one edit. enL3X1 ¡‹delayed reaction›¡ 02:15, 16 March 2018 (UTC)

NVM enL3X1 ¡‹delayed reaction›¡ 12:48, 16 March 2018 (UTC)

An issue triggered by copying within Wikipedia (but failing to note the copy in the edit summary)

Many editors, even experienced ones, are unaware that our attribution requirements oblige an editor who copies material from another Wikipedia page to identify the source of that material. This is typically done with an edit summary, and we have outlined best practices at: https://en.wikipedia.org/wiki/Wikipedia:Copying_within_Wikipedia

Not surprisingly, an editor copying from one article to another will trigger an entry at CopyPatrol, not because CopyPatrol is looking for edits which match (or closely resemble) material within Wikipedia, but because that material may have been picked up by another site. In theory, that other site should note the source of the material and provide attribution, but we all know that many sites copy Wikipedia material (acceptable), fail to attribute it (not acceptable) and may even assert full copyright over it (also unacceptable).

Obviously, in cases where the copied material is properly attributed, volunteers working on CopyPatrol are likely to note the attribution and not revert it as a copyvio. However, it is much better if editors follow the best practices, as I am certain that all editors working in CopyPatrol will look at the edit summary, and the notation that it is an internal copy will forestall the need for a revdel.

So what's the point of this missive?

Would it be possible for the CopyPatrol software, when identifying an edit to article X, to check Wikipedia to see if there is similar language at articles OTHER than article X? Including that information may help editors identify the edit as an improper copy within Wikipedia. It is still a copyright problem, but we handle such situations differently.

And, as this exchange demonstrates, editors who are unaware of the (admittedly not well-known) requirements sometimes take umbrage. I don't enjoy such interactions, and I am sure the other editor is less than a happy camper. Identifying such situations in advance would be much better for everyone.--Sphilbrick (talk) 22:58, 8 May 2018 (UTC)

I'd like to try again to see if I can get some feedback on this proposal. Multiple hours have been wasted because an editor copied material from another article, was unaware of the need to include a note in the edit summary, and the material also showed up in an external site which mentioned Wikipedia but then went on to claim full copyright over the content, resulting in a reversion by me and an understandably angry editor (incident here). I think things have been smoothed over, but it has necessitated hours of work by too many people. If CopyPatrol had identified that the material also existed in a Wikipedia article other than the one in which the edit was being made, all of this would have been avoided. It sounds easy to me to do this check. Am I missing something?--Sphilbrick (talk) 17:50, 12 June 2018 (UTC)
Hi Sphilbrick. Sorry for the late reply to this. I missed your post earlier. As you know, the data in CopyPatrol comes from Eranbot which (if memory serves me right) maintains a blacklist of sites and mirrors that it filters out when detecting copyright violations. It's also possible that Turnitin itself does some of this filtering. I'd like to ask for User:ערן's input on this. -- NKohli (WMF) (talk) 18:23, 12 June 2018 (UTC)
This is correct - whenever we encounter a new Wikipedia mirror it is good practice to add it to User:EranBot/Copyright/Blacklist so we can skip text from those sites next time. The bot doesn't do an internal search in Wikipedia itself (yet) for the added text, as it is somewhat "heavy" (it would probably need a few search queries for sample sentences from the diff). eranroz (talk) 21:20, 12 June 2018 (UTC)
I understand the need to add true mirrors to the list so they don't generate false positives. I trust it is clear that a single article, or a portion of an article, copied into a blog post does not belong in the mirror list. I think it is such examples that are causing the problem. I'm motivated to post today because two such examples occurred today, one of which was rather contentious and the other not, perhaps because it was noticed before there was too much reaction.
In both cases, it looks like a legitimate copyright issue to the reviewer (me) and to the deleting admin @RHaworth:, and the editor is perplexed because they know nothing about the site for which there is matching text. Obviously, this is a nonproblem if editors know to mention the copy in the edit summary, but many editors are unaware of this requirement, and it isn't obvious how to identify such editors; it could be anyone. That's why I would be happy if CopyPatrol did a search of Wikipedia, and gave us a heads up if there is a close match.--Sphilbrick (talk) 21:48, 9 August 2018 (UTC)
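For illustration only, here is a rough sketch of the kind of internal check discussed above: take a few sample sentences from the added text, turn each into an exact-phrase query against the standard MediaWiki search API (`action=query`, `list=search`), and list any matching articles other than the one being edited. This is not how EranBot or CopyPatrol work today. The function names, the sentence-selection heuristic, and the choice of the English Wikipedia endpoint are all assumptions, and the sketch only builds the query URLs and filters already-parsed responses rather than making network calls.

```python
import re
from urllib.parse import urlencode

# Assumption: English Wikipedia's API endpoint; other wikis would use their own.
API_URL = "https://en.wikipedia.org/w/api.php"


def sample_sentences(added_text, max_samples=3, min_words=8):
    """Pick up to max_samples of the longest sentences with at least min_words words.

    Hypothetical heuristic: longer sentences are less likely to match by chance.
    """
    sentences = re.split(r"(?<=[.!?])\s+", added_text.strip())
    candidates = [s.strip() for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=len, reverse=True)
    return candidates[:max_samples]


def build_search_url(sentence, limit=5):
    """Build an exact-phrase search URL for one sample sentence."""
    phrase = sentence.replace('"', " ")  # inner quotes would break the phrase query
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'"{phrase}"',
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def other_article_matches(search_responses, edited_title):
    """Collect titles from parsed search responses, excluding the edited article."""
    titles = []
    for response in search_responses:
        for hit in response.get("query", {}).get("search", []):
            title = hit["title"]
            if title != edited_title and title not in titles:
                titles.append(title)
    return titles
```

A caller would fetch each URL from `build_search_url`, parse the JSON, and pass the parsed responses to `other_article_matches`. Any title returned would be a hint for the reviewer that the edit may be an unattributed copy within Wikipedia rather than an external copyvio. Limiting the check to a few sentences per diff is meant to address the "heavy" concern raised above.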

Welcome!

Nice to see some fresh faces in CopyPatrol! Welcome! enL3X1 ¡‹delayed reaction›¡ 02:44, 10 July 2018 (UTC)

Diffs not directing properly

Noticed this today: many, but not all, of the diffs on the CopyPatrol page go to invalid locations. Compare the actual edit [1] with the reported one [2]. Crow (talk) 18:42, 12 August 2018 (UTC)