CopyPatrol

From Meta, a Wikimedia project coordination wiki
About

CopyPatrol is a tool that allows you to see recent Wikipedia edits that are flagged as possible copyright violations. CopyPatrol was developed and is maintained by the Wikimedia Foundation's Community Tech team.

When text is added to Wikipedia, a bot checks the edit against Turnitin, a plagiarism detection service. CopyPatrol helps the patrolling community identify, evaluate, and fix violations, so that all content remains freely distributable and the legal risks of using unlicensed content are avoided.

The tool currently supports the English, Arabic, French, Spanish, and Simple English Wikipedia projects. See Phabricator task T141379 for discussion of support for other languages.

Usage

To use CopyPatrol, you first need to log in. Browse to CopyPatrol and click "Login" at the top right. If you are already logged in to a Wikimedia account, the page should refresh and show your username in place of the Login button.

Each row in the CopyPatrol interface represents an edit or article creation.

Features

  • Compare panel – This opens the revision of the article as it stood when the edit was made and compares it with the given source. In most cases this is enough to determine whether the content was directly copied. It can also show whether the edit was a properly attributed quote, for example, without your having to leave the CopyPatrol interface.
  • Percentages – This indicates the proportion of words in the edit that matched the source at the time the edit was made. A high percentage means most of the edit was copied from the given source.
  • iThenticate report – If the Compare panel's result does not match the percentage or word count, the source has probably changed since the edit was made (a news feed, for example). You can use the iThenticate report to see the state of the source at the time the edit was made.
  • ORES score – Some edits show an ORES score below the "diff" link. A score appears when there is high confidence that the edit will need to be reverted; the higher the score, the higher the confidence.
  • WikiProjects – If you are interested in tracking violations for one or more WikiProjects, you can do so using the WikiProject search at the top. You can also click on a WikiProject bubble for any record in the interface and that will show you all open cases for that WikiProject.
  • Filters – As a reviewer, you'll want to keep this set to "Open cases", but for historical reference you can see all the reviewed cases, and search by username or page title.
  • Drafts – You can use the Draft checkbox to show only potential violations in the Draft namespace, if it exists on your wiki. This is especially helpful for editors who work at Articles for Creation, as you may be able to quickly reduce the backlog by identifying obvious declines that are copyright violations.
  • Leaderboard – This shows the top reviewers for the past week, the past month, and all time. Review more cases to earn your place on the leaderboard! Each language has its own leaderboard.
  • Permalinks – These let you share a direct link to a single entry, such as copypatrol/fr?id=25727. The URL shows only the entry specified by the ID.
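
The permalink form described in the last item can be sketched as a small helper. This is only an illustration: the function name `build_permalink` and its `base_url` parameter are hypothetical, and the host in the usage line is a placeholder, because the text above gives only the relative form `copypatrol/fr?id=25727`.

```python
from urllib.parse import urlencode, urljoin


def build_permalink(base_url: str, lang: str, record_id: int) -> str:
    """Return a link to a single CopyPatrol entry.

    Follows the relative form shown above: copypatrol/<lang>?id=<record_id>.
    base_url is an assumption; the host that serves the tool is not stated here.
    """
    if record_id <= 0:
        raise ValueError("record_id must be a positive integer")
    # Ensure a trailing slash so urljoin appends the path instead of replacing it.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(base, f"copypatrol/{lang}") + "?" + urlencode({"id": record_id})


# Placeholder host, for illustration only.
print(build_permalink("https://tools.example.org", "fr", 25727))
```

Passing the language code separately from the ID mirrors the way the interface keeps one feed per language.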

Identifying copyright violations

Once you've verified the content matches the source, check the license of the source and ensure it is compatible with Wikipedia. If no license is provided, you should assume it is not freely licensed.

False positives

Some edits appear to be copyright violations when they are not. These are called false positives, and they take several forms:

  • Properly attributed copying of licensed or public domain material
  • Properly attributed quotations
  • Restoring material from a previous revision of the same article
  • Addition of lists, timelines, math formulas and bibliographies
  • Moving material around within the same article

Please consider these possibilities when reviewing edits.

Backwards copies

Note also the possibility of backwards copies, where the source appears to have copied content from Wikipedia. This is not necessarily a false positive: it may mean the editor copied content from another Wikipedia article. In that case the edit should be properly attributed, generally with an edit summary such as "content copied from [[article name]]". If no attribution was given, one way to rectify this is a dummy edit whose summary states that the previously added content was copied from another article.

Allowlists

  • URL blacklist – Matches against URLs listed here are not reported as copy-and-paste concerns
  • User whitelist (enwiki) – Filters out trusted users who frequently show up as false positives

Instructions

On enwiki

  • Assess the edit and determine whether or not a copyright violation has occurred. If it is a copyright violation, remove the copyrighted content or tag it for speedy deletion with {{db-g12|url=sourceurl}} if it is a new page that is predominantly a violation.
  • Notify the editor of the copyright policy using {{uw-copyright}}, {{uw-copyright-new}}, or a personalized message. If you discover unattributed copying within Wikipedia, notify the editor of the attribution requirement using the template {{uw-copying}}. If you discover unattributed copying from public domain materials, notify the editor using the template {{uw-plagiarism}}. All of these templates need to be substituted. Please consider adding the required attribution to the article yourself, as a new editor is unlikely to know how to do it.
  • If you are an administrator, please consider revision deleting the related diffs under criterion RD1 if they qualify. If you are not an administrator you can request revision deletion using {{copyvio-revdel}}.
  • Finally, please update the status in the interface. If you fixed the problem, tagged the page for revision deletion, or tagged the page for deletion as a copyright violation, mark it as "Page fixed". (Please consider watch-listing pages you nominate for deletion or tag for revision deletion to ensure they get resolved to your satisfaction.) If you find the item is a false positive or if the page has already been deleted for reasons other than copyright violation, mark it as "No action needed".
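
The enwiki steps above can be summarized as a lookup from what the reviewer found to a user-talk notice and a CopyPatrol status. The sketch below is illustrative only: the `Finding` enum, the `NextSteps` record, and the `next_steps` function are hypothetical names, and marking the two attribution cases as "Page fixed" assumes the reviewer added the missing attribution. Only the template names and status labels come from the instructions above.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Finding(Enum):
    """Hypothetical grouping of review outcomes."""
    COPYRIGHT_VIOLATION = auto()
    UNATTRIBUTED_WIKIPEDIA_COPY = auto()
    UNATTRIBUTED_PUBLIC_DOMAIN = auto()
    FALSE_POSITIVE = auto()


@dataclass(frozen=True)
class NextSteps:
    notice: Optional[str]  # wikitext for the editor's talk page, if any
    status: str            # status to set in the CopyPatrol interface


# The notices use subst: because the templates need to be substituted.
_STEPS = {
    Finding.COPYRIGHT_VIOLATION: NextSteps("{{subst:uw-copyright}}", "Page fixed"),
    # Assumption: the reviewer supplied the missing attribution, so the case is fixed.
    Finding.UNATTRIBUTED_WIKIPEDIA_COPY: NextSteps("{{subst:uw-copying}}", "Page fixed"),
    Finding.UNATTRIBUTED_PUBLIC_DOMAIN: NextSteps("{{subst:uw-plagiarism}}", "Page fixed"),
    Finding.FALSE_POSITIVE: NextSteps(None, "No action needed"),
}


def next_steps(finding: Finding) -> NextSteps:
    """Look up the suggested notice and status for a review outcome."""
    return _STEPS[finding]


steps = next_steps(Finding.UNATTRIBUTED_WIKIPEDIA_COPY)
print(steps.notice, "->", steps.status)
```

The table deliberately leaves out judgment calls such as choosing {{uw-copyright-new}} or a personalized message, requesting revision deletion, or tagging a new page with {{db-g12}}, since those depend on the page and the editor.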

Requesting new wikis

If you would like your wiki to be added, first make sure iThenticate supports your language. Look for the question "Which international languages does iThenticate have content for in its database?" in the "General" section of the FAQ.

Next, seek support from your wiki's community. If the community agrees, create a Phabricator task that links to the discussion. There should be evidence that the community will regularly make use of CopyPatrol.

Credits

CopyPatrol is made possible by:

  • Community Tech, which developed the web interface
  • ערן, who developed EranBot, which analyzes recent changes and stores potential violations in a database
  • iThenticate, a plagiarism detection service that EranBot uses to identify violations in recent changes
  • The Earwig, who developed Copyvios, a tool used to visualize differences between the edit and a source elsewhere on the internet
  • MusikBot, which detects and stores the WikiProjects for the pages in the CopyPatrol feed
  • The ORES MediaWiki extension, which shows revision scores to indicate high-confidence violations or otherwise disruptive edits
  • Diannaa, Doc James, Sphilbrick and other power users who have provided valuable feedback (see leaderboards for each language on the tool)
  • The translation contributors at translatewiki.net (see Translating:CopyPatrol for more)