Requests for comment/About copyright and possible infringements of it in WMF wikis

From Meta, a Wikimedia project coordination wiki

This is a subpage; for more information, see the Requests for comments page.


This RFC is about principles. It is not about guilt or particular wikis, and even less about particular users. I deliberately avoid mentioning individual wikis or incidents here. To discuss such cases please go to some other RFC, or create a new one.

The reason why I created this RFC is that some wikis (at least one, probably more, and YES I am sure) have substantial trouble with copyright (piracy, plagiarism, etc.). Some of those cases have been undetected, ongoing or open for more than 10 years. Most of the "imaginable" problems outlined below have indeed occurred, so this is a real issue, not a hypothetical, unnecessary or unrealistic fear.

The expected outcome is to get clear answers about what is permitted and what isn't (where the border between inspiration and copying is, where the border between legitimate quoting and piracy is, where the border between coincidence and plagiarism lies), and how possible copyright violations are to be resolved while minimizing conflicts, waste of human resources, and legal risks. Maybe creating a page here on Meta clearly stating the facts and debunking the myths about copyright would be good. A clarifying answer from some authority (i.e. the stewards of the WMF) would be appreciated.

The basis for all this is the law, the Wikimedia TOS, common practice on-wiki and off-wiki, and the 2 licenses CC BY-SA 3.0 and GFDL.

When submitting the edit form, one agrees to:

> By saving changes, you agree to the Terms of Use, and you irrevocably agree to release
> your contribution under the CC BY-SA 3.0 License and the GFDL. You agree that a hyperlink
> or URL is sufficient attribution under the Creative Commons license.

The TOS says:

https://foundation.wikimedia.org/wiki/Terms_of_Use/en#4._Refraining_from_Certain_Activities

> 4. Refraining from Certain Activities
> Committing Infringement
> Infringing copyrights, trademarks, patents, or other proprietary rights

https://foundation.wikimedia.org/wiki/Terms_of_Use/en#7._Licensing_of_Content

> Text to which you hold the copyright: When you submit text to which you hold the copyright, you agree to license it under:
> Creative Commons Attribution-ShareAlike 3.0 Unported License (“CC BY-SA”), and
> GNU Free Documentation License (“GFDL”) (unversioned, with no invariant sections, front-cover texts, or back-cover texts).

...

> When you contribute content that is in the public domain, you warrant that the material is actually
> in the public domain, and you agree to label it appropriately.

But what does this mean in practice, and where does one complain if violations become the rule?

0. May one copy from one WMF wiki to another with attribution? IMHO YES, if the edit summary contains a link to the page and revision used as the source.

1. May one copy from one WMF wiki to another without attribution? IMHO NO, but this occurs frequently. But how to resolve such problems, particularly if a long time has passed and it is difficult to find the source? And, IMHO, providing a valid attribution is a duty of the person who inserted the content, and should not be offloaded onto others. Claiming a "highly positive edit" with a substantial size increase achieved by copying without attribution should IMHO be classified as a sort of fraud or theft. What to do with such pages, particularly if they are many, and other people or bots have edited them since? Delete? Revert with revision suppression? Add valid attribution? Nothing?

2. May one copy public domain content into a WMF wiki? IMHO YES, if attribution and source are given. IMHO the duty to provide attribution, source, and at least circumstantial evidence that the material is public domain should be satisfied by the person who inserts such content, but no such rule seems to exist. Offloading the duty to investigate retroactively onto others is IMHO not the way to go, as it can cause legal uncertainty, conflicts and extra workload for others.

3. May one copy content under some other free license (GNU GPL for example, or even CC BY only) into a WMF wiki? IMHO NO, because all content in all WMF wikis (with very few exceptions) is under both the GFDL and CC BY-SA. How to resolve such incidents?

4. May one copy content that is presumably copyrighted into a WMF wiki? IMHO NO, because all content in all WMF wikis (with very few exceptions) is under both the GFDL and CC BY-SA. How to resolve such incidents?

5. Frequent argument (affecting particularly wiktionaries): the language belongs to all, it is not copyrighted, and any identical definitions are just coincidence. IMHO the argument is invalid. Indeed a language is inherently public domain, but this does not allow anyone to copy complete lemma entries from a copyrighted dictionary into a wiktionary. If a short definition (say 2 to 4 words) is the same, YES, this can be coincidence. The meaning of a word is public domain, and there may be only one obvious way to define a word. But if the definition is longer (say 7 to 12 words), it is very unlikely that two authors arrive at the same definition without copying. Even worse, if many lemma articles have definitions identical to those in some other ("authoritative") dictionary, this is definitely not coincidence anymore, but piracy. Also, if derived words with their definitions follow, with several of the same words in the same order and with identical definitions, then the edit must be considered a blatant copyright infringement and nothing else.

6. Frequent argument (affecting particularly wiktionaries): there is no copyright notice, and the online dictionary is accessible to anyone, for any purpose. IMHO the argument is invalid. In such cases free personal and educational use can be safely assumed, but certainly not copying into a wiktionary.

7. Frequent argument (affecting particularly wiktionaries): the copyright notice says "free for personal and educational use" and the online dictionary is accessible for anyone. IMHO the argument is invalid. "free personal and educational use" does NOT include copying into a wiktionary, and doing so violates both the copyright of the creator of the other dictionary and the WMF TOS.

8. Frequent argument: the copyright law is valid in one state only, and I do not live in that state / you do not live in that state / WMF is not located in that state / nobody knows where you live. IMHO the argument is invalid. The WMF TOS is universal, and the content submitted into a WMF wiki must be legally usable by anyone living anywhere. If it is copyrighted somewhere, then it must NOT be copied into any WMF wiki.

9. Frequent argument: you are not a representative of the copyright owner, and the copyright owner has not yet complained. IMHO the argument is invalid. Having pirated content that the copyright owner has not yet discovered or complained about is dangerous. The "feeling safe" can suddenly turn into an urgent problem at any time (the copyright owner discovers the wiki or changes their mind, laws become stricter, ...), and this can result in an emergency deletion of the wiki, wasting the time of all honest contributors.

10. Frequent argument: quoting with a reference to the source is legal. IMHO the argument is invalid. Indeed quoting with a reference to the source is legal, provided that the quotes constitute a small part of the total, and the remaining content is your original work. OTOH copying a complete article or a large part of one (from a dictionary or other source) and adding references to the source at the end of every single copied sentence will NOT remove the problem of piracy, particularly if you apply this "magic trick" to many articles.

11. A (small) wiki is reluctant to deal with copyright violations. What to do? Can a wiki opt out of copyright? If not, where to complain?

12. A (small) wiki is full of pirated content. For example in a wiktionary, >=70% of lemma pages in a given language are copied from a copyrighted dictionary. How to resolve this, particularly if there are no local sysops, or they are reluctant to address the problem, or deny it? Should the stewards of the WMF act in such cases instead of offloading the responsibility onto some (non-existent) "community"? Delete or reset the wiki? Delete individual pages? Nothing? Wait indefinitely?

13. A (small) wiki is full of pirated content. For example in a wiktionary, >=70% of lemma pages in a given language are copied from another wiktionary without attribution. How to resolve this, particularly if there are no local sysops, or they are reluctant to address the problem, or deny it? Should the stewards of the WMF act in such cases instead of offloading the responsibility onto some (non-existent) "community"? Delete or reset the wiki? Delete individual pages? Nothing? Wait until the community emerges or matures?

14. A local sysop is engaging in copyright infringements and is unresponsive to complaints made by ordinary users (or bans those who dare to complain). What to do? Should the stewards of the WMF act in such cases instead of offloading the responsibility onto some (non-existent) "community"?

15. There are pages with illegally copied content in a wiki, and people or bots contribute to them or edit them, making it difficult to separate "good" from "bad" later. What to do? IMHO this should be prevented from the beginning; piracy should be suppressed immediately on all WMF wikis.

16. There are pages with illegally copied content in a wiki, bad since creation, created by copying. What to do? Delete? IMHO YES. Contributing to pirated pages should be "at your own risk". And even more important, act quickly, preventing this situation from arising at all, or at least preventing a large quantity of good-faith contributions from accumulating on top of piracy.

17. There are pages with illegally copied content in a wiki, clean when created, but later spoiled by adding material constituting copyright infringements. What to do? Should large-scale adding of objectionable material to existing pages be considered persistent severe vandalism, and result in desysopping, a permanent local block, a global community ban, or a global office ban? IMHO YES.

18. There were pages with illegally copied content in a wiki, but someone edited them by removing the objectionable material from the current version. What to do? Is editing sufficient? IMHO NO. Revision suppression at least must be done. But what to do if such pages constitute >=70% of all pages (in a language in a wiktionary)? Imagine a wiktionary where almost every page has a history in which a majority of revisions are suppressed. Is this a good advertisement for that wiki, that language, or the WMF? IMHO NOT at all. Who will check all those thousands of individual pages and perform the revision suppression? Should such a wiki be deleted or reset instead? IMHO ultimately YES. Keep the "obviously good stuff", and discard the dubious mass that undeniably consists mostly of copyright infringements or trivial bot-created content, as nobody is willing to waste thousands of hours separating the "good" from the "bad", or even worse "useless but legal" from "illegal".

19. I find substantial problems with copyright on a wiki. Where to report it?

Proposal for new global rules aiming to minimize the harm caused by copyright infringements on WMF wikis:

  • require that when adding content by copying from another wiki, a link to the source revision on the other wiki MUST BE provided in the edit summary
  • require that when adding content by copying from a public domain source, a link to that source and the reason why it is believed to be public domain (by age, by will of the author, ...) MUST BE provided in the edit summary
  • make it an explicit duty of the stewards to act if copyright infringements are common in a (small) wiki, by warning both all local sysops (at their user talk pages) and the community (at the main discussion page), and taking further steps if nothing happens within a reasonable time.

Please comment on the 20 questions above, my opinions and guesses (particularly if I am wrong), and the 3 proposals. Taylor 49 (talk) 22:08, 30 November 2021 (UTC)

Comments

  • Comments by User:Dave Braunschweig
    • Some of this was already addressed several years ago, but I can't find the source at the moment. There are three accepted ways of referencing a CC-BY-SA source:
      1. A reference within the page text itself.
      2. Edit summary.
      3. Discussion page.
    • A reference within the page text itself should be preferred, as it is the only approach that is displayed when the page content is transcluded or printed. An author's right to be referenced under CC-BY-SA doesn't go away when the page is transcluded or printed.
    • It would be very helpful for small wikis if there were an automated process that performed copyright searches and provided a Special:some_title report of potential violations. In addition, the Special: page would support some type of closure / hiding by users with appropriate rights who research the problem and find no copyright issues, or who have corrected or documented the issues.
    • Most of the specifics of addressing these problems are local issues. If the issues can't be addressed locally, they should be escalated following standard procedures.
    • Regarding Proposal 1: Oppose. There are three accepted ways to reference a source.
    • Regarding Proposal 2: Oppose. Referencing the source is sufficient. Public Domain content doesn't / shouldn't require more effort to reference than a copyrighted source.
    • Regarding Proposal 3: Oppose. This RFC appears to be in response to a local problem between admins that should be handled locally and escalated appropriately.
    • In response to the RFC itself, the only thing I would be able to support is automated tools that would reliably indicate copyright problems and provide some way to address them.
    • Dave Braunschweig (talk) 00:15, 1 December 2021 (UTC)
  • Comments by User:Lightbluerain
    • I have a suggestion: Why not have a bot, like English Wikipedia's ClueBot, that tests each edit for plagiarism and reverts it when plagiarism is confirmed? And another bot that re-tests every page after every edit to track the total percentage of plagiarism, and tags the page for speedy deletion once that percentage rises above a specific threshold? Lightbluerain (talk) 19:14, 8 December 2021 (UTC)
      There are fair questions: What percentage of coincidence would be considered plagiarism? Is quoting plagiarism too? How can automatic systems distinguish quotations from other text?
      This is not a question of opinion, but of the legal definitions enshrined in the copyright laws of various states.
      I asked for comments on this question and others in this RfC. And even this RfC with proposals to change the current license is a continuation of the above RfC with questions. Va (🖋️) 05:19, 13 December 2021 (UTC)
That percentage is to be decided. Perhaps 20% plagiarism is acceptable, but I am not sure. If quoting is to be checked, then I think the bots, rather than undoing the edits, can report the plagiarism at the appropriate place, just as we have bots that report usernames on English Wikipedia. Lightbluerain (talk) 06:20, 15 December 2021 (UTC)
Obviously, only a court decision made in accordance with the law of a particular country can be "sure", and in all countries copyright laws are different. Moreover, the requirements are different for works of art and scientific or technical texts.
And how would translated text be checked for plagiarism? Va (🖋️) 16:51, 15 December 2021 (UTC)
Another problem: There are many mirrors and forks of Wikimedia projects, especially Wikipedia. How do we know whether a website is a backwards copy or not? Chances are that the English Wikipedia alone has at least double, likely more, the number of mirrors and forks listed on its “Mirrors and forks” page. 71.239.86.150 22:12, 2 December 2023 (UTC)
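
To make the bot idea in this thread more concrete, here is a minimal sketch in Python of how an "overlap percentage" between an edit and a suspected source could be computed. It is only an illustration under stated assumptions: the function names (shingles, overlap_percent, should_report), the 5-word window and the default 20% threshold (taken from the figure floated above) are hypothetical and do not describe any existing bot. It measures textual overlap only; as noted above, whether an overlap is a copyright infringement is a legal question that such a number cannot answer, and the sketch handles neither quotations, translations nor backwards copies on mirrors.

```python
import re


def shingles(text, n=5):
    """Return the set of word n-grams ("shingles") in text, lower-cased."""
    words = re.findall(r"\w+", text.lower())
    # Texts shorter than n words yield an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_percent(candidate, source, n=5):
    """Percentage of the candidate's n-grams that also occur in source."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return 100.0 * len(cand & shingles(source, n)) / len(cand)


def should_report(candidate, source, threshold=20.0, n=5):
    """True if the overlap reaches the (assumed, adjustable) threshold.

    The result is meant for a report that a human reviews,
    not for an automatic revert.
    """
    return overlap_percent(candidate, source, n) >= threshold
```

In line with the suggestion above, a result from should_report would only put the edit on a report page for review by a human; the threshold and the window size n would have to be agreed on by the community.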