Talk:Etherpad
Research and etherpads
I'm designing a research participation framework based on etherpads on Wikiversity (French for now). As I consider the pads to be "research material" describing the participation (its articulation, chronological observation), I intend to "store them". But to enable collaborative research, making this material freely available (under CC BY-SA) seems important.
For now, I have simply made the pad content available [1]. But I'd like to store the ".etherpad" export in order to keep colors, keyboard chatting, and chronological evolution. For instance, this pad, 'time-framed' with the webinar live on YouTube [2], highlights the contributors' behaviors, giving keys to redesign the next participation experiment.
Would Wikimedia see interest in "storing" research material (such as etherpads)?
Do any "wikimedian researchers" have ongoing work in this field?
BR
--RP87 (talk) 11:47, 8 November 2018 (UTC)
Admins?
@Quiddity (WMF): is there any documentation on the WMF administration of our etherpad instance? We have a ticket that would require administrative action: ticket:2019102110005428. Thank you, — xaosflux Talk 16:12, 6 January 2020 (UTC)
- I saw the technotes on wikitech; the primary item I'm looking for is: "How to submit a request to the etherpad administrators". — xaosflux Talk 16:15, 6 January 2020 (UTC)
- this note suggests that WMF T&S may be in charge of this? — xaosflux Talk 18:41, 6 January 2020 (UTC)
- @Xaosflux: I've filed a task for this specific one, and added instructions to that section of the docs for next time. Cheers, Quiddity (WMF) (talk) 18:45, 6 January 2020 (UTC)
- @Quiddity (WMF): thank you, I copied your instruction to here as well. Would you let me know when the ticket is resolved so we can close the OTRS ticket please? — xaosflux Talk 18:54, 6 January 2020 (UTC)
- I CC'd you both after filing it :) Quiddity (WMF) (talk) 19:09, 6 January 2020 (UTC)
Add copyright license to default starting text
The current default text of the Wikimedia Etherpad is
---
Welcome to the WMF etherpad installation. Please keep in mind all current as well as past content in any pad is public. Removing content from a pad does not mean it is deleted. Keep in mind as well that there is no guarantee that a pad's contents will always be available. A pad may be corrupted, deleted or similar. Please keep a copy of important data somewhere else as well
---
I propose that we add a statement of copyright license to this to facilitate the migration of text from Wikimedia Etherpad instances into Wikimedia projects. For remote Wikimedia meetups, including those listed at WikiProject remote event participation, our meetings start by telling participants that whatever they post to the Etherpad will get moved to Meta-wiki and that anyone who posts to the Etherpad must consent to a compatible license. However, as we are scaling up use of Etherpad, we should be more thoughtful about appropriate copyright licensing.
I propose using the Wikidata license, which is Creative Commons Zero or Public Domain. This is easiest to use. Etherpad has no good way of attributing authorship, so a CC Zero license is the easiest license with which we can comply.
Thoughts? Does anyone here have opinions about how I should proceed with this conversation? My first stop for soliciting comments will probably be WikiProject Remote Participation, then those linked projects I mentioned which have already been heavy users of Etherpad and who also have a long history of mentioning the copyright license at meetings, then the general wiki public. I suppose I should contact WMF lawyers immediately also, right? Thanks for any comments or suggestions from anyone here already minding the etherpad. Blue Rasberry (talk) 12:45, 17 April 2020 (UTC)
- This makes sense because it's hard to copy the names from the history and most people don't seem to care about attribution to them anyway (it's rare for users to set recognisable pseudonyms). Please no links to Wikidata, link the actual license/waiver URL like https://creativecommons.org/publicdomain/zero/1.0/ (which also has translations). Nemo 12:55, 24 April 2020 (UTC)
Proposed new text
- version 1
Welcome to the Wikimedia Etherpad. Users contributing text do so under the same licensing terms as contributors to Wikidata. See the Wikidata copyright license at https://www.wikidata.org/wiki/Wikidata:Text_of_the_Creative_Commons_Public_Domain_Dedication
All current as well as past content in any pad is public. This document is not an archive and may automatically delete at any time. Please keep a copy of important data somewhere else as well, such as by transferring it to a Wikimedia project page.
Blue Rasberry (talk) 12:49, 24 April 2020 (UTC)
- I don't see any reason to mention Wikidata in the note. Also, Wikidata uses CC-BY-SA for text, and only uses CC-0 for structured data. --Yair rand (talk) 19:30, 26 April 2020 (UTC)
- Agree that (i) it is useful to clarify the licensing, (ii) that CC0 is the best choice, (iii) there is no need to mention Wikidata. -- Daniel Mietchen (talk) 01:18, 28 April 2020 (UTC)
- @Killiondude: Can you comment here? You removed the text where I described the copyright of etherpad content as "???". Can we adopt this text to clarify? Blue Rasberry (talk) 19:47, 28 April 2020 (UTC)
- I like the idea of including licensing information in etherpad, but I recommend a different approach. Some pads may contain content that is not CC0. If we want a copyright license (or waiver) to be effective, we may also need the assent of the authors. I recommend adjusting the default text to say this is up to the pad's author(s) to figure out, like: "Users contributing to this pad are encouraged to specify a free license. We recommend the CC0 dedication, available at https://creativecommons.org/publicdomain/zero/1.0/ ... " Stephen LaPorte (WMF) (talk) 01:04, 2 May 2020 (UTC)
- That doesn't help with the large majority of cases where users will just delete the placeholder text without putting anything in its place. Better state that there are default terms but exceptions are possible if another free license is specified (consistent with the general rule in the Terms of use). Nemo 08:00, 4 May 2020 (UTC)
- We'd want to include a persistent link to the default terms to be effective. I'm not sure if there's an easy way to add that in the etherpad interface, but I think that adjusting the default text is a workable interim solution for some cases if it has flexibility. Best, Stephen LaPorte (WMF) (talk) 01:26, 5 May 2020 (UTC)
- Thanks. Do you think it would be good next to "About Powered by Etherpad-lite" in the panel which opens when you click the cog? I don't see many other suitable options in the interface. Nemo 06:25, 5 May 2020 (UTC)
- Yes, that's a good spot to include it. Thanks, Stephen LaPorte (WMF) (talk) 23:32, 22 May 2020 (UTC)
- I like this idea in principle, and agree that a Wikidata link is probably not the best idea. Stephen raises a good point about text occasionally needing a different license and I like Nemo's proposed solution that we simply add a disclaimer that exceptions are possible. Killiondude (talk) 19:44, 5 May 2020 (UTC)
Move Etherpad -> Wikimedia Etherpad
I propose to move
Etherpad -> Wikimedia Etherpad
Etherpad is the general software, but this documentation page is for the Wikimedia installation of this.
Any comments or objections? Blue Rasberry (talk) 12:47, 17 April 2020 (UTC)
- Phabricator is a local installation and is not located at Wikimedia Phabricator. I think simple pagenames are better. Killiondude (talk) 19:29, 21 April 2020 (UTC)
- Unless we're going to have an article about Etherpad in general as opposed to Etherpad in Wikimedia context, I don't see a need for disambiguation either. Nemo 12:55, 24 April 2020 (UTC)
Okay, I agree, no name change. Blue Rasberry (talk) 15:23, 24 April 2020 (UTC)
Terms of use?
Is this service meant only for use related to Wikimedia in some way? I get the impression that it is, but the inclusion of it here is confusing. Darylgolden (talk) 03:24, 19 May 2021 (UTC)
Etherpad wipe/deletion/clean slate proposal
TLDR: WMF Etherpad (https://etherpad.wikimedia.org/) is having ALL data removed, possibly as soon as Apr 30, 2026. If you've ever used this service, please take steps to retrieve and backup any data you wish to keep.
Etherpad is an "ephemeral", open source notetaking app (https://etherpad.org/). There is a community installation for the Wikimedia community, hosted by the foundation, at https://etherpad.wikimedia.org. It allows for stripped-down, "Google docs" style multi user note taking without any authentication or "protections" of content.
The WMF technical operations team has identified that the database holding the etherpad notebooks has grown extremely large and unwieldy (~233GB). To help manage this, they wish to delete all data and pads. The proposed date for cleanup, after which all pads will be inaccessible, is Apr 30 2026. From there, there is an open discussion on the cadence of future cleanups.
In the past, known uses of this service have been collaboration as well as real-time notetaking during conferences and meetups. It has also been identified that many of the pads that haven't been looked at in a long time might have been vandalized, contain spam, or have been blanked. There is also a theory that a large number of the pads contain nothing but spam to begin with. Many Etherpad users report that they have already taken steps in the past to "back up" their information from Etherpad to more permanent homes on wikis and elsewhere.
When first opening Wikimedia Etherpad, users are presented with the following text:
Welcome to the WMF Etherpad installation. Please do NOT use this Etherpad for personal use. Your Etherpad may get deleted at any time without warning if its content is not related to Wikimedia. Please keep in mind all current as well as past content in any pad is public. Removing content from a pad does not mean it is deleted. Keep in mind as well that there is no guarantee that a pad's contents will always be available. A pad may be corrupted, deleted or similar. Please keep a copy of important data somewhere else as well.
Note that Wikimedia Code of Conduct (https://www.mediawiki.org/wiki/Code_of_Conduct) applies in this Etherpad instance as well
While this notice explicitly warns that data may not always be available, in practice, no purges of data have taken place in the past.
Due to the technical implementation of the Etherpad software, it is not possible to easily "delete pads older than N days" or even identify the last time a pad was accessed.
There is no known way of identifying pads that you may have contributed to in the past, other than your own records of them (there is no user authentication, so there is no record of "you" being "you"). Similarly, there is no known way of bulk exporting pads you may have created or contributed to.
However, individual pads can be exported. Simply visit one of the following URLs for your pad:
- http://etherpad.wikimedia.org/p/<pad_title>/export/txt
- http://etherpad.wikimedia.org/p/<pad_title>/export/html
eg: https://etherpad.wikimedia.org/p/export-test/export/html and https://etherpad.wikimedia.org/p/export-test/export/txt
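For anyone with a list of their own pad titles, the per-pad export URLs above can be fetched in a loop. The following is only a minimal sketch: the function names and the User-Agent string are placeholders made up for illustration, and it assumes the /export/txt and /export/html paths behave as described above.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "https://etherpad.wikimedia.org/p"
FORMATS = ("txt", "html")  # the two export formats listed above


def export_url(pad_title: str, fmt: str = "txt") -> str:
    """Build the export URL for one pad (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Percent-encode the title so spaces and slashes stay inside one path segment.
    return f"{BASE}/{quote(pad_title, safe='')}/export/{fmt}"


def save_pad(pad_title: str, fmt: str = "txt", timeout: int = 30) -> str:
    """Download one pad export and write it to a local file; returns the filename."""
    request = Request(
        export_url(pad_title, fmt),
        # Placeholder User-Agent: replace with something that identifies you.
        headers={"User-Agent": "pad-backup-sketch/0.1 (your contact here)"},
    )
    with urlopen(request, timeout=timeout) as response:
        data = response.read()
    filename = f"{quote(pad_title, safe='')}.{fmt}"
    with open(filename, "wb") as handle:
        handle.write(data)
    return filename


if __name__ == "__main__":
    # Only prints the URLs; call save_pad(title, fmt) to actually download.
    for title in ["export-test"]:
        for fmt in FORMATS:
            print(export_url(title, fmt))
```

This only saves the current text of each pad. It does not capture revision history, colors, or chat.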
(note: this "discussion" has been edited in place as additional versions of this notice have been developed, see the edit history)
To join in the more technical discussion see the Phabricator ticket. Audiodude (talk) 18:21, 10 February 2026 (UTC)
- Date clarification: The soft-deadline is now April 30, per this email (and as mentioned in your 3rd paragraph, and which I can confirm). Please could you update the date mentioned in your intro sentence above, to avoid confusion? Thanks! Quiddity (WMF) (talk) 17:44, 13 February 2026 (UTC)
- Good catch, thank you! Audiodude (talk) 19:13, 13 February 2026 (UTC)
- don't use the footgun... don't use the footgun... don't use the footgun...
- Please don't let links rot. This proposal will cause unnecessary, irrevocable destruction of knowledge and disruption for users. We have hundreds of editors, script writers, and partners who dedicate parts of their lives to ensure that links don't rot online, and that any links ever referenced by even a stub Wikipedia article are archived for posterity. There's no need to intentionally pull a large, actively used knowledge repository offline, and delete it irrevocably.
- As I understand it, the short reason is that even though etherpad is "causing no immediate issues", 230GB feels "extremely large and unwieldy", and there is a theory that many pads may be blank or contain spam.
- a) can you provide a full database dump for someone else who wants to host it? I've seen offers to maintain a backup, which seems extremely inexpensive. I will add my own offer to the mix. (someone mentioned a privacy concern, but doesn't the same banner that tells people content can be deleted, tell them that it is public? seems a strange reason to rush to destroy collective knowledge)
- b) It looks like you're going to delete ALL pads, including those created recently and in active use. can you instead start a new etherpad on a new domain, and on April 30 move the current instance to a new domain, and rename the new domain to etherpad.wikimedia.org?
- c) Does WMF have any sort of process or document for "exit to community" (for technical systems) when an important feature or function stops being maintained by the Foundation?
- Still processing, –SJ talk 02:02, 14 February 2026 (UTC)
- I wonder if we could scan for incoming links on the wikis or in Phabricator (e.g., https://etherpad.wikimedia.org/p/WikiDev16-WrapUp is linked in mw:Wikimedia Developer Summit/2016) and preserve at least a flat/most recent revision of those. That would probably be just a few thousand pages. WhatamIdoing (talk) 03:20, 14 February 2026 (UTC)
- That's already being done by Pppery. (thanks user:Pppery) But looking at my own contribs, a lot of them were at events where we didn't keep an inbound link on a mainline wiki, and others were workshop / hackathon / strategy notetaking that are linked to from email invites, videochats, google docs, &c. Others were linked to from other etherpads (e.g. branching out from one to many) –SJ talk 09:43, 14 February 2026 (UTC)
- Let's notify Archive-team. They will mercilessly scrape all data, with guessing titles through brute-force. Wargo (talk) 12:37, 14 February 2026 (UTC)
Surely we don't need to contact Archive-team. Our community members are some percentage of archive-team. We just need to do the right thing re: preservation and despamming. –SJ talk 01:02, 24 February 2026 (UTC)
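The scan for incoming links suggested earlier in this thread could start from something like the sketch below. It only shows the extraction step: given a chunk of wikitext (or any text), pull out the pad titles it links to. How the text is obtained from the wikis or Phabricator is left open, and the function name and regex are illustrative assumptions, not an existing tool.

```python
import re
from typing import List
from urllib.parse import unquote

# Matches links to the Wikimedia Etherpad and captures the pad title.
# The title ends at whitespace, brackets, pipes, quotes, or the next "/", "?", "#".
PAD_LINK_RE = re.compile(
    r"https?://etherpad\.wikimedia\.org/p/([^\s\[\]|<>\"'/?#]+)",
    re.IGNORECASE,
)


def extract_pad_titles(text: str) -> List[str]:
    """Return the sorted, de-duplicated pad titles linked from the given text."""
    titles = set()
    for raw in PAD_LINK_RE.findall(text):
        # Drop sentence punctuation that got glued onto the end of a bare URL.
        title = unquote(raw.rstrip(".,;:)"))
        if title:
            titles.add(title)
    return sorted(titles)


if __name__ == "__main__":
    sample = (
        "Notes: [https://etherpad.wikimedia.org/p/WikiDev16-WrapUp wrap-up] "
        "and https://etherpad.wikimedia.org/p/export-test."
    )
    print(extract_pad_titles(sample))
```

As noted above, this would miss pads that were only ever linked from emails, video chats, or other pads.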
There is a theory that a large number of pads are just spam. Can't this be empirically tested by someone with database access? Choose a dozen or so pads at random and manually examine their contents and revision history. (It would also be worth checking what the largest pads are, if it's possible to do so. Maybe a relatively small number of huge spam pads are responsible for the issue and it would be possible to quickly prune them.)
Here is my own proposed solution to the problem:
(1) Develop a method for determining whether a pad is spam or not. Presumably Wikimedia already has an automated method of doing this. I'm often skeptical of AI language models, but this may be a case where they would be useful (might not even need a fancy state-of-the-art model).
(2) Generate a list of all pads and run the spam check against each one individually. I'm assuming that we only need to look at the content of the latest revision. If there is a problem with spammers taking over existing pads, then we may need to check both the initial and latest revisions.
(3) Delete the spam pads, keep the non-spam pads. This may have some false positives and false negatives, but it's better than deleting everything.
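Steps (1) to (3) could be sketched roughly as follows. Everything here is an assumption for illustration: the link-density heuristic is only a stand-in for whatever spam check is actually chosen, the input is a plain dictionary of pad title to (initial text, latest text) rather than the real database, and the "check both revisions" option reflects one possible reading of step (2), namely that a pad is deleted only if it was spam from the start.

```python
import re
from typing import Callable, Dict, List, Tuple

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def looks_like_spam(text: str, max_link_ratio: float = 0.3) -> bool:
    """Stand-in spam check: flags text where links make up a large share of the words."""
    words = text.split()
    if not words:
        return False  # blank pads are not treated as spam here
    return len(URL_RE.findall(text)) / len(words) > max_link_ratio


def partition_pads(
    pads: Dict[str, Tuple[str, str]],
    is_spam: Callable[[str], bool] = looks_like_spam,
    check_initial: bool = False,
) -> Tuple[List[str], List[str]]:
    """Split pad titles into (keep, delete).

    pads maps a title to (initial revision text, latest revision text).
    With check_initial=True, a pad is deleted only if both revisions look
    like spam, so a legitimate pad that was later taken over is kept.
    """
    keep: List[str] = []
    delete: List[str] = []
    for title, (initial, latest) in pads.items():
        spam = is_spam(latest)
        if spam and check_initial:
            spam = is_spam(initial)
        (delete if spam else keep).append(title)
    return keep, delete


if __name__ == "__main__":
    example = {
        "notes": ("Agenda for the meetup", "Agenda for the meetup and action items"),
        "junk": ("buy http://a.example", "buy http://a.example http://b.example now"),
        "hijacked": ("Workshop notes from day one", "cheap http://a.example http://b.example"),
    }
    print(partition_pads(example))
    print(partition_pads(example, check_initial=True))
```

Passing a different is_spam function is the intended way to swap in a real classifier. The threshold of 0.3 is arbitrary and would need tuning against a manually examined sample.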