Community Wishlist Survey 2019/Admins and patrollers

From Meta, a Wikimedia project coordination wiki
Admins and patrollers
15 proposals, 392 contributors, 692 support votes
The survey has closed. Thanks for your participation :)

Allow partial reverts for edits

  • Problem: Some edits are only partly bad: one part should be reverted while the rest can or should be kept. Fixing such an edit by hand in the page source is time-consuming, so in practice the edit is either kept in full or reverted in full.
  • Who would benefit: Everybody who wants to revert only part of an edit
  • Proposed solution: Add a "Partially revert" link next to the normal "revert" link. It would open a page that looks like the "new" version of the Two Column Edit Conflict View and shows a "conflict" between the version to be partially reverted and the previous version. The resolved conflict can then be saved normally.
  • Other comments:
  • Phabricator tickets:
  • Proposer: FF-11 (talk) 17:24, 5 November 2018 (UTC) (Old username: Honischboy)[reply]
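
The partial revert idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not how MediaWiki or the Two Column Edit Conflict View works: the function name partial_revert and the idea of numbering changed hunks are hypothetical, and the diff comes from Python's standard difflib module.

```python
import difflib


def partial_revert(old_lines, new_lines, hunks_to_revert):
    """Return new_lines with only the selected changed hunks reverted.

    Hunks are numbered from 0 in the order the changed regions appear.
    (Hypothetical helper, for illustration only.)
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    result = []
    hunk_index = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(new_lines[j1:j2])
            continue
        if hunk_index in hunks_to_revert:
            result.extend(old_lines[i1:i2])  # revert this hunk
        else:
            result.extend(new_lines[j1:j2])  # keep this part of the edit
        hunk_index += 1
    return result


old = ["Intro line.", "Fact A.", "Middle line.", "Fact B."]
new = ["Intro line.", "Fact A, vandalised.", "Middle line.", "Fact B, improved."]

# Revert only the first changed hunk and keep the second.
print(partial_revert(old, new, {0}))
```

In a real interface the reviewer would pick the hunks in a side-by-side view rather than by index, and the result would be saved as an ordinary new revision.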



Page Curation and New Pages Feed improvements

  • Problem: New Page Review is a key process on Wikipedia, and the only firewall that prevents inappropriate new pages from being added to the encyclopedia. However, there are many longstanding issues with the Page Curation tools and the New Pages Feed which inhibit efficiency and cause problems to be overlooked. Aside from a few additions made when the Growth Team added Articles for Creation (AfC) drafts to the New Pages Feed last year, the tools haven't been supported for many years and the list of proposed developments is long. These include bugs, features never implemented, and suggested improvements which have been left unaddressed. While a few requests for improvement of the tools used by New Page Reviewers can be addressed by on-wiki customisation by volunteers, most others are part of the MediaWiki software and require the intervention of the WMF developers.

    We have been repeatedly informed that the Community Tech Team does not have the resources to provide ongoing support for the tools they developed (or even to fix bugs that have popped up) and that this wishlist is the only venue for us to come to for technical assistance. While some of the tasks below are almost trivially easy, others may require a significant time investment.

  • Who would benefit: While New Page Reviewers and admins are the only users with access to the Page Curation tools, ultimately the entire wiki benefits from the work that we do. AfC drafts were recently added to the New Pages Feed, and many of the suggested improvements here would also benefit the AfC team. The English Wikipedia is not the only wiki that could benefit: the tools were originally designed to be usable on other wikis as well. Some bugs currently block their rollout elsewhere, and fixing those bugs is part of this proposal, so that the Page Curation tools and New Pages Feed can be enabled on other language wikis.
  • Proposed solution: The requested improvements come in a few categories:
    • Bugs that should have been fixed a long time ago,
    • Additional sorting options and additional 'potential issues' flagging in the PC tools and New Pages Feed, which would help to flag problems for reviewers to follow up on,
    • Resources dedicated to other suggested improvements supported by the New Page Reviewer community to make them more user-friendly and improve efficiency and to make them non-en.wiki-specific, enabling rollout on other language wikis.
  • More comments: While there were several dozen good suggested improvements, we have listed only the highest priority items below by their Phabricator tickets, in the hope that at least some of our most pressing concerns will be addressed.


Insertcleverphrasehere: Just a heads up, if you didn't notice -- we've moved this proposal from "Miscellaneous" to "Admins and patrollers". -- DannyH (WMF) (talk) 18:52, 15 November 2018 (UTC)[reply]

Possibly not at this moment in time, because it is still not perfect until the wished-for upgrades are carried out. The English Wikipedia is, however, by far the most important project and, being synonymous with the word Wikipedia, is the source for all the undesirable pages masquerading as encyclopedia articles. As a MediaWiki extension, it could probably be ported to other Wikimedia projects (and even other non-Foundation projects that want to use it). It should be borne in mind, however, that without a well-performing New Page vetting system, the English Wikipedia will cease to exist as we know it. Kudpung (talk) 06:16, 18 November 2018 (UTC)
Excuse me, I must ask you to say that again. What a nice and vivid statement concerning the status of English Wikipedia. "English Wikipedia is by far the most important project"? "English Wikipedia is the synonym for Wikipedia"? What's the proof? --Super Wang on zhwiki (Share your opinions) 12:04, 20 November 2018 (UTC)[reply]
Super Wang on zhwiki I think that Kudpung is referring to en.wikipedia being the hardest hit by spam and promotion pages. Therefore it is more necessary to have a robust system in place to deal with these additions on the English Wikipedia than anywhere else (elsewhere a lower volume can generally be dealt with by a small dedicated team, this is difficult on en.wiki). En.wiki *is* synonymous with the word Wikipedia, and it is the first project that companies and individuals come to when looking to advertise or promote themselves; specific countries have in-country promotional issues to deal with, en.wiki has global promotion to deal with. — Insertcleverphrasehere (or here) 13:38, 20 November 2018 (UTC)[reply]
@Insertcleverphrasehere:Thank you for explanation, but sorry, that doesn't make much sense to me. You know, due to Taiwan's int'l status and the large population using Chinese language, zhwiki has also been always a target of vandalism. I acknowledge that enwiki is the first project ever, but if you think enwiki is the only source and bulletin board for companies and celebrities, I must say that thought reflects "English language-priority". --Super Wang on zhwiki (Share your opinions) 00:17, 21 November 2018 (UTC)[reply]
Besides, the fact that enwiki has "global" promotion doesn't mean other wikis don't. Chinese is the language used by the most people (1.5 billion, 2015 consensus) on this planet. There are no fewer issues here on zhwiki than on enwiki. It's some kind of bias to think that "English Wikipedia has 'global' issues to deal with". --Super Wang on zhwiki (Share your opinions) 00:24, 21 November 2018 (UTC)
@Super Wang on zhwiki: Fair enough, and fair enough. As I've said in comments below, I'm keen to push to fix remaining issues that are keeping the Page curation tools from being non-en.wiki-specific, so that they can be used on other wikis as well; the phab task is phab:T50552. If we do this right, then the work done for these improvements can be ported to other wikis such as zh.wiki. Keen to chat more about this. — Insertcleverphrasehere (or here) 00:49, 21 November 2018 (UTC)[reply]
just a note I moved the discussion below to the discussion page. --Cohaf (talk) 00:51, 21 November 2018 (UTC)[reply]
You could listen a bit more carefully to your fellow intl Wikipedians. I also wondered why this is the top wish for „a key process on Wikipedia“ but it's about the US/GB Wikipedia only. Sargoth (talk) 10:09, 23 November 2018 (UTC)[reply]
@Sargoth, Once again, we were forced to come here as a last resort by community tech, which refused to maintain or update the tools that they built many years ago unless we came here and got in the top 10. Not sure how much more carefully you want me to listen; I've added the ticket to make the tools non-en.wiki-specific to the list above, and I'm going to push for this if I can so that these tools can be made available elsewhere on other language wikis as well. — Insertcleverphrasehere (or here) 13:29, 23 November 2018 (UTC)[reply]
„as a last resort“, this sounds disillusioned. I hope you get your supports (I didn't vote here and probably will not, but I am quite disappointed about the oppose votes at the gender options proposal). Best of luck :) Sargoth (talk) 14:07, 23 November 2018 (UTC)
Sargoth Somewhat disillusioned, yes. As for the gender options one, note that oppose votes don't count. Still, the premise of a voting free-for-all and top ten as a good system for identifying all the stuff that needs developing is a bit ridiculous. It tends to lead to no development for proposals that are fairly trivial to fix, but only impact a relatively small segment of the user base. A good man once said that "Democracy is two wolves and a lamb voting on what to have for lunch". We had the same problem here, in that NPP isn't particularly sexy; this proposal is going to take a fair amount of effort, and while essential, it doesn't directly help all wikipedians (you can see some people complaining about it below). Note that you can canvass to your heart's content per the wishlist rules (which is really dumb, but it is how we managed to get a decent number of votes here). I'd suggest heading over to various wikiprojects and canvassing for votes if you want that particular proposal to pass in the top 10. — Insertcleverphrasehere (or here) 16:01, 23 November 2018 (UTC)[reply]
  • Oppose per Jo-Jo Eumerus; this is a Fascistism to IP users, and things that are suitable for English are not necessarily suitable for other languages. -- 03:21, 26 November 2018 (UTC)
    Note that each language wiki would be able to use their own tags and deletion processes by specifying what is in each section of the toolbar via on-wiki .js pages. As for 'Fascistism' to IP users; there is a simple solution to that problem. — Insertcleverphrasehere (or here) 08:58, 26 November 2018 (UTC)
  • Comment I would really love to see a wiki-agnostic page curation tool. I wish there was a way to vote for that specifically - right now it's just one of a big bag of feature requests, a significant part of which I imagine would have to be dropped to make the scope reasonable. --Tgr (talk) 04:47, 25 November 2018 (UTC)
    Comment @Tgr Most of the other feature requests are things that the other language wikis will want as well. In any case, from the looks of things making the tools non-en.wiki specific only consists of a few key things that need fixing, so not hard to tack on to this proposal; value for effort, it is one of the best tasks on this list. Making the tools Wiki-agnostic (non-en.wiki-specific) was tried as a separate request a few years ago in the community wishlist, but failed to garner enough support (40-somethingth); honestly getting tacked on to this proposal is the best chance it has to be completed. — Insertcleverphrasehere (or here) 09:04, 25 November 2018 (UTC)


Collapse multiple consecutive revisions by same author

  • Problem: Collapse consecutive edits by the same author in the revision history page
  • Who would benefit: The whole system, all wiki users, especially moderators and peer reviewers
  • Proposed solution: Currently every update, be it even one single character, generates a new revision of the article. When the same author makes consecutive non-overlapping changes to the same page, the revisions could be collapsed into one single row. This will simplify the display and moderator review interface.
  • More comments: This will simplify moderator and peer review, and shorten the article history considerably.
  • Phabricator tickets:
  • Proposer: Geert Van Pamel (WMBE) (talk) 16:13, 4 November 2018 (UTC)[reply]
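
A display-only version of this idea can be sketched as follows. It is a minimal illustration, assuming a simplified revision record; the Revision fields and the collapse_history function are hypothetical and do not correspond to MediaWiki's schema or code. Nothing is merged in storage: each collapsed row keeps its individual revision IDs.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Revision:
    rev_id: int
    author: str
    size_change: int


def collapse_history(revisions):
    """Group consecutive revisions by the same author into display rows.

    Each row keeps the underlying revision IDs, so single edits can
    still be inspected or undone.
    """
    rows = []
    for author, run in groupby(revisions, key=lambda r: r.author):
        run = list(run)
        rows.append({
            "author": author,
            "rev_ids": [r.rev_id for r in run],
            "net_size_change": sum(r.size_change for r in run),
        })
    return rows


history = [
    Revision(101, "Alice", 120),
    Revision(102, "Alice", -3),
    Revision(103, "Bob", 40),
    Revision(104, "Alice", 1),
]
for row in collapse_history(history):
    print(row)
```

Because only adjacent runs are grouped, Alice's later edit 104 stays in its own row after Bob's edit, which keeps the chronological order of the history intact.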


  • I am not aware of such functionality in VisualEditor, but it surely doesn't combine revisions on the backend. Disk space is not something you should ever worry about, but I get how small, consecutive edits are problematic for patrollers. My bold stance here is that this proposal is too technically involved for us, as it would presumably require a major reworking of how revisions are saved in MediaWiki. People smarter than me would be able to judge this better, though. I will point out HotCat allows you to add/remove multiple categories at once, by clicking on the "++" link. MusikAnimal (WMF) (talk) 22:58, 4 November 2018 (UTC)
  • I have trouble imagining we'd do this on the backend (It would make the way we model revisions really complicated). But from the front-end perspective, is this basically asking for "enhanced recentchanges" but for history pages? BWolff (WMF) (talk) 02:14, 5 November 2018 (UTC)[reply]
    I would take that. --Izno (talk) 01:57, 6 November 2018 (UTC)[reply]
  • In Instiki, which is used on nLab, multiple consecutive edits by the same author within a minute do get merged into a single revision. This proposal would make editing in MediaWiki act like editing in Instiki. GeoffreyT2000 (talk) 16:29, 5 November 2018 (UTC)[reply]
  • Sometimes this would be undesirable, for example when a new page is created by splitting out material from another page, I understand the new material should be copied across and saved exactly as it was on the previous page, to allow attribution. If the editor then makes changes to tidy up the new page, these would be merged into the previous edit. Perhaps a way forward would be, if someone has made another edit within a set time of their previous one, such as 30 minutes, for a box to pop up asking if they want the 2 edits to be merged. Mmitchell10 (talk) 10:32, 10 November 2018 (UTC)[reply]
  • Geertivp Thanks for submitting a proposal. Per what MusikAnimal and BWolff said above, this proposal as it stands is not workable as it asks for a massive change in how MediaWiki handles revisions. Disk space is certainly not something you should be concerned about. On the contrary, doing this change will break a huge number of MediaWiki extensions, gadgets and community-built tools. However, if you are asking for better history pages (collapse multiple consecutive revisions by same author), then we can probably try to see if we can build that. Would you like to rename and redefine the proposal to that effect? -- NKohli (WMF) (talk) 04:57, 14 November 2018 (UTC)[reply]
    Yes, please, rename to "Collapse multiple consecutive revisions by same author". Geert Van Pamel (WMBE) (talk) 10:25, 14 November 2018 (UTC)[reply]
@Geertivp: Thank you. I've renamed the proposal and revised the proposal a bit. I hope that's okay. Please do ping me if you don't agree with any of the changes. Thanks for participating in the survey. -- NKohli (WMF) (talk) 17:33, 14 November 2018 (UTC)[reply]
  • There are times when an editor makes a series of very different edits, and one might want to undo one but not the rest (as in the above request for "partial revert"). Combining all the edits would make it more cumbersome. PamD (talk) 23:52, 16 November 2018 (UTC)[reply]
  • This kind of automation is not good. There are two cases: editors who deliberately split their changes into logically different sets, and editors who make those smaller edits for other reasons. I'd compare this with how git works. It's possible to rebase a branch, and pick or squash commits. Maybe something similar can apply here: after making a series of smaller commits, the author can decide to combine them into one "commit" or a smaller number of "commits". РоманСузи (talk) 17:39, 17 November 2018 (UTC)
  • Also squash reverted edits. Note that it should be possible to inspect squashed edits. — Jeblad 08:45, 18 November 2018 (UTC)[reply]


Prevent DDoS-style attacks

  • Problem: Currently at en.wikipedia, there is a troll that has been running a DDOS-style attack on the various pages that unregistered users frequently go for help (Ref Desks, Teahouse, Help Desk, and related talk pages). Currently the only blunt object we have to stop them is semi-protecting those pages, which is making it VERY difficult for IP-based users (or newly registered users who are not autoconfirmed) to find places to ask for help. The attack consists of spamming vile attacks against registered Wikipedia users, or grotesquely racist comments, or other attacks, hammering the page with edits as fast as every second or two, and jumping from open proxy to open proxy to evade blocking. The attacks start within minutes of the old protection expiring, and continue unabated until protection is returned. It is making the Wikipedia help system essentially unusable.
  • Who would benefit: Admins would have a finer-grade tool to protect pages, by allowing good-faith users to edit while stopping this highly disruptive sort of attack. IP and newly registered users would not be caught up in sanctions not intended for them.
  • Proposed solution: If there was a way to throttle edits to a page in some way, so that the same IP address and/or account would be prevented from multiple, repeat edits it could slow down this virulent attack. They would still be able to jump from open proxy to open proxy, but that takes them a little longer, and it would reduce the speed of the attack to something that hopefully we could manage with blocks and RevDel rather than page protection. What I am looking for here is a way to throttle protection so that the same IP address would have to wait for a non-bot intervening edit before being able to edit again. It wouldn't be needed often, but would come in VERY handy when it is needed.
  • More comments:
  • Phabricator tickets:
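
The "wait for a non-bot intervening edit" rule from the proposed solution could look roughly like this. It is a sketch under assumptions, not an existing MediaWiki protection level: the class InterveningEditThrottle and its try_edit method are hypothetical names, and the editor key stands for either an IP address or an account name.

```python
class InterveningEditThrottle:
    """Per-page rule: an editor may not save twice in a row on a page.

    The editor has to wait until someone else, who is not a bot, has
    edited the page in between. (Hypothetical sketch.)
    """

    def __init__(self):
        self._last_human_editor = {}  # page title -> editor key

    def try_edit(self, page, editor, is_bot=False):
        """Return True and record the edit if allowed, else return False."""
        if is_bot:
            return True  # bot edits pass but do not unlock the page
        if self._last_human_editor.get(page) == editor:
            return False
        self._last_human_editor[page] = editor
        return True


throttle = InterveningEditThrottle()
print(throttle.try_edit("Help desk", "203.0.113.7"))              # first edit
print(throttle.try_edit("Help desk", "203.0.113.7"))              # immediate repeat
print(throttle.try_edit("Help desk", "ArchiveBot", is_bot=True))  # bot edit
print(throttle.try_edit("Help desk", "203.0.113.7"))              # still locked out
print(throttle.try_edit("Help desk", "198.51.100.4"))             # another person
print(throttle.try_edit("Help desk", "203.0.113.7"))              # allowed again
```

As the proposal notes, an attacker who switches proxies between edits would still get through, so a rule like this only slows the attack down.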


  • Does {{helpme}} not help with this sort of thing? New contributors could use it quite well until they become autoconfirmed or whatever. Gryllida 22:15, 30 October 2018 (UTC)[reply]
  • Throttle edits to a page in some way, so that the same IP address and/or account would be prevented from multiple, repeat edits This can be accomplished with mw:AbuseFilter. It does not allow you to wait for a non-bot intervening edit before being able to edit again, but I'm confused why that would help. Could you elaborate? MusikAnimal (WMF) (talk) 22:31, 30 October 2018 (UTC)[reply]
    The problem is that someone has found a way to attack Wikipedia and shut down any page they wish using a DDOS-style attack. Currently, they are restricting themselves to areas where IP editors seek help (Help Desk, Teahouse, Ref Desk, ANI occasionally). They also tend to attack the user talk pages of the admins who try to stop them, or other users at will. If you have oversighter or admin privileges, you can pull up the history of this and see a sample of the problem here. If anyone has any idea on a technical fix that would stop this, it would be great. Right now, anyone who does this is able to shut down any page at will, and we really have no defense to it. It's exposed a vulnerability to the entire software, and it would only take one person with the will to do it to, say, expand their scope from a predictable set of project-space pages to instead just start spamming random articles across Wikipedia for hours on end, and they would soak up huge amounts of resources chasing them around, and really, we have no tools to stop them as yet. I'm not really tied to any solution, I made the proposal to get the ball rolling on a discussion, but in the end, it's stopping the problem and not this solution I think we should care about. Maybe I'm over reacting, but I would not want to see this go worst-case scenario, and have the Foundation get caught with their metaphorical pants down. We've got this contained to 1 troll now who is doing it in a limited fashion. This is the sort of thing that, if a bunch of people got bored or had an axe to grind at Wikipedia, could do a lot of damage. If y'all don't see this as a problem, that's fine. It's been annoying as heck at en.wikipedia to deal with on a daily basis, and I don't want to see this trend growing. If anyone has other ideas on technical fixes for this, beyond "end IP-based editing", I'd love to see them. --Jayron32 (talk) 11:50, 31 October 2018 (UTC)[reply]
    @Jayron32: w:Special:AbuseFilter/796? We should not discuss this filter publicly, but it seems to do what you are suggesting. MusikAnimal (WMF) (talk) 16:34, 3 November 2018 (UTC)[reply]
    There are currently 3 abuse filters set to stop this one person. They don't really work, because abuse filters are only set to recognize specific text strings. He's been doing this for years, and knows enough to vary his text strings just enough to make the abuse filters useless. --Jayron32 (talk) 15:06, 5 November 2018 (UTC)[reply]
Maybe adding to abusefilter an option to trigger a captcha, might be helpful? Assuming that attack is bot-based? BWolff (WMF) (talk) 01:51, 5 November 2018 (UTC)[reply]
It may or may not be bot based. I suspect he may be just rapidly doing the edits manually, but it COULD be bot-based as well. Having a CAPTCHA option sounds brilliant, actually. If admins had the ability to set a "CAPTCHA protection" for certain pages, that may slow him down enough for us to minimize damage. It's a small inconvenience for making one edit, but it would slow down the rapid, repeated attacks so we could deal with them. --Jayron32 (talk) 15:06, 5 November 2018 (UTC)[reply]
I would be very interested to see if CAPTCHA protection would solve this problem. I highly doubt this is done manually. They come in more quickly than I can even hit the rollback button, and I have a hard time imagining something that would take less manual time than a single click. GMGtalk 13:25, 7 November 2018 (UTC)[reply]
Adding something like $wgCaptchaTriggersOnProtection['captchaProtection']['edit'] = true; to Extension:ConfirmEdit might be an option. I would actually be interested in the effect of enabling CAPTCHA on pending-changes protected pages more generally. TheDragonFire (talk) 13:36, 7 November 2018 (UTC)[reply]

Even though a CAPTCHA might be an interesting solution, I don't see how a reasonable rate limit would do anything but slow down those attacks considerably, while being far less invasive for human editors. It would make sense to base this option not only on IPs, which, as already pointed out, could be rotated as a countermeasure, but also on the attacked pages themselves. As this option does not seem to exist, from what I read here, it might be a worthwhile extension to prevent a predictable serious threat. --Eloquenzministerium (talk) 14:38, 8 November 2018 (UTC)
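
A rate limit keyed on both the IP and the attacked page could be sketched like this. The class SlidingWindowLimiter, the function allow_edit and the limits used are all assumptions made for illustration; the sketch also assumes the page limit would apply only to edits by unconfirmed editors, which the comment above does not specify.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_edits per key within window_seconds."""

    def __init__(self, max_edits, window_seconds):
        self.max_edits = max_edits
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> times of accepted edits

    def has_room(self, key, now):
        events = self._events[key]
        while events and now - events[0] >= self.window_seconds:
            events.popleft()  # drop edits that have left the window
        return len(events) < self.max_edits

    def record(self, key, now):
        self._events[key].append(now)


def allow_edit(ip_limiter, page_limiter, ip, page, now):
    """Accept an edit only if both the IP and the page are under their limits."""
    if ip_limiter.has_room(ip, now) and page_limiter.has_room(page, now):
        ip_limiter.record(ip, now)
        page_limiter.record(page, now)
        return True
    return False


ip_limiter = SlidingWindowLimiter(max_edits=3, window_seconds=60)
page_limiter = SlidingWindowLimiter(max_edits=5, window_seconds=60)

# An attacker who uses a new proxy for every edit never hits the IP limit,
# but the per-page limit still caps the damage on the targeted page.
results = [
    allow_edit(ip_limiter, page_limiter, f"proxy-{n}", "Help desk", now=n)
    for n in range(10)
]
print(results)
```

The obvious cost is that good-faith unconfirmed editors share the page quota with the attacker, so the limits would need tuning per page.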

  • Alright, so per this comment it looks like the edits are definitely automated, which makes me inclined to think a captcha would be the way to go. Also, it would be super if we could escalate this in terms of importance given the level of disruption, rather than winding through the whole community wishlist process so we can get something implemented sometime over the next year or so. GMGtalk 14:59, 9 November 2018 (UTC)
  • Just an update on the need for this; our friend has decided to expand his reach to any random article/talk page/etc. See [1]. Having a solution to stop him beyond "semiprotecting the entire encyclopedia" would be great. --Jayron32 (talk) 15:56, 9 November 2018 (UTC)[reply]
    @Jayron32: Are we asking to impose a CAPTCHA via AbuseFilter? Or is the proposal still centered around a new form of page protection that would throttle edits from the specified user groups? Maybe impose a CAPTCHA once said throttle has been hit? The voting phase goes live this Friday, and I want to make sure we know what we're voting on. Obviously we want to put a stop to the current vandalism spree, but it's worth noting the wishlist defines projects we'll take on in the next calendar year, and not so much for urgent solutions. That said whatever comes out of this will be useful for future vandalism sprees, I'm sure. MusikAnimal (WMF) (talk) 04:32, 14 November 2018 (UTC)[reply]
    On second thought, I think the problem statement is clear. We don't need to worry about the solution just yet. However, we might consider renaming the proposal to something like "Prevent DDoS-style attacks". This will make it clearer what we're voting on, and might increase the chances of getting a lot of support. If I don't hear back about this soon, I'll take the liberty of renaming the proposal. If you later disagree, just ping me :) Ideally we'll have this sorted out before voting starts tomorrow. MusikAnimal (WMF) (talk) 19:29, 15 November 2018 (UTC)
    I have performed the rename. Hope this is okay! MusikAnimal (WMF) (talk) 16:16, 16 November 2018 (UTC)[reply]
  • Throttling similar edits over several pages is possible. I wrote about it a few years back, but nobody has picked up the idea. Could be because it involves math. — Jeblad 08:50, 18 November 2018 (UTC)

Shouldn't this proposal be in the anti-harassment category? --Dvorapa (talk) 16:59, 18 November 2018 (UTC)


Create an integrated anti-spam/vandalism tool

  • Problem: Our current infrastructure for countering vandalism and spam at the cross-wiki level is stuck in 2004. We have a spamblacklist with poor logging done on a per-wiki basis, a global abusefilter which isn't global, and a title blacklist that doesn't log at all.
  • Who would benefit: Stewards
  • Proposed solution: Create an integrated anti-spam/vandalism tool (like Phalanx used on Wikia) that combines the functions of the spam and title blacklists, as well as limited abusefilter functionality, to better respond to ongoing spam and vandalism at the global/cross-wiki level.
  • Phabricator tickets:


Hi Ajraddatz. I wonder if some of the needs of this proposal have been met by the recent improvements made to Recent Changes and Watchlist feeds? -- NKohli (WMF) (talk) 18:37, 30 October 2018 (UTC)[reply]

Unfortunately not - what I'm thinking of is cross-wiki in scope and blocks edits before they happen, rather than reacting to edits that have gone through. – Ajraddatz (talk) 18:40, 30 October 2018 (UTC)[reply]
Hi! We discussed this proposal in our team meeting today. We are not sure how much we will be able to do but we'll try to do our best. It will probably not be a big cross-wiki thing though. Doing cross-wiki projects is difficult with MediaWiki's current architecture. We will scope this project and come up with what we can do if it is in the top 10. Thanks. -- NKohli (WMF) (talk) 17:50, 13 November 2018 (UTC)[reply]
Thanks! You're in luck, because I doubt this proposal will be in the top 10. Because of the limitations in using the current extension, only a small handful of people even work with it, and this topic isn't glamorous enough to gather attention from beyond the handful of people who would be impacted by a change. That said, it's still something that is important to have eventually, so I hope this puts it on the map. – Ajraddatz (talk) 18:40, 13 November 2018 (UTC)

I agree anti-abuse tools could use more love. This proposal, on the other hand, could use more details. Are you specifically worried about logging? The difficulties local communities have in interacting with global tools? Are there specific features of Phalanx that you miss? What's wrong with global abuse filters? --Tgr (talk) 04:15, 25 November 2018 (UTC)[reply]

Fair point. I'll give a walk-through of the current state and point out the big problems that could be fixed. I'll also preface this by saying that integrating the elements into one tool is more for convenience; the problems are with each specific tool, and could be improved separately instead.
Spam blacklist: I notice a link being spammed by a couple of bots, so I want to add it to the spam blacklist. My workflow involves opening the page, waiting a few seconds for it to load, painfully scrolling down the list and trying to find a good place to add the regex. The page is so large that it lags going through it. Once I find the right place, I need to figure out what the correct regex is - I can speak regex-1, so simple additions are no problem, but I need to ask someone else to add anything more complex. After adding the text and waiting another 10-20 seconds for the page to save, I then need to manually add an entry on the spam blacklist log, since there is no automatic logging of additions. This takes another 10-20 seconds to get the diff number and justify the addition. Once I've made the addition, I have no ability to follow-up and see what it is blocking because the logging is done on a per-wiki basis and I don't have the time to check all 700 wikis. Total time: 1-2 minutes, when the rest of my anti-abuse workflow takes 10-20 seconds total. Big areas to improve: 1. change it from a big page to an extension that allows each entry to be handled individually. Imagine account blocking if you had to add the name of the account to a big page with thousands of other names. 2. automatic logging. 3. some system where you can see the impact of the action you just took - subsequent attempted uses of the blocked link.
Title blacklist: I notice an account name has been abused across multiple wikis, so I want to block the name. My workflow is a bit easier because the title blacklist is smaller than the spam blacklist, so it only lags a little bit. I add the appropriate regex to the appropriate section and create a log entry. Total time: 1 minute, still much slower than the rest of my workflow. There is no per-wiki or cross-wiki logging to see what impact my entry has had. Areas to improve: same as spam blacklist, individual entry handling, automatic logging, a way of auditing the actions blocked by the addition.
Global AbuseFilter: some cross-wiki vandal is doing a specific type of vandalism across multiple wikis. I create a filter to prevent such actions (this is already pretty complex, and could use some serious simplification for the less technically-minded among us), but it only applies to the small wikis that the global abusefilter is enabled on. The vandal continues to hit large wikis, forcing me to either contact local admins to get them to duplicate the global filter locally or (and this is what I usually end up doing) ignore the problem because I don't have half an hour to follow up on this. I also don't necessarily need a whole abusefilter: a simple condition that could be done through Phalanx or SpamRegex (old extension) would have sufficed. Areas to improve: make the global abusefilter global, add a lower tier of abuse prevention through the integrated tool.
And of course this is just a start of the laundry list of unaddressed problems with global anti-abuse tools. Global accounts cannot be blocked, requiring me to checkuser almost every account I lock so I can block the underlying IP as well. Abuse from developing countries tends to be on mobile ranges or from other public IP ranges, so there are often situations where I cannot place any IP blocks due to the potential collateral damage - the problem here being no other options to block people other than using IPs and account names. Many stewards also just block anyway, leading to the literal hundreds of unaddressed requests for unblock in our email queue from people caught in massive global rangeblocks. But this area seems like one where existing extensions (spamregex, phalanx) could be used as a starting point to make some easy fixes. – Ajraddatz (talk) 23:24, 25 November 2018 (UTC)[reply]
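
The three improvements asked for in the spam blacklist walk-through (individually handled entries, automatic logging, and a way to audit what an entry has blocked) can be sketched as a small data model. Everything here is hypothetical: the names BlacklistEntry and SpamBlacklist, the fields, and the example domain are illustrations, not the API of the SpamBlacklist extension, Phalanx or SpamRegex.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BlacklistEntry:
    pattern: str
    added_by: str
    reason: str
    hits: list = field(default_factory=list)  # (wiki, url) pairs blocked so far

    def __post_init__(self):
        self._regex = re.compile(self.pattern, re.IGNORECASE)

    def matches(self, url):
        return self._regex.search(url) is not None


class SpamBlacklist:
    def __init__(self):
        self.entries = []
        self.log = []  # written automatically on every addition

    def add(self, pattern, added_by, reason):
        entry = BlacklistEntry(pattern, added_by, reason)
        self.entries.append(entry)
        self.log.append(("add", pattern, added_by, reason))
        return entry

    def check(self, wiki, url):
        """Return the entry that blocks url, or None if the link is allowed."""
        for entry in self.entries:
            if entry.matches(url):
                entry.hits.append((wiki, url))  # cross-wiki audit trail
                return entry
        return None


blacklist = SpamBlacklist()
entry = blacklist.add(r"\bspam-example\.invalid\b", "ExampleSteward", "cross-wiki link spam")
blacklist.check("enwiki", "http://spam-example.invalid/buy")
blacklist.check("dewiki", "https://example.org/")
print(blacklist.log)
print(entry.hits)
```

Keeping hits on the entry itself is what would let a steward see the impact of an addition without checking each wiki's log separately.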


Make undeletion page ID sensitive

  • Problem: The practice of selecting revisions on Special:Undelete to be restored is an outdated practice that is problematic nowadays for various reasons.

    The first thing one might have noticed in MediaWiki 1.31 is that parent IDs are now preserved on undeletions. But this still has some problems. The first problem, and the most serious one, is that it can cause page histories to be very confusing. For example, when viewing the history of Draft:Marques Monroe on Wikipedia, you will see that many size differences do not look "right". Allowing rev_parent_id to point to a revision in another page's history would just make things worse (in particular, if a revision from page A had rev_parent_id pointing to a revision from page B and B is deleted, the parent ID would become broken). Another problem is that if one selects a revision without selecting the previous revision, undeletion might result in a revision with a broken parent ID (T193211). For example, on this wiki, revision 18325584 has rev_parent_id 18325581, which is a deleted revision ID. Yet another problem is that there are already many revisions, e.g. on Wikipedia, where rev_parent_id either incorrectly points to a later revision (T38976; particularly imported revisions, such as the oldest 7 revisions in the history of the "2002" article) or is incorrectly set to zero (e.g. the revisions from June 2005 in the history of the "2006 in video gaming" article), and forcing parent IDs to be preserved on undeletions would make it impossible to solve such problems. Finally, one more problem is that for revisions deleted in MediaWiki 1.18 or earlier, which did not have the ar_parent_id field filled in, the pre-1.31 behavior of using the "getPreviousRevisionId" function would still be followed, which is inconsistent with the behavior for revisions deleted in newer MediaWiki versions.

    Setting the parent ID issues aside, there are still a few more issues with the current undeletion schema. One problem is that Special:Undelete lists all deleted revisions for a given title without indicating which deleted page ID they originally belonged to. It also allows two unrelated page histories to be "mixed" or "merged" too easily, which would cause confusion regardless of what happens with the parent IDs. Another problem is that timestamps, not revision IDs, are used to identify deleted revisions, meaning that two revisions with the same timestamp could not be separated (T39465). Finally, one more problem is that for pages with thousands of deleted revisions, Special:Undelete would list too many revisions all on one page.

  • Who would benefit: Administrators, particularly those who do history merges or imports
  • Proposed solution: See the tasks below for more details. We would then have a "pagearchive" table with columns named "pa_id", "pa_namespace", "pa_title", "pa_page_id", and "pa_rev_count", and the archive table would have a new column named "ar_pa_id". Also, a script that migrates the "ar_namespace", "ar_title", and "ar_page_id" columns to the new table would then be created. Then another script would be created that would de-duplicate page IDs in the "pagearchive" table, and also de-duplicate those that duplicate the ID of existing pages or an existing rev_page field. Finally, one more script would be created that would populate missing pa_page_id fields that previously corresponded to missing ar_page_id fields.
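As a rough illustration of the first migration step described above, here is a minimal sketch using SQLite from Python. The column names come from the proposal; the column types, the function name `migrate_to_pagearchive`, and the use of SQLite rather than MediaWiki's actual database layer are assumptions made only for this example. The de-duplication and back-filling scripts are not covered.

```python
import sqlite3


def migrate_to_pagearchive(conn):
    """Create the proposed pagearchive table and link archive rows to it.

    Assumes an existing `archive` table with ar_namespace, ar_title and
    ar_page_id columns. Column types are guesses for this sketch.
    """
    conn.executescript(
        """
        CREATE TABLE pagearchive (
            pa_id        INTEGER PRIMARY KEY,
            pa_namespace INTEGER NOT NULL,
            pa_title     TEXT NOT NULL,
            pa_page_id   INTEGER,           -- may be missing for old rows
            pa_rev_count INTEGER NOT NULL
        );

        ALTER TABLE archive ADD COLUMN ar_pa_id INTEGER;

        -- One pagearchive row per distinct deleted page.
        INSERT INTO pagearchive (pa_namespace, pa_title, pa_page_id, pa_rev_count)
            SELECT ar_namespace, ar_title, ar_page_id, COUNT(*)
            FROM archive
            GROUP BY ar_namespace, ar_title, ar_page_id;

        -- Point every deleted revision at its pagearchive row.
        -- "IS" is used so that rows with a missing ar_page_id still match.
        UPDATE archive SET ar_pa_id = (
            SELECT pa_id FROM pagearchive
            WHERE pa_namespace = ar_namespace
              AND pa_title = ar_title
              AND pa_page_id IS ar_page_id
        );
        """
    )
```

Grouping on namespace, title and page ID together is what keeps two unrelated deleted histories under the same title apart in this sketch.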

    Special:Undelete would not list deleted revisions anymore. Instead, it would list deleted page IDs from the "pagearchive" table as radio buttons, and deleted revisions, their diffs, and deleted page histories could be viewed directly by using the "oldid" and "curid" parameters in the URL. It would also allow the page to be automatically moved to another title of the user's choice after restoration, and this would be mandatory when the title of the page one is trying to restore already exists. In the latter case, the existing page would be temporarily deleted to allow for the restoration of the selected page ID. Also, Special:DeletedContributions would be fixed to look more like Special:Contributions, by displaying size differences and, for deleted revisions where the ar_parent_id is zero, the "new page" mark.

    Importing revisions to an existing page title would be allowed only when they are either all later than the page's latest revision, all earlier than the page's oldest revision, or all fit between two consecutive revisions in the page's history. In the latter two cases, the rev_parent_id field for the first revision following the imported revisions would automatically be changed to the ID of the latest imported revision.
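The import restriction above can be sketched as a small check. This is only an illustration: the function name `import_allowed` is hypothetical, timestamps are assumed to be any sortable values, and rejecting an imported timestamp that equals an existing one is an assumption the proposal does not spell out.

```python
from bisect import bisect_left


def import_allowed(existing_timestamps, imported_timestamps):
    """Return True if all imported revisions fall into one gap of the history.

    A "gap" is the space before the oldest revision, after the latest
    revision, or between two consecutive existing revisions.
    """
    if not existing_timestamps or not imported_timestamps:
        return True
    existing = sorted(existing_timestamps)
    existing_set = set(existing)
    # Assumption: an imported revision sharing a timestamp with an existing
    # one cannot be placed unambiguously, so it is rejected.
    if any(ts in existing_set for ts in imported_timestamps):
        return False
    # bisect_left gives the index of the gap each imported revision falls in.
    gaps = {bisect_left(existing, ts) for ts in imported_timestamps}
    return len(gaps) == 1
```

If the check passes and the gap is not the one after the latest revision, the revision that follows the gap is the one whose rev_parent_id would be updated.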

    Special:MergeHistory would still exist, but would also have another restriction, namely that it could not be used when all of the revisions from the source page's history would become part of the target page's history. Also, the first revisions following the merged revisions in both pages' histories would automatically have their parent IDs swapped, so that n and 0 would become 0 and n.

    Two new special pages named "Special:SplitHistory" and "Special:MergeAndMove" would be created. For the former, a "SplitHistory" class would be created that would contain a function that splits the history of a deleted page ID by changing the ar_parent_id field for a single deleted revision to zero and assigning a new page ID to that deleted revision and all later ones. For an existing page, "Special:SplitHistory" would ask you to choose a cut-off point, which title some revisions should be moved to, and whether the earlier or the later revisions should be moved there. For the latter, a "MergeAndMove" class would be created that would contain a function that first changes the rev_parent_id field for the oldest revision in the target page's (henceforth called "B") history to coincide with the page_latest field for the source page (henceforth called "A"), then changes the rev_page field for all of B's revisions to A's page ID, deletes B from the page table, and finally moves A to B to complete a history merge.
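The history split for a deleted page described above could look roughly like the following. This is a sketch on plain dictionaries, not MediaWiki code: the function name `split_history` is hypothetical, and the revisions are assumed to be listed oldest first.

```python
def split_history(revisions, cut_rev_id, new_page_id):
    """Split a deleted page history at the revision with ID cut_rev_id.

    `revisions` is a list of dicts with ar_rev_id, ar_parent_id and
    ar_page_id keys, ordered oldest first. The cut revision gets
    ar_parent_id 0, and it and all later revisions get the new page ID.
    A new list is returned; the input is left unchanged.
    """
    result = []
    after_cut = False
    for rev in revisions:
        rev = dict(rev)  # copy so the caller's data is not modified
        if rev["ar_rev_id"] == cut_rev_id:
            after_cut = True
            rev["ar_parent_id"] = 0  # the cut revision starts a new history
        if after_cut:
            rev["ar_page_id"] = new_page_id
        result.append(rev)
    if not after_cut:
        raise ValueError("cut_rev_id not found in the given history")
    return result
```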

    The "populateParentId" script would be modified so that it could not only populate rev_parent_id fields, but also ar_parent_id fields. And it would also generate a new parent ID for all existing and deleted revisions that already had one to fix many parent ID problems.

    Finally, a clean-up script would be created that would permanently delete some revisions from the revision and archive tables. When duplicated revisions (those with the same rev_page or ar_pa_id, same timestamp, same comment, same minor-edit flag, and same SHA1) occur, the script would keep only the one with the smallest rev_id or ar_rev_id, and would automatically remove the RevisionDelete status from that revision if the text, edit summary, and username or IP address are all hidden. An example of where the latter would occur is the history of the "Cecilia Skingsley" article on Wikipedia. For revisions by IP addresses, the script would also delete the revisions from the "ip_changes" table.
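The duplicate detection in the clean-up step can be illustrated as follows. This is a sketch only: the dictionary keys and the function name `duplicate_revision_ids` are invented for the example, and the RevisionDelete and ip_changes handling are left out.

```python
from collections import defaultdict


def duplicate_revision_ids(revisions):
    """Return the IDs of revisions the clean-up script would delete.

    Revisions count as duplicates when page, timestamp, comment,
    minor-edit flag and SHA1 all match. Within each group the smallest
    rev_id is kept and the rest are returned, sorted ascending.
    """
    groups = defaultdict(list)
    for rev in revisions:
        key = (
            rev["page"],
            rev["timestamp"],
            rev["comment"],
            rev["minor"],
            rev["sha1"],
        )
        groups[key].append(rev["rev_id"])

    to_delete = []
    for ids in groups.values():
        keep = min(ids)
        to_delete.extend(i for i in ids if i != keep)
    return sorted(to_delete)
```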


I note that T20493 may be relevant, in that it contemplates changing how deletion works in a much less overcomplicated manner. Anomie (talk) 12:07, 1 November 2018 (UTC)[reply]

Hi GeoffreyT2000, I left a message for you and MER-C on Community Wishlist Survey 2019/Admins and stewards/Feature parity for tools dealing with deleted revisions. -- DannyH (WMF) (talk) 01:29, 14 November 2018 (UTC)[reply]


Overhaul spam-blacklist

  • Problem: The current blacklist system is archaic: it does not allow for levels of blacklisting and is confusing to editors. The main problems are that the spam blacklist does not discriminate by namespace, user status, material linked to, etc. The blacklist is a crude, black-and-white choice: it is not possible to block additions only by non-autoconfirmed editors or to allow additions only by admins, nor is it possible to allow links in certain namespaces. Giving warnings is also not possible (on en.wikipedia, we implemented XLinkBot, which reverts and warns; warning IPs and 'new' editors that a certain link violates policies or guidelines would be a less bitey solution).
  • Who would benefit: Editors on all Wikipedias
  • Proposed solution: Basically, replace the current mw:Extension:SpamBlacklist with a new extension with an interface similar to mw:Extension:AbuseFilter, where instead of conditions, the field contains a set of regexes that are interpreted like the current spam-blacklists, providing options (similar to the current AbuseFilter) to decide what happens when an added external link matches the regexes in the field (see more elaborate explanation in collapsed box).

    Note: technically, the current AbuseFilter is capable of doing what would be needed, except that in this form it is extremely heavyweight to use for the number of regexes that is on the blacklists, or one would need to write a large number of rather complex AbuseFilters. The suggested filter is basically a specialised form of the AbuseFilter that only matches regexes against added external links.

description of suggested implementation


  1. Take the current AbuseFilter, create a copy of the whole extension, name it ExternalLinkFilter, take out all the code that interprets the rules ('conditions').
  2. Make 2 fields in replacement for the 'conditions' field:
    • one text field for regexes that block added external links (the blacklist). Can contain many rules (one on each line, like current spam-blacklist).
    • one text field for regexes that override the block (whitelist overriding this blacklist field; that is generally more simple, and cleaner than writing a complex regex, not everybody is a specialist on regexes).
  3. Leave all the other options:
    • Discussion field for evidence (or better, a talk-page like function)
    • Enabled/disabled/deleted (turn it off when not needed anymore, delete when obsolete)
    • 'Flag the edit in the edit filter log' - maybe nice to be able to turn it off, to get rid of the real rubbish that doesn't need to be logged
    • Rate limiting - catch editors that start spamming an otherwise reasonably good link
    • Warn - could be a replacement for en:User:XLinkBot
    • Prevent the action - as is the current blacklist/whitelist function
    • Revoke autoconfirmed - make sure that spammers are caught and checked
    • Tagging - for certain rules to be checked by RC patrollers.
    • I would consider adding a button to auto-block editors on certain typical spambot domains (a function currently performed by one of Anomie's bots on en.wikipedia).

This should overall be much more lightweight than the current AbuseFilter (all it does is regex-testing as the spam-blacklist does, only it has to cycle through maybe thousands of AbuseFilters). One could consider expanding it to have rules blocked or enabled only on certain pages (for heavily abused links that actually should only be used on their own subject page). Another consideration would be to have a 'custom reply' field, telling the editor who gets blocked by the filter why the edit was blocked.
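The core check of such a filter, one blacklist field plus one overriding whitelist field, can be sketched as below. This is a simplified illustration and not how SpamBlacklist or AbuseFilter are implemented: the function names are hypothetical, each line is treated as an independent case-insensitive regex, and text after a '#' is assumed to be a comment.

```python
import re


def compile_rules(text):
    """Compile a rules field: one regex per line.

    Blank lines are skipped and anything after '#' is treated as a
    comment (an assumption for this sketch).
    """
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rules.append(re.compile(line, re.IGNORECASE))
    return rules


def blocked_links(added_links, blacklist, whitelist):
    """Return the added links that hit the blacklist and not the whitelist."""
    hits = []
    for url in added_links:
        on_blacklist = any(rule.search(url) for rule in blacklist)
        on_whitelist = any(rule.search(url) for rule in whitelist)
        if on_blacklist and not on_whitelist:
            hits.append(url)
    return hits
```

What happens to the returned hits (warn, tag, prevent, rate-limit) would then be decided by the per-filter options listed above rather than by the matching code itself.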

Possible expanded features (though highly desired)
  1. Create a separate user right akin to AbuseFilterEditor for being able to edit spam filters (on en.wikipedia, most admins do not touch (or do not dare to touch) the blacklist, while there are non-admin editors who often help on the blacklist).
  2. Add a namespace choice (checkboxes like in search, so one can choose not to blacklist something in one particular namespace, with the addition of 'all', 'content namespaces only' and 'talk namespaces only' options).
    • some links are fine in discussions but should not be used in mainspace, others are a total nono
    • some image links are fine in the file-namespace to tell where it came from, but not needed in mainspace (e.g. flickr is currently on revertlist on en.wikipedia's XLinkBot)
  3. Add user status choice (checkboxes for the different roles, or like the page-protection levels)
    • disallow IPs and new users to use a certain link (e.g. to stop spammers from creating socks, while leaving it free to most users).
    • warn IPs and new users when they use a certain link that the link often does not meet inclusion standards (e.g. twitter feeds are often discouraged as external links when other official sites of the subject exists; like the functionality of en:User:XLinkBot).
  4. Block or whitelist links matching regexes on specific pages (disallow linking throughout except on the subject page) - coding akin to the title blacklist
  5. Block links matching regexes when added by a specific user/IP/IP range (disallow specific users from using a domain) - coding akin to the title blacklist

We would lose a single full list of material that is blacklisted (the current blacklist is known to work as a deterrent against spamming). That list could however be independently created based on the current rules (e.g. by bot).

  • More comments:
  • Phabricator tickets: task T6459 (where I proposed this earlier)


I 2nd this and would like to request a feature to be added: When fighting spam one always has to look up both SBL log and Abuse log. It would have saved me thousands of clicks if the SBL log could be merged (for displaying purposes only) into the Abuse log (by checkbox for example or even via a virtual abuse filter number) so that one view shows the spamming actions. --Achim (talk) 12:50, 30 October 2018 (UTC)[reply]

Just to set realistic expectations here, it is unlikely that the Community Tech team would have the time and resources to create something similar to AbuseFilter. It may be possible, however, for the team to make some improvements to the existing bare-bones implementation. Ryan Kaldari (WMF) (talk) 01:05, 14 November 2018 (UTC)[reply]

    • @Ryan Kaldari (WMF): my apologies if this comes across as harsh: that is a matter of priorities. The WMF has spent an enormous amount of development time on code that is sometimes flat out rejected by the community, code that is then stuffed down our throats in the worst cases (up to superprotect, remember). The spam blacklist is now a crude, minimalistic piece of code that, despite some hacks, is in some cases more disruptive than the spammers/abuse, and maintainers sometimes need to jump through hoops to collect evidence. Requests for upgrades have existed for years now, utterly ignored. Three bots have been written to fill in some of the holes, but it would be more efficient to replace one of them (which is now only protecting one wiki ..), and have a meaningful and useful extension (and partial replacement) for the others.
    Editor retention and attracting new editors scales linearly (if not exponentially) with attracting spammers and ‘keeping’ (i.e. not getting rid of) them ... it is time to finally realise that the admin corps needs more tools to protect. —Dirk Beetstra T C (en: U, T) 03:58, 14 November 2018 (UTC)[reply]
    • @Ryan Kaldari (WMF): (<- new ping, as I am not sure if my previous ping worked). One of those hoops is currently discussed here: Talk:Spam_blacklist#more_than_3000_entries. One of our maintainers, User:Lustiger seth, is now running a very long-running script to collect all rules that have not been hit. From that, we have to go through other hoops to pick out the ones that, even if they have not been hit, should not be removed, and then we can clean up the current list of several thousand entries. Editors in some cases request de-listing as 'it has not been added' (dôh), and we have to try and show that it was actually tried (which we ignore because we simply can't). The current blacklist is too crude. It needs to be replaced by something that is way more modular and has way more options. It cannot be done by the current extension, it basically needs to be re-designed from scratch. --Dirk Beetstra T C (en: U, T) 05:35, 14 November 2018 (UTC)[reply]
      • @Beetstra: Thanks for providing more context. It sounds like the current system is pretty abysmal. I think it would be appropriate for us to consider an overhaul, I just don't think it would be realistic for our team to build something as complex as AbuseFilter within the constraints that we have (and considering our very limited capacity for maintenance work afterwards). I won't disagree with any of your arguments, just want to set realistic expectations. Ryan Kaldari (WMF) (talk) 17:43, 14 November 2018 (UTC)[reply]
        • How about others in the WMF? Although that would be out of the scope of the survey. C933103 (talk) 06:14, 15 November 2018 (UTC)[reply]
        • @Ryan Kaldari (WMF): in fact, I am taking the already existing AbuseFilter, ripping out all the rule interpretation code and replacing it with one (maybe two) simple regex checks. It is by far not as complex as the AbuseFilter, and the rest of the existing code is almost immediately usable without adaptation. —Dirk Beetstra T C (en: U, T) 07:10, 15 November 2018 (UTC)[reply]
          • @Beetstra: Unfortunately, the AbuseFilter extension has been mostly unmaintained for years and would need to be overhauled, even if we didn't reuse the rule interpretation code. The AbuseFilter codebase is extremely old and buggy at this point. Last time I checked it had 311 open bugs, including 23 open security bugs! The new MediaWiki Platform team has agreed to do some work on cleaning-up AbuseFilter next year, but in the meantime I don't think it would be a viable option for us. I'm confident we could come up with something that works better than the current system, however. Ryan Kaldari (WMF) (talk) 21:58, 15 November 2018 (UTC)[reply]
            • @Ryan Kaldari (WMF): so .. a nice time for a joint effort? Or are we back at my initial point regarding WMF (IMHO completely misplaced) priorities? Not only have they made the spam blacklist become outdated, also the AbuseFilter .. <sarcasm>at least now we have a nice image viewer, and a visual editor and what not ... </sarcasm> —Dirk Beetstra T C (en: U, T) 05:24, 16 November 2018 (UTC)[reply]
  • I am going to +1 this, the blacklist and edit filters are a hugely undervalued part of protecting the project against abuse and it really is long past time we put some effort into improving them. JzG (talk) 00:28, 16 November 2018 (UTC)[reply]
  • This has a hidden assumption. If the spam blacklist is viewed as a pure spamlist, then it is a one-level list. If it is used to block questionable sites, then it is a multilevel list. Whether to give the user a warning is optional to what the list should be. A multilevel list for questionable sites is a pretty large discussion, as it must also discuss what kind of external vetting should be done. — Jeblad 08:42, 18 November 2018 (UTC)[reply]
  • @Man77: I am not sure if I understand what you mean. ‘Complex regexes’ are for filtering the added external links. That wikicode is too complex is a different story, independent of the mechanisms that make Wikipedia work. Is that what you mean? —Dirk Beetstra T C (en: U, T) 03:45, 21 November 2018 (UTC)[reply]
  • @Ryan Kaldari (WMF) and Beetstra: a simple hack would be to add an AbuseFilter function that takes a page name, loads the page, parses it into a regexp, and returns it (or matches it against the second argument). It would require some caching but even so it's fairly simple, does not require any structural changes to AF code, and would allow spam blacklists to be converted into filters, with all the UI and logic improvements that entails. --Tgr (talk) 05:01, 25 November 2018 (UTC)[reply]
    • @Tgr: True, that works, and that is what we sometimes do. But with thousands of blacklist rules that entails an enormous number of filters (including global ones). The AbuseFilter is in itself a slow system, where the interpretation (especially with complex regexes) will be the problem, which will only slow down if we use logic to select the rules (which, IMHO, could even be moved out of the AbuseFilter itself, into 'options' to significantly improve the speed - though then the options become so many that it defies the system). --Dirk Beetstra T C (en: U, T) 05:22, 25 November 2018 (UTC)[reply]
      • @Beetstra: Would you really need many filters though? You could have one for non-admin, one for non-autoconfirmed, a few for specific namespaces maybe... there are only so many useful combinations of conditions. And since the regexes would be on a separate page and not in the AF rule, and parsed the same way they are now, AF parsing speed would not be an issue (as long as this does not significantly inflate the number of filters). --Tgr (talk) 05:45, 25 November 2018 (UTC)[reply]
        • @Tgr: It would be a significant number of filters (user-level (new editor/IP; non-admin only ..), namespace, page specific (stop adding youtube and twitter feeds to <subject>), full block/warning only; with probably some cross-combinations), with huge blocks of regexes (and the regexes need to be split, as there is a maximum size to one regex, so you'd need the same blacklist regex parser for it).
    One of the other strengths of this adapted system (as suggested above, not grouped as you suggest), IMHO, is that we could tell for some why it is blocked and what to do (shorteners -> use the expanded rule; majorly copyright violating sites: link to the original; petition sites -> find either secondary sources and use those), and that we could more easily do per-page exceptions (you cannot link to this site anywhere, except on the subject-page). Moreover, logging hits is easier (if you group, you still have to dig through all hits to see how often spammers attempted to add <baddomain>.com). --Dirk Beetstra T C (en: U, T) 10:03, 25 November 2018 (UTC)[reply]
    @Beetstra: To be clear, I'm talking about a syntax like this: !(user_groups contains autoconfirmed) && blacklist_match( 'MediaWiki:Blacklist-autoconfirmed', added_links ). There are no huge blocks of regexes here; the regex is parsed from a separate page with a blacklist-like format.
    As you say this has some shortcomings (each type of blacklist needs to be on a separate page, which is maybe not always the most convenient choice; it's hard to tell exactly which blacklist item was triggered), on the other hand, it is actually possible to do with limited resources, so the question really is, is it preferable over nothing? (Or more optimistically, over waiting two years for an AbuseFilter rewrite.) --Tgr (talk) 20:56, 25 November 2018 (UTC)[reply]
    I guess this would be better than nothing, but I still think that this is going to result in a massive number of rules (and lacks the possibility of page-specific whitelist rules: ‘everywhere, except <subject> for <subject.com>’).
Waiting 2 years for a rewrite .. we’re waiting for 10 years already. The protecting part of Wikipedia is utterly neglected, and will be neglected for years to come ... —Dirk Beetstra T C (en: U, T) 22:52, 25 November 2018 (UTC)[reply]