Community Wishlist Survey 2021/Admins and patrollers/Overhaul AbuseFilter


Overhaul AbuseFilter

Problem: Per admission by Ryan Kaldari (WMF) during the 2018 Community Wishlist Survey: "Unfortunately, the AbuseFilter extension has been mostly unmaintained for years and would need to be overhauled ...".
Who would benefit: Editors on all Wikipedias
Proposed solution: Free up resources to overhaul the AbuseFilter. Make it more modular so that it can also accommodate other modules (e.g. a module as suggested in Community Wishlist Survey 2021/Admins and patrollers/Overhaul spam-blacklist)
More comments: Previous discussions: Community Wishlist Survey 2019/Admins and patrollers/Overhaul spam-blacklist
Phabricator tickets: task T20110
Proposer: Dirk Beetstra ^{T C} (en: U, T) 06:17, 17 November 2020 (UTC)

Discussion

Suggested addition: add a 'show captcha' option as the result of a filter hit. With that, for both IPs and logged-in editors, you can check whether an account is genuine and not a spambot. It may be annoying for the regular editor who is acting in good faith, but it should stop most of the bot edits and could be used to thwart vandals. --Dirk Beetstra ^{T C} (en: U, T) 06:17, 17 November 2020 (UTC)
@Beetstra: Grants:Project/Daimona Eaytoy/AbuseFilter overhaul has started, which should greatly improve the situation. phab:T186960 for instance could be considered as making it more "modular", perhaps. The captcha task you mention is not on the Overhaul-2020 board, however. Would you like to refocus your proposal on the captcha bit? MusikAnimal (WMF) (talk) 06:34, 17 November 2020 (UTC)
@MusikAnimal (WMF): maybe this request as it is gives a bit of an extra push (if it receives much support). The captcha task could be filed as a separate phab ticket linked to either. --Dirk Beetstra ^{T C} (en: U, T) 06:57, 17 November 2020 (UTC)
@Beetstra: My concern is the proposal as written is very broad without a clear problem statement. It needs to be more definitive so we know what we're voting for. The CAPTCHA idea I think is great as a standalone proposal. One thing I can assure you is that AbuseFilter is no longer unmaintained. It's been under heavy development since Kaldari made that statement two years ago. MusikAnimal (WMF) (talk) 23:17, 17 November 2020 (UTC)
As a user I find captchas extremely annoying. And usually there is discrimination involved. You're using Linux? That's suspicious etc. --Shoeper (talk) 14:52, 23 November 2020 (UTC)
@Shoeper: in a way, the 'being extremely annoying' is true, but that is also part of the function. Spambots are a reality where the use of captchas does way more good than the bit of damage from catching good edits (we shouldn't be enabling this on all external links, just external link additions that follow a certain pattern). Basically you have to have a filter that results in >99% positive hits of a certain type before you enable the captcha. --Dirk Beetstra ^{T C} (en: U, T) 06:01, 24 November 2020 (UTC)
@Beetstra: Some may believe they can achieve ">99% positive hits". In reality there are always faults and regular users are annoyed. Not even Cloudflare, nor Google (who if not Google?) are able to reliably detect bots WITHOUT annoying users. One point on Wikipedia is that it would only apply to edits, but I want to encourage being extremely careful. If it is too annoying, users won't edit pages. And editors are already a problem. How many non-technical editors does Wikipedia have and how many women are editing Wikipedia at all? The last time I looked these figures were worrying. Some people may just quit the page once a captcha appears. On the other hand professional attackers will find a way around the captchas. Basically, captchas are like DRM. You try to not allow someone to watch a video while the purpose of the video is to be watched. Regular users now have to live with the disadvantages and are annoyed while copycats still find ways around and publish accessible versions on the web. The bottom line is paying customers get a bad experience while it is still possible to copy the content for professionals. It will be the same on Wikipedia. Regular users are annoyed and professional spammers are going to bypass the captcha. --Shoeper (talk) 18:13, 28 November 2020 (UTC)
@Shoeper: what I mean is that the underlying filter can reach high levels of positive hits on spam material. You then have the choice: outright block them by setting the filter to disallow (which is going to annoy everyone and most will walk away because they don't know how to get 'around it'), or you throw a warning in front of them (which is going to annoy all people, but certainly all spammers/bots will just 'click it' away and it is hence useless), or you feed a captcha to them, which will (hopefully) annoy ALL bots, massively slow down spammers (so their accounts/IPs can be (b)locked), and yes, annoy quite some of the genuine editors. It is then a choice between a completely annoying filter, a totally useless filter, or one that will seriously slow down spammers (if the captcha is good enough) and annoy a part of the remaining editors (and a part will just shrug like I do when I get a captcha) in a way similar to the 'click here to continue'-box. And it massively beats the situation where ALL IPs and ALL unconfirmed editors have to solve a captcha for any 'new' link they add. --Dirk Beetstra ^{T C} (en: U, T) 11:45, 29 November 2020 (UTC)
I would support having captcha added as a result of a hit. Would be useful to prevent spambots. Dreamy Jazz ^{talk to me | enwiki} 23:46, 19 November 2020 (UTC)
As pointed out by MusikAnimal (whom I'd like to thank), this idea of AbuseFilter being unmaintained should really stop. Yes, the codebase is legacy; yes, it has limitations; and yes, it used to be buggy. But things have changed a lot in the past couple of years, and the current codebase is actually much better than those of several other deployed extensions. I also agree that the current proposal is too broad, and integrating a "SpamBlacklist module" inside AbuseFilter is not going to happen. These tools are different, and even if their scopes overlap, it doesn't mean one should be merged into the other. OTOH, I would strongly support the addition of captchas as an AbuseFilter action. The current AbuseFilter overhaul should hopefully simplify (and unbreak) the process for adding custom actions, so this should be definitely doable once the overhaul is over. Should this proposal be selected, I'd certainly be happy to help the team with the implementation. --Daimona Eaytoy (talk) 16:22, 20 November 2020 (UTC)
@Daimona Eaytoy: there has long been talk of making the AbuseFilter more modular (which, in my view, would mean that you have an AbuseFilter that has several modules that you can choose). Whether a 'spamblacklist' module should be part of that, or whether that should be done differently (see Community_Wishlist_Survey_2021/Admins_and_patrollers/Overhaul_spam-blacklist), is up to them (though I insist that many of the AbuseFilter options would be great to have on the spam-blacklist, but I agree that the 'module' may be too different - but maybe a joint effort would be good). --Dirk Beetstra ^{T C} (en: U, T) 06:14, 23 November 2020 (UTC)

Voting

Strong support MarioSuperstar77 (talk) 21:05, 8 December 2020 (UTC)
Support Firestar464 (talk) 04:50, 9 December 2020 (UTC)
Support JPxG (talk) 06:03, 10 December 2020 (UTC)
Support. Meiræ 14:24, 10 December 2020 (UTC)
Support Strainu (talk) 09:39, 12 December 2020 (UTC)
Support Helder 09:48, 12 December 2020 (UTC)
Support as proposer. --Dirk Beetstra ^{T C} (en: U, T) 06:08, 13 December 2020 (UTC)
Support ·addshore· ^{talk to me!} 12:53, 14 December 2020 (UTC)
Strong support JN Dela Cruz (talk) 05:41, 19 December 2020 (UTC)
Support Geonuch (talk) 13:10, 20 December 2020 (UTC)
Support Barkeep49 (talk) 15:39, 20 December 2020 (UTC)
Support Schniggendiller (talk) 16:53, 21 December 2020 (UTC)