2017 Community Wishlist Survey/Miscellaneous/Overhaul spam-blacklist
- Problem: The current blacklist system is archaic: it does not allow for levels of blacklisting and is confusing to editors. The main problems are that the spam blacklist is indiscriminate of namespace (a frequently recurring comment is that it should be possible to discuss a link in talk namespaces while still forbidding it in content namespaces), and that the blacklist is a black-and-white choice: restricting link additions to, say, autoconfirmed editors only, or to admins only, is not possible. Merely giving warnings is not possible either (on en.wikipedia we implemented XLinkBot, which reverts and warns; giving IPs and 'new' editors a warning that a certain link violates policies/guidelines would be a less bitey solution).
- Who would benefit: The community at large
- Proposed solution: Replace the current mw:Extension:SpamBlacklist with a new extension based on mw:Extension:AbuseFilter: take the 'conditions' parsing out of the AbuseFilter and replace it with parsing only of regexes matched against added external links. (Technically, the current AbuseFilter is already capable of doing what is needed, except that in that form it is extremely heavyweight for the number of regexes on the blacklists.) Expansions could be added in the form of whitelisting fields, namespace selectors, etc.
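To make the proposal concrete, here is a minimal sketch in Python of the rule shape being proposed (regexes matched only against added external links, with namespace selectors, whitelist overrides and a custom reply). All names here (`BlacklistRule`, `check_edit`) are hypothetical illustrations, not an existing MediaWiki API; the real extension would be written in PHP against MediaWiki's hooks:

```python
import re
from dataclasses import dataclass, field

@dataclass
class BlacklistRule:
    pattern: str                       # regex matched against each added external link
    namespaces: set = None             # None = all namespaces; e.g. {0} = content only
    whitelist: list = field(default_factory=list)  # regexes that override the block
    message: str = "This link is blacklisted."     # custom reply shown to the editor

def check_edit(added_links, namespace, rules):
    """Return the rule that blocks this edit, or None if the edit is allowed."""
    for rule in rules:
        if rule.namespaces is not None and namespace not in rule.namespaces:
            continue  # rule does not apply here (e.g. link discussed on a talk page)
        for link in added_links:
            if re.search(rule.pattern, link):
                if any(re.search(w, link) for w in rule.whitelist):
                    continue  # explicitly whitelisted, let it through
                return rule
    return None

# Hypothetical rule: block example-spam.com, but only in the main namespace (0).
rules = [BlacklistRule(pattern=r"\bexample-spam\.com\b", namespaces={0})]
print(check_edit(["http://example-spam.com/page"], 0, rules).message)
print(check_edit(["http://example-spam.com/page"], 1, rules))  # talk namespace
```

This is only a sketch of the data model; the point is that namespace scoping and per-rule replies fall out naturally once rules are structured objects rather than lines in one flat regex list.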
|The following discussion has been closed. Please do not modify it.|
This should overall be much more lightweight than the current AbuseFilter (all it does is regex testing, as the spam blacklist does, whereas the AbuseFilter would have to cycle through maybe thousands of filters). One could consider expanding it so that rules can be blocked or enabled on only certain pages (for heavily abused links that really should only be used on their own subject page). Another consideration would be a 'custom reply' field, telling the editor whose edit is blocked by the filter why it was blocked.
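One reason a dedicated regex matcher can stay lightweight where the full AbuseFilter cannot: plain blacklist regexes can be batched into a few combined patterns and applied in a single scan over the added links, instead of evaluating thousands of independent filter conditions per edit. A rough Python illustration (the domains are made up):

```python
import re

# Hypothetical blacklist entries; the real lists hold thousands of regexes.
blacklist = [r"spam-domain\.com", r"bad-pills\.net", r"cheap-watches\.org"]

# Compile one combined alternation so each added link is scanned once,
# rather than once per rule.
combined = re.compile("|".join(f"(?:{p})" for p in blacklist), re.IGNORECASE)

added_links = ["https://en.wikipedia.org/", "http://bad-pills.net/buy"]
blocked = [link for link in added_links if combined.search(link)]
print(blocked)  # → ['http://bad-pills.net/buy']
```

This batching is essentially what the current SpamBlacklist already does internally, and it is what per-rule AbuseFilter conditions would give up.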
Possible expanded features:
- More comments:
- Phabricator tickets: task T6459 (where I proposed this earlier)
- I agree, the size of the current blacklists is difficult to work with; I would be blacklisting a lot more spam otherwise. A split of the current blacklists is also desired:
- I still want to see a single, centralized, publicly available, machine readable spam blacklist for all the spammers, bots, black hat SEOs and other lowlifes so that they can be penalized by Google and other search engines. This list must continue to be exported to prevent spam on other websites. Autoblocking is also most useful here.
- The same goes for URL shorteners and redirects -- this list would also be useful elsewhere. This is one example where the ability to hand out customized error messages (e.g. "hey, you added a URL shortener; use the original URL instead") is useful.
- The remaining domains might belong on a private list with all the options described above.
- Please consider integrating the extension into core MediaWiki; it is already bundled with the installer. MER-C (talk) 11:57, 14 November 2017 (UTC)
- Do note that a lot of domains on the blacklist are not there due to 'lowlifes': quite a number of pornographic sites are blacklisted because of uncontrollable abuse, not because they were spammed, let alone by the site owners or their SEOs. URL shorteners, likewise, are blocked because of their nature and the abuse of them, not because they are themselves spam. In those cases I actually agree with complaints that these sites are penalized for being on the blacklists. I do agree that a full list of the domains that are there due to SEOs/spammers/bots and other lowlifes should be publicly visible (note: COIBot and LiWa3 collect all the blacklists in off-wiki files for referencing purposes; it would be rather easy to publish those collective records on-wiki as public information). --Dirk Beetstra T C (en: U, T) 12:12, 14 November 2017 (UTC)
- Another suggestion: one needs the option to match against norm(added_lines) instead, for continued spamming of blacklisted links. I've seen forum spam that needs this solution; we need an equivalent here as well. MER-C (talk) 12:28, 14 November 2017 (UTC)
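For readers unfamiliar with it: norm() in the AbuseFilter normalizes text (case folding, stripping whitespace and punctuation, collapsing repeated characters) before matching, which defeats simple obfuscation of blacklisted links. A simplified Python approximation of the idea, not the exact norm() semantics:

```python
import re

def norm(text):
    """Crude approximation of AbuseFilter-style normalization."""
    text = text.lower()
    text = re.sub(r"[\s\W_]+", "", text)   # drop whitespace and punctuation
    text = re.sub(r"(.)\1+", r"\1", text)  # collapse repeated characters
    return text

# Obfuscated spam that a plain link regex would miss:
added_lines = "Visit s p a m m y - s i t e . c o m for deals!"
print(norm(added_lines))  # → 'visitspamysitecomfordeals'

# Normalizing the blacklisted domain the same way makes the match work again:
print(norm("spammy-site.com") in norm(added_lines))  # → True
```

The real norm() also folds visually confusable characters (ccnorm), which this sketch omits.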
- @Beetstra: For the sake of clarity: do you want to replace the AbuseFilter extension, or do you want to add a new extension based on AbuseFilter? --Vachovec1 (talk) 21:20, 14 November 2017 (UTC)
- This proposes to replace mw:Extension:SpamBlacklist with this functionality. MER-C (talk) 03:03, 15 November 2017 (UTC)
- @Vachovec1: I want to add a new extension based on AbuseFilter (that seems to me the most logical starting point, as the functionality in the AbuseFilter is quite appropriate, but too heavy for this) to replace the current spam blacklist. --Dirk Beetstra T C (en: U, T) 05:22, 15 November 2017 (UTC)
- OK. Then I would propose to start the section Proposed solution with something like: "Replace the mw:Extension:SpamBlacklist with a new extension based on mw:Extension:AbuseFilter.", to make it crystal clear. --Vachovec1 (talk) 10:52, 15 November 2017 (UTC)
My issue with this (as I have with supposed “spam-fighting” in general) is that it causes far too much collateral damage, both to users and to content. Many useful sites are blacklisted purely because a user is banned; if a user gets globally banned, the link 🔗 gets globally blacklisted and removed from every Wikimedia property, even if it was used as a source 100% of the time. Now imagine that a year or so later someone wants to add content using that same link (which is now called a “spamlink”): this user will be indefinitely banned simply for sourcing content. I think 🤔 that having unsourced content is a larger risk to Wikimedia projects than alleged “spam” has ever been. This is especially worrisome for mobile users (who will inevitably become the largest userbase): when you attempt to save an edit, it doesn't even warn you why your edit won't save, but simply says “error”, so a user might attempt to save it again and then gets blocked for “spamming”. Abuse filters currently don't function 100% accurately, and having editors leave the project forever simply because they attempted to use “the wrong 👎🏻” reference is bonkers. Sent 📩 from my Microsoft Lumia 950 XL with Microsoft Windows 10 Mobile 📱. --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 10:15, 15 November 2017 (UTC)
- Also, after a link is blacklisted, someone might attempt to translate a page and get blocked; the potential for collateral damage is very high. How would this "feature" keep collateral damage to a minimum? --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 10:15, 15 November 2017 (UTC)
- @Donald Trung: That is not going to change; in fact, this suggestion gives more freedom in how to blacklist and whitelist material. The current system is black-and-white; this gives the blacklisting system many shades of grey. In other words, your comments relate to the current system.
- Regarding the second part of your comment: yes, that is intended use of the system. If a link is spammed on page one, then translating that page does not make it a good link on the translation (and this situation could also be avoided in the new system). --Dirk Beetstra T C (en: U, T) 10:39, 15 November 2017 (UTC)
- The blacklist currently prevents us from adding a link to a site, from the article about that site. This is irrational. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:03, 15 November 2017 (UTC)
- @Pigsonthewing: What do you mean; is one of my sentences unclear? If it is what I think, it is that I would like per-article exceptions (though that is a less important feature of this proposal). --Dirk Beetstra T C (en: U, T) 14:29, 15 November 2017 (UTC)
- Ah, I think I get it: you are describing a shortcoming of the current system. That is indeed one of the problems, though there are reasons why we sometimes do not want to allow that (e.g. malware sites), or where the link is blacklisted more broadly (we blacklist all of .onion, which then cannot be linked on the .onion article, but also not on subject X whose official website is a .onion). But the obvious cases are indeed there. I would like the possibility to blanket-whitelist for specific cases, like <subject>.com on <subject> (allowing full (primary) referencing on that single page; it is now sometimes silly that we have to whitelist a /about page to link a site on its subject's Wikipage, to avoid nullifying the blacklist regex, or a whole set of specific whitelistings to allow sourcing on its own page), or, for heavily abused sites, to allow whitelisting only for a very specific target ('you can only use this link on <subject> and nowhere else'). --Dirk Beetstra T C (en: U, T) 14:35, 15 November 2017 (UTC)
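A sketch of what such a per-page exception could look like, assuming a hypothetical rule format (the `allowed_pages` field and the `is_blocked` helper are illustrative only and exist in no current extension):

```python
import re

# Hypothetical rule: the domain is blacklisted everywhere except on its
# own subject article, so primary sourcing there stays possible.
rule = {
    "pattern": r"\bexample-site\.com\b",
    "allowed_pages": {"Example Site"},  # pages where the link is permitted
}

def is_blocked(link, page_title, rule):
    """Block the link unless it is being added on an explicitly allowed page."""
    if not re.search(rule["pattern"], link):
        return False  # rule does not match this link at all
    return page_title not in rule["allowed_pages"]

print(is_blocked("http://example-site.com/", "Example Site", rule))     # → False
print(is_blocked("http://example-site.com/", "Some Other Page", rule))  # → True
```

This would cover both directions discussed above: whitelisting <subject>.com only on <subject>, and restricting a heavily abused link to one specific target page.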