2017 Community Wishlist Survey/Miscellaneous/Overhaul spam-blacklist

From Meta, a Wikimedia project coordination wiki

  • Problem: The current blacklist system is archaic: it does not allow for levels of blacklisting and is confusing to editors. The main problems include that the spam blacklist is indiscriminate of namespace (an often recurring comment is that it should be possible to discuss a link in talk namespaces, though not to use it in content namespaces). The blacklist is also a black-and-white choice: restricting additions only for non-autoconfirmed editors, or allowing additions only by admins, is not possible. Giving warnings is not possible either (on en.wikipedia we implemented XLinkBot, which reverts and warns; giving a warning to IPs and 'new' editors that a certain link is in violation of policies/guidelines would be a less bitey solution).
  • Who would benefit: The community at large
  • Proposed solution: Basically, replace the current mw:Extension:SpamBlacklist with a new extension based on mw:Extension:AbuseFilter, by taking the 'conditions' parsing out of the AbuseFilter and replacing it with parsing only regexes that match added external links (technically, the current AbuseFilter is capable of doing what would be needed, except that in its current form it is extremely heavyweight to use for the number of regexes that are on the blacklists). Expansions could be added in the form of whitelisting fields, namespace selectors, etc.
  • More comments:
  • Phabricator tickets: task T6459 (where I proposed this earlier)
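
As an illustration only, the proposed behaviour (regexes matched against added external links, with namespace selectors, group exemptions and a warn/block choice) could be sketched as below. This is a minimal Python sketch, not MediaWiki code: `LinkRule`, `check_added_links`, `EXAMPLE_RULES` and the example domains are all hypothetical names invented for this example. The only MediaWiki fact it relies on is that namespace 0 is the main (content) namespace and namespace 1 is Talk.

```python
import re
from dataclasses import dataclass


@dataclass
class LinkRule:
    """Hypothetical rule shape; none of these field names come from MediaWiki."""
    pattern: str                            # regex matched against each added external link
    namespaces: frozenset = frozenset()     # empty = rule applies in every namespace
    exempt_groups: frozenset = frozenset()  # user groups the rule does not apply to
    action: str = "block"                   # "block" or "warn"
    message: str = "This link is blacklisted."

    def applies(self, namespace, user_groups):
        # Skip the rule outside its namespaces or for exempt user groups.
        if self.namespaces and namespace not in self.namespaces:
            return False
        return not (self.exempt_groups & set(user_groups))


def check_added_links(added_links, namespace, user_groups, rules):
    """Return (action, link, message) for every rule that hits an added link."""
    hits = []
    for rule in rules:
        if not rule.applies(namespace, user_groups):
            continue
        regex = re.compile(rule.pattern, re.IGNORECASE)
        for link in added_links:
            if regex.search(link):
                hits.append((rule.action, link, rule.message))
    return hits


# Made-up example rules: one blocks a domain in the content namespace only,
# the other only warns about a URL shortener and exempts admins.
EXAMPLE_RULES = [
    LinkRule(r"\bexample-spam\.test\b", namespaces=frozenset({0})),
    LinkRule(
        r"\bshort\.example\b",
        exempt_groups=frozenset({"sysop"}),
        action="warn",
        message="You added a URL shortener; use the original URL instead.",
    ),
]
```

With these rules, a link to the first domain could still be discussed on a talk page, while the shortener rule only produces a warning with a customized message.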


  • I agree, the size of the current blacklists is difficult to work with; I would be blacklisting a lot more spam otherwise. A split of the current blacklists is also desired:
  • I still want to see a single, centralized, publicly available, machine readable spam blacklist for all the spammers, bots, black hat SEOs and other lowlifes so that they can be penalized by Google and other search engines. This list must continue to be exported to prevent spam on other websites. Autoblocking is also most useful here.
  • The same goes for URL shorteners and redirects -- this list would also be useful elsewhere. This is one example where the ability to hand out customized error messages (e.g. "hey, you added a URL shortener; use the original URL instead") is useful.
  • The remaining domains might belong on a private list with all the options described above.
  • Please consider integrating the extension into core MediaWiki; it is already bundled with the installer. MER-C (talk) 11:57, 14 November 2017 (UTC)
    • Do note that there are a lot of domains on the blacklist which are not there due to 'lowlifes' - quite a number of pornographic sites are blacklisted because of uncontrollable abuse, not because of them being spammed, let alone by site-owners or their SEOs. Also, URL shorteners are blocked because of their nature and abuse, not because of themselves being spam. In those cases I actually agree with complaints that these sites are penalized for being on the blacklists. I do agree that a full list of those domains that are there due to the SEO/spammers/bots and other lowlifes should be publicly visible (note: COIBot and LiWa3 collect all the blacklists in off-wiki files for referencing purposes; it would be rather easy to publish those collective records on-wiki as public information). --Dirk Beetstra T C (en: U, T) 12:12, 14 November 2017 (UTC)
  • Another suggestion: one needs to have the option to match against norm(added_lines) instead, to deal with continued spamming of blacklisted links. I've seen forum spam that needs this solution; we need to have an equivalent here as well. MER-C (talk) 12:28, 14 November 2017 (UTC)
    • Check, but I think that that type of parsing is (partially?) in the current blacklist. I have seen XLinkBot-evasion by using hex-codes (which I subsequently coded into the bots). --Dirk Beetstra T C (en: U, T) 12:31, 14 November 2017 (UTC)
  • @Beetstra: For the sake of clarity: do you want to replace the AbuseFilter extension, or do you want to add a new extension based on AbuseFilter? --Vachovec1 (talk) 21:20, 14 November 2017 (UTC)
    • This proposes to replace mw:Extension:SpamBlacklist with this functionality. MER-C (talk) 03:03, 15 November 2017 (UTC)
    • @Vachovec1: I want to add a new extension based on AbuseFilter (that seems to me the most logical start, as the functionality in the AbuseFilter is quite appropriate, but too heavy for this), to replace the current spam-blacklist. --Dirk Beetstra T C (en: U, T) 05:22, 15 November 2017 (UTC)
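
To illustrate the normalisation point raised above (matching against a normalised form of the added text, and evasion through hex codes), here is a minimal Python sketch. It is not AbuseFilter's `norm()` and not what the current blacklist or XLinkBot does; it is a simplified stand-in that only undoes percent-encoding and HTML entities before the regex is applied. `normalise_link` and `matches_blacklist` are hypothetical names, and the domain is made up.

```python
import html
import re
from urllib.parse import unquote


def normalise_link(link, max_rounds=3):
    """Undo percent-encoding and HTML entities (repeatedly, to catch
    double encoding), then lowercase the result."""
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(link))
        if decoded == link:
            break  # nothing left to decode
        link = decoded
    return link.lower()


def matches_blacklist(link, pattern):
    """Match the blacklist regex against the normalised link, not the raw one."""
    return re.search(pattern, normalise_link(link)) is not None


# "%61" is the hex code for "a"; the raw string would slip past the regex.
print(matches_blacklist("http://ex%61mple-spam.test/", r"example-spam\.test"))
```

The bounded loop is a design choice in this sketch: it catches doubly encoded links without looping indefinitely on odd input.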

My issue with this (as I have with supposed “spam-fighting”) is that it causes way too much collateral damage, both to users and to content. Many useful sites are blacklisted purely because a user is banned, and if a user gets globally banned the link 🔗 gets globally blacklisted and removed from any Wikimedia property, even if it was used as a source 100% of the time. Now let's imagine that a year or so later someone wants to add content using that same link (which is now called a “spamlink”): this user will be indefinitely banned simply for sourcing content. I think 🤔 that having unsourced content is a larger risk to Wikimedia projects than alleged “spam” has ever been. This is especially worrisome for mobile users (who will inevitably become the largest userbase), as when you're attempting to save an edit it doesn't even warn you why your edit won't save, but simply says “error”, so a user might attempt to save it again and then get blocked for “spamming”. Abuse filters currently don't function 100% accurately, and having editors leave the project forever simply because they attempted to use “the wrong 👎🏻” reference is bonkers. Sent 📩 from my Microsoft Lumia 950 XL with Microsoft Windows 10 Mobile 📱. --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 10:15, 15 November 2017 (UTC)

Also, after a link has been blacklisted, someone might attempt to translate a page and get blocked. The potential for collateral damage is very high; how would this "feature" attempt to keep collateral damage to a minimum? --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 10:15, 15 November 2017 (UTC)
@Donald Trung: that is not going to change; actually, this suggestion gives more freedom in how to blacklist and whitelist material. The current system is black-and-white; this gives many shades of grey to the blacklisting system. In other words, your comments relate to the current system.
Regarding the second part of your comment - yes, that is the intended use of the system: if a link is spammed to page one, then translating that page does not make it a good link on the translation (and this situation could actually also be avoided in the new system). --Dirk Beetstra T C (en: U, T) 10:39, 15 November 2017 (UTC)
  • The blacklist currently prevents us from adding a link to a site, from the article about that site. This is irrational. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:03, 15 November 2017 (UTC)
    • @Pigsonthewing: What do you mean, do I have an unclear sentence? If it is what I think it is, then yes, I would like per-article exceptions (though that is a less important feature). --Dirk Beetstra T C (en: U, T) 14:29, 15 November 2017 (UTC)
    • Ah, I think I get it, you are describing a shortcoming of the current system - that is indeed one of the problems, though there are reasons why sometimes we do not want to do that (e.g. malware sites), or cases where the link gets more broadly blacklisted (we blacklist all of .onion, which is then indeed not linkable on .onion, but also not on subject X whose official website is a .onion). But the obvious cases are there indeed. I would indeed like to have the possibility to blanket whitelist for specific cases, like <subject>.com on <subject> (allowing full (primary) referencing on that single page; it is now sometimes silly that we have to allow a /about to link to a site on the subject's wiki page to avoid nullifying the blacklist regex, or need a whole set of specific whitelistings to allow sourcing on their own page), or, on heavily abused sites, to allow whitelisting only for a very specific target ('you can only use this link on <subject> and nowhere else'). --Dirk Beetstra T C (en: U, T) 14:35, 15 November 2017 (UTC)
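
The per-page exception described above ('you can only use this link on <subject> and nowhere else') could look roughly like the following Python sketch. It only illustrates the idea and is not an existing MediaWiki feature; `BlacklistEntry`, `link_allowed` and the page titles are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class BlacklistEntry:
    """Hypothetical blacklist entry with a per-page whitelist."""
    pattern: str                            # regex for the blacklisted links
    allowed_pages: frozenset = frozenset()  # page titles where the link is still allowed


def link_allowed(link, page_title, entries):
    """A link is allowed unless it matches an entry that does not
    whitelist the page being edited."""
    for entry in entries:
        if re.search(entry.pattern, link, re.IGNORECASE):
            if page_title not in entry.allowed_pages:
                return False
    return True


# Example: all of .onion is blacklisted, except on the page about "Subject X".
entries = [BlacklistEntry(r"\.onion\b", allowed_pages=frozenset({"Subject X"}))]
print(link_allowed("http://abc.onion/", "Subject X", entries))
print(link_allowed("http://abc.onion/", "Some other page", entries))
```

Keeping the exception on the blacklist entry itself, rather than in a separate whitelist regex, would avoid the /about workaround mentioned above, at least in this simplified model.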