Talk:Spam blacklist/About

From Meta, a Wikimedia project coordination wiki

This page is not clear enough. It seems to have been written for people who already know what is going on with all the spam the page is talking about. I have read the introduction, and I still cannot understand what is going on and why URLs are being deleted on our he:wikipedia. 09:08, 13 June 2007 (UTC)
שמובבה

"Requests for listing" section links go nowhere

*[[#Proposed_additions|Proposed additions]] - request a link to be added to the spam blacklist
*[[#Proposed_removals|Proposed removals]] - request a link to be '''removed''' from the spam blacklist
*[[#Troubleshooting_and_problems|Troubleshooting and problems]] - having problems editing an article, other problem with the spam blacklist. This is '''not''' for requesting removal, for that see [[#Proposed_removals|Proposed removals]].

Suggested changes

Based on recent MediaWiki software changes, some new tools, and questions raised by confused editors at Talk:Spam blacklist, here are some text additions I propose for the Spam blacklist/About page. My wording here is rough and should be improved for clarity:


Blacklists and whitelists built into MediaWiki

MediaWiki software now has the following features with which to mitigate spam:
  1. The Wikimedia Foundation's global Spam blacklist on meta:
    • MediaWiki's software filters prevent links to any blacklisted domain from being added to any page on any project that uses this blacklist. It is impossible to save edits to any page that contains a link to a blacklisted domain.
    • Applicability:
      • All 700+ Wikimedia Foundation projects (Wikipedia, Wiktionary, etc.) in all languages
      • Other unaffiliated wikis (such as all Wikia sites) that run on MediaWiki software and incorporate Wikimedia's global spam blacklist in their own spam filtering
    • Domains are normally listed only if they have been spammed to at least two Wikimedia sites.
    • URL redirect domains should be blacklisted here since they are often used to bypass spam filtering. They can also be used to steer readers to phishing sites or to sites that exploit browser vulnerabilities. Editors who wish to link to a non-blacklisted site via an intermediate URL redirect domain (such as tinyurl.com) should instead link directly to the actual site, not the redirect address.
    • Domains to be blacklisted should be listed and discussed first at Talk:Spam blacklist.
    • Addition to or removal from the spam blacklist can be done by any Meta-Wiki administrator.
  2. Each wiki running the current version of the MediaWiki software can also blacklist a domain locally. The local page name for the local blacklist is typically [[MediaWiki:Spam-blacklist]].
    • Applicability: links to locally blacklisted domains are only blocked on the local wiki where they've been blacklisted.
    • Addition to or removal from a local spam blacklist can be done by any administrator on that wiki.
  3. Each wiki running MediaWiki software can also whitelist a domain, subdomain or individual web page locally. The local page name for the local whitelist is typically [[MediaWiki:Spam-whitelist]]. Whitelist entries override local and global blacklist filters.
    • Applicability: addresses locally whitelisted can be added to pages on that wiki but not on any others.
    • Addition to or removal from a local spam whitelist can be done by any administrator on that wiki.
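The way the three lists interact can be sketched roughly as follows. This is a simplified model, not the SpamBlacklist extension's actual code: it assumes each entry is a regular expression searched anywhere in a link's URL, that a link matching any local whitelist entry is exempt (as stated above, whitelist entries override both blacklists), and that an edit is refused if any remaining link matches the global or local blacklist. The function name and all domains except tinyurl.com are hypothetical.

```python
import re

def blocked_links(links, global_blacklist, local_blacklist, local_whitelist):
    """Return the links that would prevent an edit from being saved.

    Simplified model (assumption): every entry is a regex searched
    anywhere in the URL, and whitelist entries override both blacklists.
    """
    blacklist = [re.compile(p, re.IGNORECASE) for p in global_blacklist + local_blacklist]
    whitelist = [re.compile(p, re.IGNORECASE) for p in local_whitelist]
    result = []
    for url in links:
        # A locally whitelisted address is exempt from every blacklist.
        if any(w.search(url) for w in whitelist):
            continue
        # Otherwise, any blacklist match (global or local) blocks the link.
        if any(b.search(url) for b in blacklist):
            result.append(url)
    return result


# Hypothetical lists: a redirect domain and a spam domain blacklisted
# globally, one domain blacklisted locally, one page whitelisted locally.
global_bl = [r"\btinyurl\.com\b", r"\bexample-spam\.com\b"]
local_bl = [r"\blocal-spam\.org\b"]
local_wl = [r"\bexample-spam\.com/useful-page\b"]

links = [
    "http://tinyurl.com/abc123",
    "http://example-spam.com/useful-page",
    "http://example-spam.com/other",
    "http://local-spam.org/",
    "https://www.wikimedia.org/",
]
print(blocked_links(links, global_bl, local_bl, local_wl))
# ['http://tinyurl.com/abc123', 'http://example-spam.com/other', 'http://local-spam.org/']
```

In this model the whitelisted page of an otherwise blacklisted site passes, while other pages on the same site remain blocked.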

Tools for finding cross-wiki spam

1. Eagle 101's cross-wiki search tool searches the largest Wikipedias for instances of a particular domain. Users can specify the number of projects they wish to search from 1 (just the largest, en.wikipedia) to the 57 largest.
2. Luxo's cross-wiki contribution search tool searches all 700+ foundation projects for contributions by a given IP or username. A maximum of 20 edits per project are shown; full user edit histories for a given project can be obtained by clicking on the link to that project from the report.
This tool occasionally misses some contributions if there are MySQL problems.
3. A specified domain or subdomain on any Wikimedia project can be searched on that wiki by using the "External links" tool ([[Special:Linksearch]]) found on the list of special pages ([[Special:Specialpages]]). A link to the Special pages is found in the left-hand column of each wiki page in the "toolbox" section. (The names, "Special pages", "toolbox" and "External links" will vary by language and project.)
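The same lookup as Special:Linksearch can, as far as I know, also be done through the MediaWiki Action API with the `list=exturlusage` query module. The sketch below only builds the request URL (nothing is fetched); it assumes the standard Wikimedia endpoint path `/w/api.php`, and the helper name is hypothetical. Results beyond `eulimit` require following the API's continuation parameters, which this sketch does not do.

```python
from urllib.parse import urlencode

def linksearch_api_url(wiki_host, domain, limit=500):
    """Build an Action API query listing pages that link to `domain`.

    Hypothetical helper; uses list=exturlusage, the API counterpart of
    Special:Linksearch. Pass the resulting URL to any HTTP client.
    """
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": domain,  # search string without the protocol prefix
        "eulimit": limit,   # maximum number of results per request
        "format": "json",
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"

print(linksearch_api_url("en.wikipedia.org", "archive.is"))
# https://en.wikipedia.org/w/api.php?action=query&list=exturlusage&euquery=archive.is&eulimit=500&format=json
```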

--A. B. (talk) 18:23, 21 August 2007 (UTC)

On blacklisting policy and practice

This is long because I spent years studying this problem and discussed it at length with Beetstra in other places, mostly on Talk pages. I also had a study of blacklisting on Wikipedia, which was deleted after I was banned. It was just a pile of evidence.

A discussion of archive.is was started on Talk:Spam blacklist. Because that page is a noticeboard where requests for blacklisting or delisting are considered, discussion of blacklist policy and practice, more than needed to address a specific situation, is out of place. I pointed out that the entire filing was out of place, because the filer knew, from the start, that it was not going to result in blacklisting. Apparently, however, he wanted to argue that it should.

I have a penchant for describing what happened, with evidence. Sometimes people assume, when I do this, that I'm attacking another user, who might have stated something that could appear contradictory to the evidence. No, I'm not. Below, I describe the administrator who filed the report as "valiant." I mean that. This user has worked long and hard for years, and has been extraordinarily helpful when others might have turned away. However, we obviously are not in agreement on certain issues. We are in agreement on many other issues.

There is a discussion section of that page, which is where such a request for discussion should possibly have gone, if it was going to be added to the page. I'd suggest removing that section and pointing instead to this page, for discussion of policy, etc., out of consideration for those who watch Spam blacklist, being either administrators concerned about decisions to make, or those who want to see actual blacklisting practice or history. At most there would be on that Talk page, then, a single pointer to a discussion here, in the "Discussion" section.

After initial comments, I backed away from the discussion. The comment below, from Beetstra, was then added, beginning with a quote of what I'd last written. Because the full original comment may be read here, I'm interspersing my responses for efficiency:

"it is not controversial that archive.is links were massively added by many IPs" - you know what that can be violating, and what it was found to be violating - a person involved with the website pushed their links because they found them good after they were told on their account that that is not the way Wikipedia works, and that is why we are discussing here.

My point was that it was not necessary to post a truly enormous list of IPs, it was accepted and known that many links were added by IP, and that at least some or much of this was COI. That entire list, filling up the bulk of the page, could have been replaced by one uncontroversial sentence: "Links to this site were added by many IP edits, including many apparently by bot through open proxies, many from IP associated with archive.is."

However, many other links may have been added by IP independently, given all the attention that had been brought, and these edits could match, in form, those of the COI editor and others. There is no direct evidence that these were all bot edits, just claims and accusations. I have seen no study of the actual numbers. The claim was also made that illegally hacked computers had been used, "botnets." Since archive.is has known owners, that would have been phenomenally hazardous, legally. No, either they used ordinary open proxies, or these were actually independent edits. I know of no way to discriminate between them, though it would be possible to identify open proxies.

I understand and accept that blocking and blacklisting are legitimately used to stop flooding that can overwhelm project process. However, there is another side: what if all those edits are, in fact, good edits, or at least reasonable? So far, I have seen not one single allegation of an archive.is link being added that was harmful. So massive effort to prevent them, if it takes place, is not improving the project and may be damaging it.

Blacklisting the site here will require some level of effort on every WMF project. Is it worth the effort? I have been faced with edits rejected by the spam blacklist. If I didn't know the process, I'd have been completely bollixed, and these were edits where I was not adding the link. The blacklisting made it difficult to make other changes to the page. For this reason, it is essential that all links to a blacklisted site be removed or disabled, promptly if not before blacklisting.

  • Nor is it an excuse that discussions should not continue because the page gets long.

Excuse for what? Is the page for discussion or for decision? If it's a discussion page, and particularly in the report section, the obvious use of the page as a report/action noticeboard is damaged. And why? What is the gain?

  • ALL requests here are brought to discussion, and a decision may come out of that.

That is right, but discussion, on a noticeboard page, is rationally related to decision and I've seen requests archived quickly after a decision has been made, when discussion continued that was not related to decision.

  • I still consider the option of global blacklisting with whitelisting on the wikis that have explicitly agreed with the influx of links. We are not here to promote archive.is.

Of course not. Nor are we here, in fact, to condemn them. Someone who has spent years valiantly fighting spam and COI editing -- those are really two different problems, though often related -- may come to think of COI editors as disruptive, evil, to be resisted at all costs. Yet COI editors might be intending to support the project. In this case, yes, they were rejected, but rejected, not by the project as a whole, but by what I call the "core," that is, the relatively small number of editors who pay attention to central process, heavily weighted toward administrators. To even have a discussion to establish a major project consensus can be massively disruptive, a train wreck.

When the core decides on what, in politics, would be called an "unfunded mandate," i.e., that requires someone else to do a lot of work, the core is helpless. If it actually represented a project consensus, enforcement would be easy, everyone would pitch in. That obviously is not happening, see below.

Wikipedia is not a battleground, except, very explicitly, to the antispam warriors: see WikiProject Spam, where an image of a battleship firing all its big guns has been proudly displayed for many years.

When big guns are fired, "off the coast of Wikipedia," there is collateral damage. I've seen and documented it. I took one case to enwiki ArbCom, successfully. Beetstra knows the cases.

Yes, global blacklisting with an option to whitelist is a possible solution, but in the absence of a whitelist, and until whitelisting happens, there would meanwhile be massive disruption. There are, just now, 30,842 links to archive.is on enwiki alone[1], up from 30,831 at about 18:27, 10 February 2014 (UTC). At the time of the enwiki blacklisting discussion, there were 27,309, as of 3 December 2013.

When w:Wikipedia:Archive.is RFC was filed, the filer claimed that "at this point, over 10000 links to archive.is remain in Wikipedia." The RfC was hotly contested, but nobody challenged this number. So, since that filing (20 September 2013; closed 31 October 2013 with an apparent decision to remove links and blacklist), the "problem" has been getting steadily worse. There was, in that discussion, full consensus (6 supports, no opposition) on one option: Replace bot-added archive.is links where possible, leave human-added links intact. The non-admin closer did not address that possibility, but went with another proposal, "full removal," which was supported 19:6. However, the full removal option was not actually full removal; some of the support for it was, for example, "remove the bot-added links but allow humans to re-add." The closer is on wikibreak; on his Talk page is a request for help opposing the blacklisting of archive.is.[2] There is discussion on the user's talk page, where another regular user says he was "hit with an edit block." I cannot view the abuse filter to determine what is happening; blocked editors are not allowed to view the abuse log. I could log out and stand on my head and mutter a magic incantation to see it, but I think I won't.

Now, arguments for delisting of useful sites have always been met with "Get it whitelisted for a few links, and we can then consider delisting." Well, if archive.is is blacklisted, there would be many whitelistings, that's quite clear. So ... what is the point of blacklisting? It would obviously be to protect those poor, uninformed small wikis. Now, with the tools Beetstra has, he could readily identify small wikis with many incoming links, especially if these are being added by IP. He could easily ask the small wiki. I'd bet there is a global sysop who might blacklist locally, all over the WMF. We know, now, that archive.is is not coming out of Wikipedia anytime soon.

(This kind of practice was done by a steward when there was a global lock on an allegedly globally banned user. Because it was clear that some wikis wanted the user to be allowed to edit, the steward went to every wiki and blocked the user, thus allowing local wikis to over-rule the block, instead of using the maximally intrusive global lock provision.)

If these links were truly a problem, there would be ready solutions. The problem is that those solutions would do damage to the actual project, which is, after all, an encyclopedia with a verifiability policy, and these links improve verifiability. I've seen that many times. Once, when a reference was behind a paywall, I personally went to a medical library to read it, because something was fishy. Indeed, the citation had misrepresented the reference, which actually implied the opposite of what the Wikipedia text claimed. Because it was difficult to verify, a false medical claim had stood for, as I recall, several years.

On the other page, I cited a review of the last link at the time. It was just one link; I did not look at several and pick the one that supported my argument. Many others have done the same and have reported that every link is to a real archived web page. In all the discussion, nobody has ever claimed that a link was misleading or, in itself, illegitimate. It is often claimed that the site is "commercial," as if that mattered. Apparently it is not: there is no advertising. Perhaps one day there might be.

So, today, I looked at link #30,831, the last in the list. It was added on 10 February 2014 by an editor with an interest in w:John M. Conroy, who registered in 2009. There is no warning on his Talk page. The edit filter could easily create a list of link additions, and editors could be warned, if there really is a project consensus. That is clearly not being done. I suspect, though, that the edit filter is preventing IP editors from adding links. I'm not about to test it (I'm banned from Wikipedia).

Beetstra has the tools to study this, but I don't see that he's done it.

A bot could remove all the IP additions of those links, and IP editors don't have watchlists and don't scream when their contributions are reverted. Has that been done? I don't know. Beetstra would know, I assume; he can read the edit filters themselves.

If a bot should not be used to remove the links, the global blacklist should not be used to prevent their addition by any editor. The argument that URLs can be whitelisted is standard. I have seen what normally happens with whitelist requests. Good ones are ignored for months, and even if someone eventually responds, the editor may no longer be watching. Requests from a COI editor can result in an immediate block. I got pages whitelisted because I didn't take "[nothing]" for an answer. When one request sat for a long time without response, I went to w:WP:AN and requested administrative assistance, which was promptly granted, irritating the regulars. "Forum shopping." But the request had not been denied, simply ignored. Yes, I was a PITA. I knew how to get things done. Most editors will simply give up. In researching lyrikline.org, and repairing the damage, after Beetstra had kindly whitelisted the English access to the site on en.wiki, I found a number of pages where editors had gotten around the blacklist by eliminating or munging the link. Most editors would not go that far, and few would request whitelisting. We never actually see most of the damage done by overzealous blacklisting. And, of course, we also don't see the benefit. With an edit filter or bot, as Beetstra has used on occasion, activity can be monitored. --Abd (talk) 21:11, 13 February 2014 (UTC)

Please condense this to a paragraph or so if you wish us to read it. Snowolf How can I help? 06:55, 16 February 2014 (UTC)
Indeed. --Dirk Beetstra T C (en: U, T) 12:34, 16 February 2014 (UTC)
I will. Understand, please, that it takes more time to write less. Thanks for the request. --Abd (talk) 19:02, 16 February 2014 (UTC)

Old blacklisting with scanty history

A recent request asked about the blacklisting of "jxlalk.com." It turns out that this blacklisting is based on the regex "xlal[0-9a-z-]*\.com". The first thing I found when looking into this was this edit of the blacklist, 30 April 2008, in which many listings were removed from the blacklist with the summary (delisting every one of mine, not going to deal with "log" nonsense). Another admin reverted that immediately with (seriously bad idea).
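A quick check shows why jxlalk.com is caught by this entry: the pattern has no leading word boundary, so the substring "xlalk.com" inside "jxlalk.com" satisfies it. This sketch assumes the entry is searched, unanchored, against the host name; the extension wraps entries in its own URL prefix before matching, which is not reproduced here. The helper name is hypothetical.

```python
import re

# The blacklist entry quoted above, searched (unanchored) in a host name.
ENTRY = re.compile(r"xlal[0-9a-z-]*\.com")

def is_blocked(host):
    """True if the entry matches anywhere in the host name."""
    return ENTRY.search(host) is not None

for host in ["xlale.com", "xlalu.com", "jxlalk.com", "example.com"]:
    print(host, is_blocked(host))
# xlale.com True
# xlalu.com True
# jxlalk.com True
# example.com False
```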

The listings are still in the blacklist under "##Nakon" and there is a reference to Spam_blacklist/Log/Nakon/sbl. On that page, a link is given after this blacklisting: [3]. The link is to a spam edit adding xlalu.com and xlale.com page links on en.wikipedia, 30 September, 2006. Nakon was last active, as far as I've checked, in April 2014. It might be possible to reach him if needed.

Nakon blocked the IP indefinitely on October 30, 2006 [4]. It is still blocked. All the visible spam edits were on one day (Sept. 30, 2006). The IP was blocked on Sept. 30 for 24 hours. There are no global edits for that IP, only en.wiki. A search for that IP on open proxy lists showed nothing after 2006. xlalu.com is not currently registered. xlale.com is registered but did not respond to pings.

  • [5] added xlale.com 8 November 2006
  • [6] changed xlale\.com to xlal[0-9a-z-]*\.com 21 October 2006
  • [7] added xlale.com 02:34, 30 September 2006.

xlale.com is also listed independently on the Nakon blacklist page, linking to [8]. The titled page (w:Return_(book)) was deleted in December 2006; the diff, however, is from November 3, 2006.

The en.wiki blacklist was not started until 2007. Present practice would not ordinarily blacklist without cross-wiki spamming. Nakon also indef blocked many IPs as open proxies, which is a deprecated practice.

The history is reasonably clear: there was spam added September 30, 2006 to enwiki, pointing to xlale.com. The IP was blocked immediately. Nakon blacklisted. At some point Nakon also saw the xlalu.com spam, and added regex to cover it, October 21. The readdition of xlale.com on November 8 is a mystery; an admin could tell, perhaps. (Edits to Nakon's blacklist page cannot be tracked because the page was deleted; the page linked above was copied instead of being moved.)

It is extremely unlikely at this point that the original sites will be spammed. The spam does not appear to have been heavy or cross-wiki originally. Further, there is no sign of any intention to extend the blacklisting beyond the two sites originally spammed together. (Googling xlale.com uncovered very old forum spam in English, similar to the Wikipedia spam.)

At this time, then, I recommend removing xlal[0-9a-z-]*\.com from the blacklist, due to collateral damage. jxlalk.com is a Chinese page and is extremely unlikely to be connected to xlale.com, even though xlale is now apparently owned by a Chinese registrant. --Abd (talk) 01:22, 21 January 2015 (UTC)

And that is where the problem lies .. "The spam does not appear to have been heavy or cross-wiki originally" - did you check all wikis? Evidence of abuse is given; neither you nor I have any indication that that evidence is complete - you just assume it is focused on one wiki. --Dirk Beetstra T C (en: U, T) 06:01, 21 January 2015 (UTC)
Oh, I did check, I see on one page about 10-15 IPs (the deleted page you mentioned), who all have a significant number of edits to other pages as well, as well as some IPs, obviously used by the same 'person' (or bot) on other wikis as well. --Dirk Beetstra T C (en: U, T) 11:37, 21 January 2015 (UTC)
Thanks for responding. Unfortunately I cannot see the page; I speculated that there was activity there. This is an unskilled, short-term spammer, probably. You could list the IPs. So far, nothing said has indicated any activity more recent than 2006. What sites were spammed? We know xlale.com and xlalu.com. Any others?
Obviously, I cannot check all wikis, given that the page with the IPs has been deleted. This was very old practice, your bot was not running, etc. What I saw was focused on one wiki, and the evidence cited by the blacklisting admin was just one wiki, and in those days, that wiki, enwiki, had no blacklist of its own. I wrote "appear" because I know that what I'm seeing is not everything. However, Beetstra, is there any evidence that a blacklisting is still needed? Are any of the blacklisted domains still accessible? The admin used regex to exclude two domains, and ones that he thought might be forthcoming, which do appear to have been connected. That he intended to match other domains is complete speculation. --Abd (talk) 20:01, 21 January 2015 (UTC)
Any others, digging deeper I found xlala .. this already costs significant time, and there is a quick solution: whitelist.
Any abuse since 2006? The sites were blacklisted, fat chance it .. stopped. That was the purpose of the blacklisting.
Do better your best, you can access more than you think - you may miss the deleted stuff, but that is exactly why there are admins active on that list, they can see the full evidence without the need of making sweeping statements or speculations. --Dirk Beetstra T C (en: U, T) 03:30, 22 January 2015 (UTC)
"That he intended to match other domains is complete speculation." .. let me think .. he could have added three rules, \bxlala\.com\b, \bxlale\.com\b and \bxlalu\.com\b if he intended to block those three, or he could have added xlal.*\.com\b if he intended to match also other domains ('.. that ... might be forthcoming ..') .. let me guess .. not much speculation going on over there. --Dirk Beetstra T C (en: U, T) 05:11, 22 January 2015 (UTC)
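The difference between the two options discussed above can be illustrated directly: three word-bounded rules cover only the three named domains, while a single broad rule covers any domain containing "xlal" followed by ".com", including the unrelated jxlalk.com. This is a sketch assuming each rule is searched, unanchored, in the full URL; the helper names and the domain xlalq.com are hypothetical.

```python
import re

# Option 1: one narrow rule per known domain.
NARROW = [re.compile(p) for p in (r"\bxlala\.com\b", r"\bxlale\.com\b", r"\bxlalu\.com\b")]
# Option 2: one broad rule meant to catch "forthcoming" domains too.
BROAD = re.compile(r"xlal.*\.com\b")

def matches_narrow(url):
    return any(rule.search(url) for rule in NARROW)

def matches_broad(url):
    return BROAD.search(url) is not None

for url in ["http://xlale.com/", "http://www.xlalu.com/page",
            "http://jxlalk.com/", "http://xlalq.com/"]:
    print(url, matches_narrow(url), matches_broad(url))
# http://xlale.com/ True True
# http://www.xlalu.com/page True True
# http://jxlalk.com/ False True
# http://xlalq.com/ False True
```

The broad rule does cover new variants, which is its stated purpose; the same property is what produces the collateral match on jxlalk.com.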