Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Finnrind (talk | contribs) at 20:57, 5 October 2010 (→‎Proposed removals: archive). It may differ significantly from the current version.

Shortcut:
WM:SPAM
WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs in any page in Wikimedia Foundation projects (as well as many external wikis). Any meta administrator can edit the spam blacklist. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report; there may be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged.

snippet for logging
{{sbl-log|2148864#{{subst:anchorencode:SectionNameHere}}}}

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

thepetitionsite.com/1/ban-wikipedia



There is a user (using open proxies) adding links to a "Ban wikipedia" petition. So far, I've seen it on enwikipedia and enwikibooks, though I'm sure it's been on several other sites. enwikipedia has already added it to its local blacklist, but they're getting around that, so there's a filter in place, which they're also getting around. Ask me on IRC if you'd like the full URL to the petition; I'd rather not post it here. Pilif12p 01:47, 7 September 2010 (UTC)

The other big one is petitiononline.com; blacklisting those two would take care of most petition links. I tend to agree with JzG that they are intended to be spammed (that's what an online petition is) and that they aren't ever valuable links. Gavia immer 02:43, 24 September 2010 (UTC)
Thanks, Gavia immer. So,
  1. \bthepetitionsite\.com\b
  2. \bpetitiononline\.com\b
Any others? Kylu 02:58, 24 September 2010 (UTC)
Those two are the biggest; it's mostly a question of whether we would want to preemptively block some sites that aren't being used and most likely wouldn't. Having said that:
\bgopetition\.com\b
\bipetition\.com\b
\bpetitions24\.com\b
\bpetitionsite\.com\b
\bpetitionthem\.com\b
\bpetition-them\.com\b
\bwebpetitions\.com\b
are all in the same business, and some of them, at least, appear to be fronts for harvesting email addresses - fun stuff, that. Gavia immer 01:43, 28 September 2010 (UTC)
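To check what the proposed entries actually catch, here is a minimal sketch that runs each one against a few sample URLs. It uses Python's re module as a stand-in for the extension's own regex engine and assumes case-insensitive matching, so treat it as an approximation; the sample URLs are illustrative only.

```python
import re

# Entries as proposed in this thread (Kylu's two, then Gavia immer's list).
PROPOSED = [
    r"\bthepetitionsite\.com\b",
    r"\bpetitiononline\.com\b",
    r"\bgopetition\.com\b",
    r"\bipetition\.com\b",
    r"\bpetitions24\.com\b",
    r"\bpetitionsite\.com\b",
    r"\bpetitionthem\.com\b",
    r"\bpetition-them\.com\b",
    r"\bwebpetitions\.com\b",
]

# Assumption: entries are matched case-insensitively; we mimic that here.
COMPILED = [re.compile(p, re.IGNORECASE) for p in PROPOSED]


def matching_entries(url: str) -> list[str]:
    """Return every proposed entry that would match somewhere in the URL."""
    return [c.pattern for c in COMPILED if c.search(url)]


if __name__ == "__main__":
    for url in [
        "http://www.thepetitionsite.com/1/ban-wikipedia",
        "http://www.petitionsite.com/",
        "http://www.gopetition.com/",
        "http://www.example.com/petition",
    ]:
        print(url, "->", matching_entries(url))
```

The first sample URL is caught only by \bthepetitionsite\.com\b: the \b before "petition" fails inside "thepetitionsite", so \bpetitionsite\.com\b on its own would not cover it. In the same way, \bpetitionthem\.com\b and \bpetition-them\.com\b each cover only their own spelling.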

and add

En.wikipedia has '\bpetition(?:online|s)?\b' on its list, which in practice blocks a lot of these sites (though not all listed here). There are quite a few interesting discussions out there about this type of link. I also agree that most of these are hardly ever useful links; the only use would be as the primary source for the numeric end result, and if that number is notable enough to be mentioned, there will be independent sourcing for it. --Dirk Beetstra T C (en: U, T) 09:36, 28 September 2010 (UTC)
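One way to see which of the domains listed above the en.wikipedia entry catches is to test it against the bare domain names. This is a sketch using Python's re module with case-insensitive matching (an assumption); it checks domain names only, whereas the extension applies entries to whole URLs, so path matches such as example.com/petitions/ are not considered here.

```python
import re

# The en.wikipedia entry quoted above.
ENWIKI = re.compile(r"\bpetition(?:online|s)?\b", re.IGNORECASE)

# Domains proposed earlier in this thread.
DOMAINS = [
    "thepetitionsite.com", "petitiononline.com", "gopetition.com",
    "ipetition.com", "petitions24.com", "petitionsite.com",
    "petitionthem.com", "petition-them.com", "webpetitions.com",
]

covered = [d for d in DOMAINS if ENWIKI.search(d)]
not_covered = [d for d in DOMAINS if not ENWIKI.search(d)]

print(covered)  # ['petitiononline.com', 'petition-them.com']
```

Because \b needs a word boundary on both sides, the entry misses names where "petition" is joined to other letters (thepetitionsite, gopetition, webpetitions, petitions24, petitionthem), which fits the "not all listed here" caveat.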


Most of the .tk TLD

We had an earlier discussion about the .to URL shortening service (see Talk:Spam_blacklist/Archives/2009-12#Strange_url_shortener) that didn't seem to go anywhere, so I don't know if this will gain any traction, and in any case I can't supply a working regex that wouldn't block the whole domain. But per the policy on URL redirectors, I'd like to point out that the registrar for .tk is now offering semi-persistent public URL redirection from a large number of domains; see my.dot.tk/tweak/ and www.dot.tk/en/index.html?lang=en, which I haven't linked in case they do get blacklisted. My guess is that this sort of URL redirection will only become more common, so it would be nice if we had a graceful way to deal with it. Gavia immer 22:47, 28 September 2010 (UTC)

seekic.com chinaicmart.com

Spammers

MER-C 02:24, 2 October 2010 (UTC)

Added. --Finn Rindahl 06:32, 2 October 2010 (UTC)

Generic Chinese knockoff spam 25.0

Similar domains

"Please confirm your shipping address before pay for it"

"Western Union is a very easy and quick way to send and receive"

"Astralia www.auspost.com"

Spammers

MER-C 02:35, 2 October 2010 (UTC)

Added. --Finn Rindahl 06:35, 2 October 2010 (UTC)

url.isiss24.com

URL shortener was used to spam:

via the referral site:

Spammers

Note the misleading edit summaries. I wouldn't list lovehoney.co.uk. MER-C 10:03, 3 October 2010 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as the bot may report good links! For more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).
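The archiving rule above can be written as a single predicate. This is a minimal sketch, assuming a report exposes a link count, a last-edit time and a last editor; the Report class and is_stale function are hypothetical illustrations, not COIBot's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    # Hypothetical fields; COIBot's real report format is not described here.
    links_reported: int
    last_edited: datetime
    last_editor: str


def is_stale(report: Report, now: datetime) -> bool:
    """Stale = fewer than 5 links, no edits in the last 7 days, last edited by COIBot."""
    return (
        report.links_reported < 5
        and now - report.last_edited > timedelta(days=7)
        and report.last_editor == "COIBot"
    )
```

All three conditions must hold; a report touched by a human editor, or one with five or more links, stays open regardless of age.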

Sysops
  • If the report contains links to fewer than 5 wikis, only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
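The four LinkWatcher criteria combine with a plain OR. The sketch below is a hypothetical Python rendering, not the LinkWatchers' code: the wiki-count thresholds (more than 2, more than 1) come from the list above, while "mainly adds" and "not used too much" are left as precomputed booleans because the list does not define them.

```python
from dataclasses import dataclass


@dataclass
class LinkActivity:
    """Hypothetical, precomputed inputs for one link (and its server).

    "Mainly adds" and "not used too much" are not defined in the report
    criteria, so they are supplied as booleans rather than guessed at.
    """
    user_mainly_adds_link: bool
    link_not_overused: bool
    user_wikis_for_link: int      # wikis this user added this link to
    user_mainly_adds_server: bool
    server_not_overused: bool
    user_wikis_for_server: int    # wikis this user added links on this server to
    all_additions_by_ips: bool
    ip_range_prefers_link: bool
    link_wikis: int               # wikis the link was added to, by anyone


def meets_report_criteria(a: LinkActivity) -> bool:
    """True if any of the four listed criteria holds."""
    by_user_link = (a.user_mainly_adds_link and a.link_not_overused
                    and a.user_wikis_for_link > 2)
    by_user_server = (a.user_mainly_adds_server and a.server_not_overused
                      and a.user_wikis_for_server > 2)
    all_ips = a.all_additions_by_ips and a.link_wikis > 1
    ip_range = a.ip_range_prefers_link and a.link_wikis > 1
    return by_user_link or by_user_server or all_ips or ip_range
```

Note the asymmetry the list encodes: a single named user needs more than 2 wikis to trigger a report, while IP-driven additions trigger at more than 1.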
COIBot's currently open XWiki reports
List Last update By Site IP R Last user Last link addition User Link User - Link User - Link - Wikis Link - Wikis
vrsystems.ru 2023-06-27 15:51:16 COIBot 195.24.68.17 192.36.57.94
193.46.56.178
194.71.126.227
93.99.104.93
2070-01-01 05:00:00 4 4

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.

ascendercorp.com

I recently edited a few existing articles [1][2][3] on English Wikipedia. I attempted to support my facts with a reference from an official site and found that it was blacklisted. I felt my posts were unbiased and the reference was relevant and factual. I am interested in finding out if Ascendercorp.com can be removed from the blacklist, as it is a legitimate site containing applicable information related to fonts and typography. Drewpoleon 18:40, 29 July 2010 (UTC)

Apologies to Drewpoleon that no one has responded to this request. We're all volunteers, and seem to be very short-staffed right now...
It was added as related to the spammed site ascenderfonts.com, per Talk:Spam_blacklist/Archives/2010-06#ascenderfonts.com. I advised against blacklisting at the time, but given the additional background provided by user:MER-C I'm not ready to remove it from the blacklist. Asking for second opinions here. Finn Rindahl 10:14, 2 October 2010 (UTC)

yfrog.com

Hello, our site seems to be blacklisted for spam. I cannot find any evidence to support this, but that's what I've been told here: http://en.wikipedia.org/wiki/MediaWiki_talk:Spam-blacklist#Proposed_removals

Yfrog is a website and Twitter service that allows users to share photos and videos on Twitter and to broadcast their life as it happens. It is free for users, and no registration is required. Yfrog.com is owned and operated by Imageshack.us, which is a top 100 site.

We did have some issues with email pill spam, but have since fixed the problem. We have worked with multiple anti-spam sites (http://inboxrevenge.com, http://www.infiltrated.net, http://www.arbornetworks.com, http://shadowserver.org, and http://uribl.com, to name a few), and have helped spread information on pill scams by replacing images uploaded by spammers with a warning image. Here is an example: http://img683.imageshack.us/img683/3548/upoufacigeya.gif

We have since also hired a team of 8 content moderators who review images and are able to flag them as they come in to our servers.

Please let me know if there's any spam problem on Wikipedia from yfrog and I'll personally take action to resolve the issue. Thank you in advance. npettas 01:51:40, 5 Aug 2010 (UTC)

Anyone??

Comment: Added April 2009 diff COIBot -- Kylu 20:22, 8 September 2010 (UTC)
Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. However, if you believe that links to your domain will enhance the content of our projects, you should suggest inclusion of the link on the relevant talk page. In addition, some wikis have WikiProjects (like the English Wikipedia here). If there is a project for a subject area related to your domain, you can request that the project review the link. If the project, or trusted, high-volume editors, support the use of your links because of their value to our projects, I'm sure the request will be carefully considered and your domain may well be removed. Since I blacklisted it, I recuse myself from taking any action here. --dferg ☎ talk 14:38, 15 September 2010 (UTC)
Considering that the link has previously been added by 219 different accounts/IPs, and I really can't see a spamming pattern as such in these additions, I'm inclined to support removal from the BL, and will remove it in a few days unless someone objects. Finn Rindahl 10:02, 2 October 2010 (UTC)

infosecinstitute.com

I believe a competitor spammed our link on Wikipedia in many places where it had no relevance at all (the nature of being in the hacking and infosec industry). We had previously placed references and information on pages where we were an authority, and we have now been completely blacklisted. An example of where we are trying to contribute is technical discussions such as http://en.wikipedia.org/wiki/Man-in-the-middle_attack. We have a video clearly showing this type of attack which should benefit that page (http://resources.infosecinstitute.com/video-man-in-the-middle-howto/), but I cannot add it.

I have been reviewing the history of this domain on Wikimedia's projects and it's complicated. I'm traveling but I will resume this in a few days. --A. B. (talk) 19:36, 29 August 2010 (UTC)
I note you are able to link this site here, which already suggests you are not blacklisted here. You will have to discuss this issue on the three wikis where you seem to be blacklisted: en:MediaWiki talk:Spam-blacklist, ar:MediaWiki talk:Spam-blacklist and hi:MediaWiki talk:Spam-blacklist (those being the three that I could quickly find; there may be more). Given the en:Joe jobbing that you say is going on, maybe specific whitelisting would be a better option for the moment? --Dirk Beetstra T C (en: U, T) 10:47, 31 August 2010 (UTC)
There may have been some Joe-jobbing but there was also spam coming from infosecinstitute.com IPs. As I said, the situation was complex and I hope to follow-up next week. --A. B. (talk)


wikio.com

Hi, I'm working with the Wikio company and I have seen that your bot detected abnormal activity that led to the website wikio.com being blacklisted on Wikipedia. Wikio is a news aggregator service with an elaborate algorithm to classify information; the service allows users to publish content if they consider that this content may interest other people. As with all user-generated content (UGC), the content is subject to spammers, so our service is moderated after publication.

All UGC is published under the www.wikio.com/article/ directory.

It seems that someone published articles on Wikio and tried to promote their article pages by massively linking to them from Wikipedia pages, if I correctly understand all the actions listed here: http://meta.wikimedia.org/wiki/User:COIBot/LinkReports/wikio.com

The user has the same username on Wikio and Wikipedia: matucana. It can be seen on wikio.com at the URL /article/56367203

It seems that this person has something against Berlusconi: http://matucana.wordpress.com/ So here we have a case where the content on Wikio is not spam, but a person tried to put a large number of backlinks on Wikipedia to promote his content on Wikio. The result is that Wikio has been blacklisted because of actions by a third person with no relation to the website.

To fix this situation, I propose that your list target only the directory where Wikio users can publish UGC, which is all URLs starting with www.wikio.com/article/

I have already discussed this point with Beetstra on this page: http://en.wikipedia.org/wiki/User_talk:Beetstra#Wikio_blacklisted_due_to_activity_of_a_blogger_who_wanted_to_promote_his_UGC_content

Thanks for your help in fixing this. Christophe
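If the proposal is read as narrowing the entry to the UGC directory rather than blacklisting the whole domain, the difference can be sketched as below. The pattern \bwikio\.com/article/ is a hypothetical entry written for illustration and tested with Python's re module (case-insensitive, an assumption); whether the extension accepts a path in an entry in exactly this form should be checked against its documentation.

```python
import re

# Hypothetical narrower entry: only the UGC directory, not the whole domain.
ARTICLE_ONLY = re.compile(r"\bwikio\.com/article/", re.IGNORECASE)


def blocked(url: str) -> bool:
    """True if the narrowed entry would match this URL."""
    return ARTICLE_ONLY.search(url) is not None


if __name__ == "__main__":
    for url in [
        "http://www.wikio.com/article/56367203",  # UGC page: caught
        "http://www.wikio.com/",                   # front page: allowed
        "http://fr.wikio.com/article/1",           # other subdomains: caught
    ]:
        print(url, blocked(url))
```

The trade-off is that everything outside /article/ becomes linkable again, so the narrower entry only helps if the spam really was confined to that directory.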

elusiva.com

I am unsure why this site was added to the Wikipedia blacklist (and I am unsure how to find out), but I can only imagine that it is because people were forcefully trying to add links from the website to certain articles back in 2008, and it might have been a violation. Nevertheless, I would request that it be removed from the list. While I am an employee of Elusiva, I am not making this request on their behalf, I am not a member of their marketing department, and I hope to present this case in the most unbiased way I can. Elusiva is a software company which licenses virtualization software to over 30,000 customers and partners. Elusiva software is featured on the websites of several software resellers, and press articles have been published describing the benefits of Elusiva software. It is somewhat respected in its area of expertise. It seems that the Wikipedia articles on Desktop Virtualization, Application Virtualization and several other articles in this field of technology are somewhat incomplete without a mention of Elusiva. Additionally, perhaps, in the foreseeable future, an informative article covering Elusiva (itself becoming a well-known technology provider) would be of interest to the public. I cannot determine whether or not the excessive editing with links to elusiva.com (which indicates some conflict of interest) will ever happen again, but I would argue that it seems to have ceased for some time now. Perhaps, in light of all the above, the site should be removed from the blacklist. TovB 14:16, 15 September 2010 (UTC)

The link was added based on this report User:SpamReportBot/cw/elusiva.com (see also User:COIBot/LinkReports/elusiva.com). It is not excessive spamming, and as such I would not oppose removing it from the blacklist. I still find it hard to see why or where links to elusiva.com would be a useful addition to any Wikipedia article; the only possible article I could see would be one on the company itself. I'd like second opinions on this one. Best regards, Finn Rindahl 19:37, 1 October 2010 (UTC)
I am not sure if I am meant to respond, but I figured I would clarify. The articles on Desktop Virtualization, Application Virtualization and Desktop Sharing all provide a somewhat informative list of companies which provide such software (as Elusiva does), along with links to those companies' Wikipedia articles. Perhaps an article about Elusiva would be one of the steps in making those articles a little more informative (although I am no expert), but, from what I understand, the first step would have to be having elusiva.com removed from the blacklist. Thanks for responding, TovB 13:30, 4 October 2010 (UTC)

gamesff.com

I'm the new owner of this site. I saw that the site was added to the blacklist three years ago. We are going to build a new website on this domain, and I don't want people to see my site on the Wikipedia blacklist when they search for "gamesff". Can you please remove my site from the blacklist? - The preceding unsigned comment was added by 79.181.12.112 (talk) 13:31, 1 October 2010

Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. However, if you believe that links to your domain will enhance the content of our projects, you should suggest inclusion of the link on the relevant talk page. In addition, some wikis have WikiProjects (like the English Wikipedia here). If there is a project for a subject area related to your domain, you can request that the project review the link. If the project, or trusted, high-volume editors, support the use of your links because of their value to our projects, I'm sure the request will be carefully considered and your domain may well be removed. Kylu 13:42, 1 October 2010 (UTC)

fiorano.com

Fiorano is a proper company and the spamming was the result, I believe, of an ill-advised junior employee. The issues were some time back; I think it's safe to remove this now. Please see OTRS ticket 2009120710020837 for more details, but the summary is that three or four OTRS volunteers, none of us meta admins but all I think enWP admins, are persuaded this is safe to remove by now. I hope we're not wrong about that. JzG 02:46, 1 October 2010 (UTC)

Moved here from further down the page. Someone with OTRS access should check this one. Finn Rindahl 14:38, 1 October 2010 (UTC)
Additional note: it was added following this; the link from the log seems to be broken... Already whitelisted at enwiki. Finn Rindahl 18:53, 1 October 2010 (UTC)
Here's the history on Fiorano Software:
Meta:
en.wikipedia:
I personally have little confidence in Fiorano Software but lots of confidence in the OTRS team, so if you guys feel comfortable removing them from the blacklist, then I support you. --A. B. (talk) 03:19, 2 October 2010 (UTC)
I'm a bit worried by all the reports that A. B. posted above. I think that local whitelisting might be better. Will check the ticket, though. --dferg ☎ talk 10:29, 2 October 2010 (UTC)
If we are going to delist this domain, it needs to be on the understanding that any further spamming by or on behalf of Fiorano will lead to permanent blacklisting. MER-C 10:34, 2 October 2010 (UTC)
MER-C: um, "It wasn't us, our competitor spammed our link in order to have us permanently blacklisted! It's a frame-up!" ;) Kylu 19:00, 2 October 2010 (UTC)
MER-C, that seems like a reasonable condition to me. I'm happy to communicate it to them via OTRS. JzG 22:10, 4 October 2010 (UTC)

monetpainting.net

In accordance with the discussion at commons:Commons:Village pump#Online Paintings Gallery blacklisted?, I request that the entry be unlisted. Thank you. – Kwj2772 (msg) 08:20, 2 October 2010 (UTC)

Removed. It was added for a good reason, but the spamming was a while back and the site could indeed be useful. If the previous spamming pattern repeats itself, however, it should be re-added, and local projects could whitelist where the link is requested. --Finn Rindahl 09:39, 2 October 2010 (UTC)

hockeyfights.com

This was listed as spam due to a false presumption that most pages contain embedded copyrighted video. Most pages are statistical in nature (I wanted to leave sample links, but the update was blocked) and are used as sources throughout the media and even on Wikipedia. There was a comment about editorial oversight, but the content is not editorial in nature; it is based on official hockey penalty statistics. In addition, the videos that are there are mostly from YouTube and are posted by, and/or under the claim or control of, the proper leagues and/or broadcasters. No copyrights are infringed.


idt.pt

This is the site of the Portuguese Drugs and Toxicology Institute, and it is not spam. It is a good resource for Portuguese people. I don't know why it has been put on the blacklist, but I think that the decision on whether a site is a threat should be made by people and not by machines or software. Please remove it from the blacklist: it's a governmental site, but here in Portugal some government sites have ".gov" in the link and some don't.

This URL isn't blacklisted here on Meta, on the English Wikipedia, or on the Portuguese Wikipedia. If you weren't able to add the link on some other project, you will need to address it there, on their local blacklist. Gavia immer 18:02, 3 October 2010 (UTC)
A search by COIBot (which has a small lag, sometimes) did not find this domain black- or revertlisted anywhere. --Dirk Beetstra T C (en: U, T) 09:42, 4 October 2010 (UTC)

lenr-canr.org

Library of hosted-by-permission copies of sources relating to w:Cold fusion. Many pages have been whitelisted by request on en.wikipedia. This site is not ordinarily a reliable source itself, but it is used for convenience links provided with references, typically for preprint or other copies of papers published under peer review, recent example: http://lenr-canr.org/acrobat/StormsEstatusofcoa.pdf

This paper is already referenced in the article, but a convenience link cannot be placed because of the blacklisting. A major usage is on Talk pages, so that editors may discuss sources for the article, being able to read them.

Rather than going back to the en.wiki whitelisting page, and since there never was a sound reason for blacklisting this in the first place (no true spamming links, merely content controversy, and the original en.wiki admin requesting blacklisting was reprimanded by ArbComm for his en.wikipedia blacklisting[4][5]), I request delisting. For reference, original blacklisting. --Abd 16:00, 4 October 2010 (UTC)

This was discussed at some length when the link was added [6], and the closing admin here (user:Mike.lifeguard) concluded in January 2009 that there was a sound reason for blacklisting. Rather than repeating the whole discussion from two years ago, could you state what exactly has changed since 2009 regarding this link? Regards, Finn Rindahl 18:36, 4 October 2010 (UTC)
P.S. As a courtesy I've notified the enwiki user who originally requested blacklisting about this delisting request. Finn Rindahl 18:43, 4 October 2010 (UTC)
Nothing is different except the subsequent en.wikipedia ArbComm case on this, where the original requestor was a party (linked above). Mike.lifeguard decided something then, but the issues raised were later discussed in depth on Wikipedia and rejected by consensus; I can provide links. I'm asking for consideration de novo. When it became apparent that the requestor's personal, direct blacklisting on en.wiki was going to be questioned, he came here, making it moot. There was no spamming at all, nor any unusual volume of link addition, given that lenr-canr.org is a major repository of such papers: if you look at the paper cited as an example, published in a mainstream peer-reviewed journal, you'll see that the introduction points to lenr-canr.org for more information. I've seen that in many other peer-reviewed papers or academic publications. I will recap the arguments if necessary, but I'd only be repeating what is covered elsewhere. I will respond, however, to any specific arguments repeated here as to why lenr-canr.org should be blacklisted. --Abd 19:06, 4 October 2010 (UTC)
Time to reinstate Abd's topic ban, as he is clearly proxying for the banned editor Jed Rothwell again. Case closed, IMO. The request merely restates the previous disproven assertion of "by permission" copies of material where the copyright holder has no record of handing out such permission, the only evidence for which is the say-so of the site owner, whose relentless promotion of this (along with proxying on his behalf by Abd and others) was largely responsible for the original blacklisting. The site is a place for promoters of a pariah field to collect and share; that's fine, but we really don't need any part of it. It's not peer-reviewed, we have the DOI syntax which links to the canonical copies of everything, and at least some documents have been found to be heavily editorialised, so it's a spammed, copyright-violating, misrepresenting, unreliable source whose use is almost exclusively the promotion of a fringe view in defiance of canonical policy.
Or, to be more succinct, nothing's changed since Abd's last request was rejected, including the request itself. JzG 22:16, 4 October 2010 (UTC)
JzG seems to be confused. His arguments are generally about content control, which ArbComm rejected, but he was able to get around that here, where ArbComm has no remit. But "content control" is also not the purpose of the global spam blacklist.
  • Copyvio depends on actual usage. This has been considered in detail for lenr-canr.org and rejected, many times. There is no issue of copyvio for linking to lenr-canr.org unless copyvio there is known. Copyvio is quite unlikely, lenr-canr.org only hosts with permission, if there are exceptions they would be rare.
  • Proxying. Completely irrelevant even if true, which it isn't.
  • Promotion. The site owner, at various times, when there was a relevant discussion, would point to a linked paper. That's not promotion. That's a COI editor pointing out a resource on Talk, like he's supposed to. The "spam links" pointed to in prior discussions were not links at all. They were Rothwell's signature, when he edited IP, "Jed Rothwell, librarian, lenr-canr.org." No http://.
  • Pariah field. Irrelevant and not true any more. The papers to be linked, themselves, are reliable source, not published by lenr-canr.org, generally in mainstream journals, and lenr-canr.org is just for convenience links.
  • Not peer-reviewed. Irrelevant. The papers are peer-reviewed, generally.
  • Found to be heavily editorialized. One paper, a government document, was republished by lenr-canr.org with an introduction, and the comment was brief and clearly distinguished. That has become an entire story in the hands of JzG. The question of whether or not to use a particular link is and should be a decision by the local editors. Most editors will not attempt to get a whitelisting for a convenience link. All links likely to be used are only to .pdf files hosted there, with no editorializing at all. --Abd 00:03, 5 October 2010 (UTC)

There seems to be quite a lot of history behind this, but I'm not inclined to dig through the archives at enwiki, be it arbcom or talk pages, to get to the bottom of it. Questions such as whether or not procedure was followed when the link was first blacklisted at enwiki are, strictly speaking, irrelevant to the global blacklisting. In the discussion following the addition to the GBL, local whitelisting was suggested as a way to link particularly relevant pages from this site. I see that five pages have been whitelisted at enwiki, whereas none have been whitelisted at it, ja or dewiki, which are the other wikis where a number of links to this domain had previously been added. As long as editors from projects other than enwiki don't request delisting, I think this is better deferred to enwiki, where the whole domain could be whitelisted if they find reasons to do so. I'd rather not have a rerun of the previous debate here on Meta, which for a large part looked like a spin-off from previous discussions at enwiki, but it would be good if another Meta admin had a look at this before closing, though. Best regards, Finn Rindahl 22:43, 4 October 2010 (UTC)

Thanks. The two links above are not to the full ArbComm case but to its conclusions, two paragraphs. This isn't a claim for delisting because of "procedure violation." Rather, a farrago of charges was made in that former consideration, and the sheer number of charges and the reputation of the filer may have swayed the result, which doesn't make it improper, but could make it wrong. All of these arguments have been considered elsewhere, in detail, and rejected. I'm not seeing any reason for blacklisting here. However, I'll look at the former discussion, summarize the arguments Mike accepted and respond to them. Please defer a final decision until I've done that, unless you can make it moot. Otherwise we are going to have more discussion at en.wiki, more editor and admin labor wasted, for nothing, on enwiki, and more editors frustrated on the other wikis. This site is highly regarded, as I pointed out, commonly cited in reliable sources, and it comes up top on searches for the hosted reliable sources, so I'm sure editors are attempting to add links for convenience. Most of them will give up. This blacklisting is harming editors and failing to respect readers. --Abd 23:42, 4 October 2010 (UTC)

Sigh.

I quote: "the original en.wiki admin requesting blacklisting was reprimanded by ArbComm for his en.wikipedia blacklisting[7][8])". The first link is a finding of fact; that is NOT a reprimand, nor does it even state that he did something wrong there, just that he did something. The second link links to a reprimand which is general, so saying 'reprimanded by ArbComm for his en.wikipedia blacklisting' is pure synthesis.

Now, lenr-canr.org hosts documents which are also available from the original sources. The copyright status of that is NOT clear: the only person who says he has permission is the site owner, and the permission is from the authors of the articles. It does not seem to be from the publisher, and the publisher is the one that should grant such permissions; the authors don't have any say in it anymore. Moreover, even if only one article is editorialised, that shows that documents do get editorialised. WHY on earth do we have to link, even for convenience, to a site where one would need to check the original to see whether the copy in question has been editorialised? I advise against de-blacklisting, as the concerns have not changed since 2009. I would even advise against whitelisting the whole domain; documents can be evaluated (and have been evaluated) on a case-by-case basis, and single documents can be whitelisted where needed. Note that the documents that are whitelisted on en.wikipedia were evaluated as such, and many are not used, making their whitelisting redundant; de-whitelisting has actually also been requested. --Dirk Beetstra T C (en: U, T) 10:44, 5 October 2010 (UTC)

Well, Dirk, the main finding of that case was that the blacklist was not to be used for content control, and it is very clear from the arguments presented that content control was the purpose, as well as personal vendetta on the part of the requestor here. Suit yourself. Yes, the links aren't used. That's because there is a faction on Wikipedia that has consistently reverted them out, part of a pattern of long-term POV-pushing, using the same "copyvio" argument, and trashing your opinion that those particular links were okay. As they trashed the opinion of a consensus of editors, including admins, at w:Martin Fleischmann that the link there was appropriate, always arguing this or that technicality du jour. I reverted that back yesterday, but I have no confidence that it will remain, nor do I really care any more.
Suit yourself. I made the request, I've discharged my responsibility. There was no evidence sufficient to allow blacklisting, the guidelines all require serious abuse of links, blacklisting is supposed to be a last resort, but, I've seen it in more than one case, the real standard is "whatever we think today," and if our friend asks for blacklisting, why, sure, no problem. I will probably ask for a meta review of blacklisting practice, based on this case and lyrikline.org, another abusive blacklisting (though without the "fringe" overtones of this one). My concern is no longer Wikipedia, I'm abandoning editing there, for the most part. It's become completely hopeless, until major structural changes are made. Good luck with the mess you have created. Whatever WMF project editing I'll be doing will probably be on Wikiversity, where, indeed, lenr-canr.org is whitelisted in toto. I may also work on Wikibooks. --Abd 14:44, 5 October 2010 (UTC)

But linking to copyright violations is a serious abuse. And even if it was blacklisted for the wrong reasons, we are NOT a bureaucracy, Abd. And please, don't bring lyrikline back, they were blatantly spamming their site, that blacklisting was more than necessary. They had their early warnings, they had their blocks ... Sigh, you really don't have a clue how much it pays to have your links on Wikipedia, do you? --Dirk Beetstra T C (en: U, T) 15:50, 5 October 2010 (UTC)

Beetstra, I tried to work with you on en.wikipedia, and you were helpful at times. But you've never understood the basic problem that comes up with abusive blacklisting, and you run circular arguments. I'm not going to argue lyrikline here, but if I need to appeal this -- I'm hoping some neutral admin may look at this, in which case there is a good chance it will simply be done -- I will not confine the matter to lenr-canr.org, because lyrikline.org, though a different kind of case, was also a very abusive blacklisting. But since you've made some very incorrect claims: the user responded quickly to warnings. The actual links, every one that I've examined, were good. Good links were massively removed with no discussion or notice. I'd call that vandalism, actually, but apparently if you are working on spam, you can vandalize the project, you can do it cross-wiki as IP, and it's okay. This is out of balance. Absolutely, the addition was at a problematic pace. But what then? So was the removal, and it was much faster! If someone saw the removal and wanted to keep those links, they'd have found it impossible. I did find, later, some altered links where users did use a workaround to point to lyrikline, and there might be more of those. The user stopped when warned, there was time to review the situation, there was no emergency. When I saw what had happened to this user, I actually cried. This person was working hard to improve the wikipedias, and doing what was good work. And they were seriously abused, and obviously heartbroken. I know you will claim that they were involved with lyrikline.org. That makes no difference at all, whether true or not! Stopping them and then helping them to gain cooperation of other users, that would have been fine. But the spam "warriors" don't think like that. Spammers are the enemy. Talk about battleground mentality! See w:Wikipedia:WikiProject Spam. The image there is not of a broom! It's a big Boom! 
Have you ever seen what one of those shells does when it falls on some innocent person? Or a wedding party, for that matter? You know that it's possible to deal with spam without all that collateral damage, I made the proposals, and it would be even more efficient. And do you know who, most of all, stopped them, even though this would bring in additional labor? You did. You really don't want "interference" with what gives you power over the projects and editors. I'm going to go wash my face and hands. --Abd 17:46, 5 October 2010 (UTC)
Abd, you have no clue .. but really. I am sorry. --Dirk Beetstra T C (en: U, T) 20:10, 5 October 2010 (UTC)


Actually, I'm not quite done. I said I'd review Mike's reasons for supporting the blacklisting. Unfortunately, this will be long, because a pile of mud has been tossed, and to respond to that with evidence and analysis takes many words. I'd hoped that, at least, someone here would look at that prior discussion, which I linked to, perhaps some reasonable allegations would be made, justifying the blacklisting with spam evidence, as the guideline for the blacklist requires, and then I could have responded to those simply. Vain hope. This is my coverage of the reasons given for blacklisting originally. No new reasons have been given here, to my knowledge, other than ad-hominem arguments about me, which amount to gratuitous personal attacks.

JzG's original arguments

  • The behavior of Jed Rothwell on en.wikipedia. Irrelevant. JzG did not disclose his personal involvement with this editor. And this was irrelevant to a site blacklisting, unless the behavior was spamming. It was not. Period.
  • It has been spammed and promoted extensively by Jed Rothwell. False. This is a site where links are arguably legitimate, and the site is recommended as a source in many peer-reviewed and academic sources. Rothwell's "extensive" "spamming" was the addition of links to papers on his site, as suggestions in w:Talk:Cold fusion, totally legitimate, though Rothwell can be blunt and uncivil. Remember, this site is "promoted" in reliable source, I've linked it. That means recommended, folks. I recommend the site, if you want to find, in one place, the most extensive bibliography on cold fusion in existence, plus convenience copies of about a third of the papers listed, for quick access. The bibliography is neutral, by the way. He hosts all papers on the topic, positive and negative, if he can get permission. You can use the bibliography and the hosted papers without ever seeing "POV." Or you can read some material that is published there. JzG claimed that this site cannot be trusted. I've found the opposite. His bread is buttered by being accurate. Sure, he has a POV. Who doesn't?
  • Authors do not have the right to give permission. Misleading. And irrelevant, to boot. Most hosted papers are preprints, and, where I've checked, authors do have the right to give permission on those. Rothwell also asks for publisher permission, he's discussed this extensively, and it's a complex issue. It's irrelevant because we are not obligated to verify permission to link to a site, we must only avoid knowingly linking to a copyvio, which is as far as even the most extreme legal responsibility extends, nor is there any policy requiring verification of individual copyright status. This is a decision to be made, if it's going to be made, on each paper, and only if a site is mostly copyvio would there be a legal reason to avoid linking to it.
  • A page was "editorialized." Totally irrelevant. So don't link to that page, if there is something better, but such editorializing does not rule out the use of a convenience copy source if there is nothing better, as long as the reader is not likely to be misled. That page was a copy of a government review of cold fusion. Rothwell chose to write a brief introduction, it clearly stood out from the review text. Publishers do this all the time! Rothwell also credited his source, a skeptics' organization, which also prepended text. And for a long time, the link was to the skeptics' organization. Finally, someone found a copy of the original report on the Internet Archive, which was undoubtedly superior. At the time JzG filed this request, I believe that the skeptics' copy was presented. JzG repeated this argument by mentioning "alteration" of papers, whereas, in fact, no alteration has been found, the example cited when he argued this before was the same page with an introduction. He libelled Jed Rothwell and lenr-canr.org, many times, a real possible legal issue, if Rothwell were so inclined (he's not), all the while claiming to be protecting Wikipedia from "copyvio."
  • Fringe POV problems. Irrelevant. By this time, evidence had accumulated that made the claim of "fringe" questionable, though still probably supportable. Consistently, though, JzG took an extreme position on this, based on a friend of his telling him that, at one point, the Wikipedia article was "decent," or something like that. He took that, apparently, to mean that cold fusion was completely bogus, extreme fringe, and that any source which might suggest otherwise, no matter how published, was "fringe." The description of Rothwell as an "infinite energy advocate" is completely bogus. And this was a content argument, like the rest.
  • The only argument legitimate here was alleged spamming. Was there spamming? That's what I'm asking to be examined anew. No examples have been given, just the massive usual set of references, and determining "spam" on a site requires a judgment that the links were not legitimate, a content argument, unless there is massive addition at a fairly high pace or overall volume. Spamming generally and properly refers to massive additions, not to arguably appropriate additions being added at a reasonable pace, such that normal editorial process can handle them. There were, at the time of JzG's original blacklisting, pages from lenr-canr.org in use at en.wikipedia. He removed them. He was clearly tired of taking them out and finding that other editors would put them back in as useful. So he used the blacklist to bypass editorial consensus, and when he saw that it had been noticed and was being challenged, he came here to cement it, knowing the process here and how difficult it would be to reverse a decision. As we are seeing today.
  • JzG reported that a "friend" on it.wikipedia requested a global blacklisting. So ... a content dispute on it.wikipedia led to this? Some editor there didn't want to deal with the other editors, wanted to trump it. This was, again, the abuse of the blacklist for content control. Nobody requested blacklisting on en.wikipedia, JzG just did it. Then filed a report after the fact. --Abd 17:23, 5 October 2010 (UTC)

Mike.lifeguard's reasons

  • "Contrary to the claims, I do see evidence here of the domain being pushed inappropriately by the domain owner. Given the above issues (bias, reliability) and the persistence of those pushing the domains, careful monitoring of link additions and critical analysis of their inclusion is required. This applies beyond merely the English Wikipedia. For me, this inappropriate promotion of the domain is the central concern for a few reasons ...."

Mike proceeded to give content control arguments, and he wrote:

  • "The domain will remain blacklisted on Meta until such time as the issues identified here have been resolved."

What issues are there to be resolved? If the reason for the blacklisting was "inappropriate" pushing, whatever that is, blacklisting may not (and did not) prevent it. Mike points out that blacklisting will "force" editors to use better sources. That's a content argument, judging what is "better" and what is not. There is, for example, no better bibliography available than lenr-canr.org. Period. The Britz bibliography (which also points to lenr-canr.org) is only of peer-reviewed papers, he hosts no copies for convenience, only giving his own brief summary of each. Britz is what is currently linked from w:Cold fusion. There are maybe twice as many conference papers as academically published papers, showing the state of the research, many of which are cited in peer-reviewed articles, and which are only available at lenr-canr.org, by permission of the authors. These cannot be used as references, on their own, with a few exceptions (one of them was whitelisted and used for that on en.wikipedia), unless they show up being cited in secondary sources, which many of these have. And lenr-canr.org is often the only practical place to obtain these, and there is no possible copyvio issue for them (another red herring).

The claim that there was "inappropriate pushing" is not supported by an examination of the contributions. What you will see at the bottom of Rothwell's IP posts is a signature, Jed Rothwell, Librarian, LENR-CANR.org. That is not a link. It is simply disclosure. This wasn't at all inappropriate, in itself. The blacklist had no effect on this. Rothwell had not edited from his account JedRothwell since 2006; I didn't look at those edits. That account wasn't blocked at the time of this blacklisting. It was later blocked simply because a friend of JzG blocked it, with no reason given other than "glad to help out."

The accounts cited with the report were mostly very old, it was stale. The report was filed 8 January, 2009. Only one account was at all recent. These were what was presented as boilerplate evidence with the report:

enwiki:64.247.224.24 made one edit only in 2008, [9]. This was to add a link, indeed, but the site was already mentioned (and properly) in the article. If this was Rothwell (it's not clear, and JzG did definitely block other IP that wasn't Rothwell, claiming it was him), he was only being helpful. No promotion was involved, only convenience to the reader. The fact that the link was added to can be verified in many reliable sources, including the paper that just appeared this month in Naturwissenschaften, [10], you can see the lenr-canr.org link, it's prominent in the first-page display. There are global contributions for this account, including edits to it.wikipedia, but in 2007, a handful. Nothing that could justify this report.

enwiki:208.65.88.243 was the account active before JzG blacklisted. It edited over 5 days, November 27--December 1, 2009. The only edits were to three sections of w:Talk:Cold fusion, and those sections as they stand at the end of this were: [11][12][13]. This was the only account cited in the JzG report that had recent contributions, and I see no other global contributions for the IP.

Jed Rothwell is an expert on the topic. He's a writer, and he commonly edits conference papers on cold fusion for publication (as by Tsinghua University Press and others, including mainstream journals). Much of the literature is in Japanese, and he's fluent in Japanese, and has translated much. He knows the literature well, and he is in the "Reliable Sources" topic linked, discussing the sources. When he mentions a paper -- these were very relevant mentions -- he links to it, if he can. Absolutely, this was appropriate discussion. He just might be the world's foremost expert on the specialized topic of cold fusion sources. There are others who know the science better. I can tell his frustration, and it often led him to dismiss editors as crackpots and cranks. I've concluded, myself, that he was sometimes correct. But, shhh.... don't tell anyone. In "Another try at the intro," he makes one post, and the only mention of lenr-canr.org is his characteristic non-link signature. In "Some edits for NPOV, MAINSTREAM," he writes "Skeptics have published only a dozen or so peer-reviewed papers. You can read most of them at LENR-CANR.org." Which is true and helpful.

(If you understand him correctly, he's correct about the papers. There are maybe a thousand papers published under peer review on cold fusion. Only a handful are "skeptical," which means blanket rejection. There are more papers that are skeptical in some respect, about this or that finding, and, indeed, many "positive" papers are skeptical in some respect or other. What Rothwell is pointing to are the original "rejection" papers that were considered by some to be conclusive, even though they actually verify some of the later research. I.e., if you do what those researchers did, you won't see anything either. Why am I bothering to explain this? Because if you don't know the issues, you might think he's "promoting," easily. "Inappropriate," however, is a content judgment. In reality, he was just responding to what had been written by a very, very skeptical editor, whose skepticism goes way beyond what is in reliable source.)

So what "inappropriate promotion" was Mike.lifeguard referring to? He didn't say. Was this "pushing" global? I can't tell from the links in the original filing, but the global blacklist should not be used for local spamming, see Spam blacklist/About. There wasn't local spamming on en.wiki. Was there inappropriate addition to global sites? If it was like en.wikipedia, probably not, though certainly some usages may have been inappropriate, due to various content issues. This is not a task for the blacklist.

Mike.lifeguard made a content judgment to justify the blacklisting. The evidence in the report did not justify blacklisting, so there is no "original problem" to be fixed. JzG knew he'd fail to keep his abusive, out-of-process, stealth blacklisting at en.wikipedia -- Beetstra, I can't imagine how you'd think that the ArbComm decision did not find that blacklisting to be abusive --, so he came here and presented a farrago of arguments, a barrage of mud, and he knew that some of it would likely stick. But this isn't about JzG or Mike.lifeguard, it is about lenr-canr.org. There is not and has never been a reason to spam-blacklist this site.

Most editors, seeing the blacklist refusal of their edit, will not follow up, no matter how good the link might be. I can say that at en.wikipedia, if they do request it, they will be met by a contingent of the handful who watch those pages (people who like to blacklist! -- or, to be fair, who are involved with spam enforcement, which tends to leave editors jaded and suspicious, it's well known), who will very likely refuse even reasonable requests, it's a crap shoot. They will demand the strongest possible evidence for the usage, which, of course, is content argument, and, in particular, it is administrators making content judgments, instead of those editing the article, violating basic wiki principles (content standards are not universal, an appropriate source for one kind of article may not be appropriate for another). I've seen an editor blocked because he requested delisting or whitelisting. (He'd done some inappropriate things, for sure, but whatever happened to warnings? He clearly was acting in good faith.)

Lenr-canr.org is a poster boy for an abusive global blacklisting. I'm interested to see what happens. I don't care about en.wikipedia any more, but I do care about other WMF projects and, in particular, meta. Since I've seen problematic decisions at meta before, I might be interested in finding out how to appeal from a blacklisting decision. Bad decisions at meta can cause widespread damage, and most users just toss their hands up in despair. But, please, just look at the reasons for blacklisting lenr-canr.org and see if they match the guideline for the usage of the blacklist, and let the chips fall where they may.

One more point. JzG has long claimed I am "proxying" for Jed Rothwell. The opposite is true. Rothwell wants the Wikipedia article to be as bad and useless as possible. He doesn't need Wikipedia for page ranking, he comes up top (below Wikipedia) -- or first page -- on relevant searches anyway. He's long told me I'm wasting my time. I'm coming to think he's correct. --Abd 17:23, 5 October 2010 (UTC)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

Exception for backupurl



I need a backup for a UNHCR document at http://www.unhcr.org/3b9cc1144.pdf#page=234. WebCite does not work, but backupurl.com/site.php?key=qtr83b is a workaround. If you save the file from backupurl.com with the suffix .pdf on your hard disk, you can open the PDF document properly. Is there a chance for an exception? Or does somebody know a better workaround? --Diskriminierung 17:18, 30 June 2010 (UTC)

I don't understand what you're trying to do. The UNHCR website isn't disappearing any time soon, I would think, so why do you need a backup?  – mike@meta:~$  16:32, 4 July 2010 (UTC)

It is my habit always to back up URLs. But it doesn't matter anymore. I found a backup at Google Books (which I used), but furthermore it is easy to circumvent this blacklist entry (which I didn't do here). Message me if you want to know how, because I will not post it here officially. End of discussion from my side. --Diskriminierung 09:42, 6 July 2010 (UTC)

O.K., you want it here. It is as simple as this: use backupurl to archive a link that WebCite refuses to archive. You get a backupurl link. You can archive this backupurl link with WebCite, even though WebCite would never archive the original link because of an exclusion in robots.txt. If you then put the WebCite link, which is a cloaked backupurl link, into Wikipedia as a reference, the spam blacklist will not notice it. I suppose WebCite should put backupurl on its own blacklist. --Diskriminierung 14:04, 6 July 2010 (UTC)

So if I understand it correctly, webcite can be used as a redirect for blacklist circumvention?



This is interesting. --Dirk Beetstra T C (en: U, T) 14:20, 6 July 2010 (UTC)

Yes, of course archive services (like WebCite or the Internet Archive) can be used as a workaround to the URL blacklist if you're only interested in pushing content rather than trying to get hits on your website. Blacklisting webcitation.org would break a lot of content on English Wikipedia, though, so I hope that doesn't need to be considered. Gavia immer 17:55, 6 July 2010 (UTC)
Since WebCite pays attention to robots.txt, it is possible that they use a blacklist, too. It would be better to inform them first and to blacklist them here only if that fails. --Diskriminierung 09:17, 7 July 2010 (UTC)

For these two services, perhaps it would make more sense for local projects to exempt them via MediaWiki:Spam-whitelist rather than simply remove them from the SBL, where the spam potential on smaller projects could go unchecked? Kylu 02:41, 13 September 2010 (UTC)
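The circumvention described in this thread works because the blacklist only sees the text of the URL being added. The Python sketch below illustrates that under a simplifying assumption: each entry is searched anywhere in the URL string (the extension's real matching wraps entries in its own prefix pattern, which is not modelled here). The domain example-spam.com, the snapshot timestamp and the WebCite ID are hypothetical.

```python
import re

# Hypothetical blacklist entry, written in the same \b...\b style as the list.
ENTRY = re.compile(r"\bexample-spam\.com\b", re.IGNORECASE)

def is_blocked(url: str) -> bool:
    # Simplification: the entry may match anywhere in the URL text.
    return ENTRY.search(url) is not None

URLS = {
    # The target domain appears literally, so the entry matches.
    "direct": "http://www.example-spam.com/page",
    # An Internet Archive snapshot URL embeds the original URL, so it still matches.
    "wayback": "http://web.archive.org/web/2010/http://www.example-spam.com/page",
    # A service that hands out an opaque ID hides the target from a string match.
    "opaque_id": "http://www.webcitation.org/5abcdEFGH",
}

for label, url in URLS.items():
    print(f"{label:10} blocked={is_blocked(url)}")
```

Only the third form slips past the entry, which is why a backupurl link re-archived through WebCite goes unnoticed: catching it would require following the archive to its target, which a pure string match cannot do.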

way.com

The regex "\bway\.com\b" seems to be affecting all legitimate sites ending with "-way.com", for example http://www.german-way.com and http://www.con-way.com DHN 00:12, 12 August 2010 (UTC)

Hi!
Yes, this is right. "\b" just means word boundary. If only the domain way.com should be blocked, try
(?<=\/\/|\.)way\.com
-- seth 22:52, 22 August 2010 (UTC)
Is this still a problem? --dferg ☎ talk 08:20, 8 September 2010 (UTC)
Hello, this is still a problem. I'm trying to edit the Con-way entry and cannot add a link to con-way.com. Tcy3421 09:06, 24 September 2010 (UTC)
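The false positive DHN describes, and the effect of seth's suggested entry, can be checked with a short Python comparison. This is only a sketch that tests the two patterns against bare URLs rather than through the extension's own matching code. One difference matters: PCRE accepts a lookbehind whose alternatives have different lengths, but Python's re does not, so seth's `(?<=\/\/|\.)` is rewritten here as two fixed-width lookbehinds.

```python
import re

# The entry as listed: \b also matches between "-" and "w",
# so any host ending in "-way.com" is caught as well.
OLD_ENTRY = re.compile(r"\bway\.com\b")

# seth's suggestion: "way.com" must directly follow "//" or ".".
# Python's re needs fixed-width lookbehinds, hence the split into two.
NEW_ENTRY = re.compile(r"(?:(?<=//)|(?<=\.))way\.com")

# URL -> whether it should be blocked
CASES = {
    "http://way.com/": True,
    "http://www.way.com/page": True,
    "http://www.german-way.com": False,
    "http://www.con-way.com": False,
}

def blocked(pattern: re.Pattern, url: str) -> bool:
    return pattern.search(url) is not None

for url, expected in CASES.items():
    print(f"{url:28} old={blocked(OLD_ENTRY, url)!s:5} "
          f"new={blocked(NEW_ENTRY, url)!s:5} expected={expected}")
```

Note that the suggested entry drops the trailing \b, so it would also match a host such as way.company.org; appending \b restores that boundary if it is wanted.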

Discussion

This section is for discussion of Spam blacklist issues among other users.

Crosspost from en.wikipedia

en:User:Barek noticed duplicates on the blacklist here, and reported them here. I am posting here the list of duplicates. Maybe we need a cleanup?

List of entries listed multiple times at spam blacklist
  • \bbootsluxury\.com\b
  • \bchristianlouboutinmy\.com\b
  • \bhandbagcom\.com\b
  • \bchristianlouboutinshoessale\.com\b
  • \bherve-leger\.com\b
  • \bhervelegerweb\.com\b
  • \bmbt-shoes-discount\.com\b
  • \bvertuexclusiveshop\.com\b
  • \bvibram-five-finger\.com\b
  • \bvibram-fivefingerss\.com\b
  • \b100bhshoe\.com\b
  • \b102bhshoe\.com\b
  • \b104allbyer\.com\b
  • \b106fashion4biz\.com\b
  • \b108akshoe\.com\b
  • \b109elife\.com\b
  • \b110maidi2008\.com\b
  • \b112batsale\.com\b
  • \b114batsale\.com\b
  • \b116kicksquality\.com\b
  • \b118onseeking\.com\b
  • \b120e2to\.com\b
  • \b122luxuryeasy\.com\b
  • \b124green2style\.com\b
  • \b1268000trade\.com\b
  • \b128chicmalls\.com\b
  • \b129elife\.com\b
  • \b130e4cn\.com\b
  • \b132wanderfulshopping\.com\b
  • \b134bbbshoe\.com\b
  • \b136salesuper\.com\b
  • \b138takeofdream\.com\b
  • \b140newflybuy\.com\b
  • \b142newflybuy\.com\b
  • \b144mesoso\.com\b
  • \b146steezecloth\.com\b
  • \b14wowhotsale\.com\b
  • \b16wowhotsale\.com\b
  • \b18ecartshopping\.biz\b
  • \b20uspopularbiz\.com\b
  • \b22etradinglife\.com\b
  • \b24vipshops\.org\b
  • \b26wowcool\.org\b
  • \b28plzzshop\.com\b
  • \b2fashion-long-4biz\.com\b
  • \b30plzzshop\.com\b
  • \b32goladymall\.com\b
  • \b34coolforsale\.com\b
  • \b36overstockes\.com\b
  • \b38sbbshoe\.com\b
  • \b40vipmalls\.com\b
  • \b42vipmalls\.com\b
  • \b44netetrader\.com\b
  • \b46tqshoes\.com\b
  • \b48tntshoes\.com\b
  • \b4fashion-long-4biz\.com\b
  • \b4uaf\.com\b
  • \b50kogogo\.com\b
  • \b52kogogo\.com\b
  • \b54shopperstrade\.com\b
  • \b56goflywire\.com\b
  • \b58fashion-sell\.com\b
  • \b60shoppingtime\.us\b
  • \b62shoppingtime\.us\b
  • \b64iseeshoe\.com\b
  • \b66foruping\.com\b
  • \b68muyuo\.com\b
  • \b6elivestyle\.com\b
  • \b70seekjersey\.com\b
  • \b72bccloth\.com\b
  • \b74domchisport\.com\b
  • \b76domchisport\.com\b
  • \b78ebuyings\.com\b
  • \b80ebuyings\.com\b
  • \b82elivebuy\.com\b
  • \b84stefsclothes\.com\b
  • \b86itemtolive\.com\b
  • \b88itemtolive\.com\b
  • \b8cheapmaket\.com\b
  • \b90ccshoper\.com\b
  • \b92etootoo\.com\b
  • \b94streetcandy\.org\b
  • \b96minewear\.com\b
  • \b98myyshop\.com\b
  • \baj2u\.com\b
  • \ballspymonitor\.com\b
  • \bbalmainboots\.com\b
  • \bbestsales4u\.com\b
  • \bbestvibram\.com\b
  • \bbootsshop2010\.com\b
  • \bbuyvertureplica\.com\b
  • \bcheap-air-jordan\.cn\b
  • \bchesssoul\.com\b
  • \bchristian4sale\.com\b
  • \bchristianlouboutinmall\.com\b
  • \bchristian-louboutin-sandals\.com\b
  • \bchristianlouboutinshoestore\.com\b
  • \bcircuitocerradotelevision\.com\b
  • \bdensitygs\.com\b
  • \bdensitygs\.info\b
  • \bdunk2u\.com\b
  • \becwarmboots\.com\b
  • \bedhardybazar\.co\.uk\b
  • \be-lv\.net\b
  • \bemoncler\.com\b
  • \beshoppingluxury\.com\b
  • \bfivefingervibram\.com\b
  • \bgetsnet\.com\b
  • \bgodswmobile\.com\b
  • \bgouggs\.com\b
  • \bhardingsoft\.com\b
  • \bhervelegernet\.com\b
  • \bhervelegersale\.com\b
  • \bhiebay\.com\b
  • \bhoteldeals\.ae\b
  • \bidevlite\.com\b
  • \bjimmychoocom\.com\b
  • \bjordandi\.com\b
  • \bkissuggboots\.com\b
  • \bkitdetox\.com\b
  • \blinksoflondonstore\.com\b
  • \blouboutinsales\.net\b
  • \bmanoloblahnikcom\.com\b
  • \bmax-sky\.com\b
  • \bmbtforcheap\.com\b
  • \bmenorca-airport\.com\b
  • \bmonclercom\.com\b
  • \bmonclerjacketstock\.com\b
  • \bmylouboutinstore\.com\b
  • \bnbajs\.com\b
  • \bnewgoing\.com\b
  • \bnikempire\.com\b
  • \bnike-star-shoes\.com\b
  • \bourlouisvuitton\.com\b
  • \bphoneworth\.com\b
  • \bpiketrade\.com\b
  • \bqqtwo\.com\b
  • \breplicaestore\.us\b
  • \bsellvibram\.com\b
  • \bshoes\.vc\b
  • \bshoppingherveleger\.com\b
  • \bsilver-tiffany\.com\b
  • \bsoftwarewikipedia\.com\b
  • \bsouthfloridatelecom\.com\b
  • \bsupplyedhardy\.com\b
  • \bsweatboots\.com\b
  • \btiffanyhot\.com\b
  • \btiffanyou\.com\b
  • \btn4bags\.com\b
  • \btobuybattery\.com\b
  • \btopvibram\.com\b
  • \btopvibramfivefingers\.com\b
  • \btotalscreenrecorder\.com\b
  • \bugg2u\.net\b
  • \buggsky\.co\.uk\b
  • \bup2ugg\.com\b
  • \bvibramfive-fingers\.com\b
  • \bvibramfivefingersweb\.com\b
  • \bvibramstore\.com\b
  • \bvibramweb\.com\b
  • \bvipwomenshop\.com\b

(sorry, I was too lazy to remove the triple indent). --Dirk Beetstra T C (en: U, T) 07:34, 18 August 2010 (UTC)
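A list like the one above can be produced mechanically. The Python sketch below reports exact repeats and drops later occurrences while leaving the order of the remaining lines untouched, so thematically grouped entries are not scattered the way a plain sort -u would scatter them. It assumes one entry per line and that everything after a "#" is a comment. It finds only literal duplicates, not entries made redundant by a broader pattern; the function names are hypothetical.

```python
from collections import Counter

def normalise(line: str) -> str:
    # Assumption: "#" starts a comment; surrounding whitespace is ignored.
    return line.split("#", 1)[0].strip()

def find_duplicates(lines: list[str]) -> list[str]:
    """Entries that occur more than once, sorted for reporting."""
    counts = Counter(e for e in map(normalise, lines) if e)
    return sorted(e for e, n in counts.items() if n > 1)

def dedupe_keep_order(lines: list[str]) -> list[str]:
    """Drop later repeats of an entry; keep comments, blanks and original order."""
    seen: set[str] = set()
    kept = []
    for line in lines:
        entry = normalise(line)
        if entry:
            if entry in seen:
                continue  # exact repeat of an earlier entry
            seen.add(entry)
        kept.append(line)
    return kept

sample = [
    "# shoe knockoffs",
    r"\bbootsluxury\.com\b",
    r"\bherve-leger\.com\b",
    "",
    "# later batch",
    r"\bbootsluxury\.com\b  # added again",
]
print(find_duplicates(sample))
print(len(dedupe_keep_order(sample)))
```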

Yes, we need a cleanup. What also worries me is the size of the blacklist. It's getting unusable since it is becoming a very large page, IMHO. Any solution? Regards, --dferg ☎ talk 08:22, 28 August 2010 (UTC)
The problem is that there are so many other wikis out there that depend on the spam blacklist for spam blocking. Our spam prevention mechanism needs a complete overhaul anyway. The easiest way out is to have multiple blacklists, other suggestions include bug 4459. I would like to have a dedicated wiki for anti-spam work. Any solution requires developer action.
Unfortunately, the Chinese knockoff spammers will never go away so the solution must be scalable to at least 10x the number of domains we have now. MER-C 08:53, 28 August 2010 (UTC)
I think I know the person who can run a cleanup script to remove those duplicates. As for the improvement of our spam protection systems I'll comment later. Thanks, --dferg ☎ talk 19:48, 6 September 2010 (UTC)
Hi!
As I said in the original discussion at w:en, I started semi-manually deleting some redundant entries a few days ago. I haven't finished yet, but will continue later (in a few days).
I wrote some tools in 2009 which search SBLs and SWLs for duplicate entries and delete most of them. But this will reduce the SBL's size only by a few %. Grouping similar entries (which I normally do by hand with vim) brings another few %. Btw: for grouping it would be nice to "sort -u" the list, but sorting has the (small?) disadvantage that thematically grouped entries would be scattered. However, if you want me to, I can sort the entries, delete redundant entries and group many of them, so that the list will shrink a bit. I guess it will lose about 5-15% of its bytes.
So the size problem will actually remain. Apart from that, I'm not sure whether the performance of the source code could become a problem in future.
I guess it could be an advantage to have two lists: one big raw-text list and one smaller regexp list. The raw-text list could be partitioned into several pages, similar to our SBL log, e.g. grouped by letters. The regexp list should be like our present SBL, but much smaller; most blacklist entries don't need regexps. The advantages of this split would be: 1. faster source code like
use regex list; if link not blacklisted there: if domain starts with "a" use a-list, elseif domain starts with "b" use b-list, and so on, else use special char list;
2. smaller subpages, which are more comfortable to edit and easier to read.
In addition, the SBL could test whether each added entry is redundant and give a warning in that case.
If anybody builds an HTML/PHP framework, I could help with some algorithms concerning regexps (searching for entries, removing duplicate entries, ...). But I don't have the time to do all of that by myself. -- seth 22:11, 6 September 2010 (UTC)
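The lookup order seth proposes (regex list first, then a per-letter literal list) can be sketched in Python. Everything here is hypothetical: the class name, the bucket keys, and the assumption that literal entries are whole domains matched against the link's host and its parent domains are illustrative choices, not how the SpamBlacklist extension works.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

class TwoTierBlacklist:
    """Hypothetical sketch: a small regex list plus literal domains
    partitioned by first character, as in the a-list/b-list proposal."""

    def __init__(self, regex_entries, literal_domains):
        self.regexes = [re.compile(p, re.IGNORECASE) for p in regex_entries]
        self.buckets = defaultdict(set)
        for domain in literal_domains:
            domain = domain.lower()
            self.buckets[self.bucket_key(domain)].add(domain)

    @staticmethod
    def bucket_key(domain: str) -> str:
        # "a".."z" get their own list; digits and anything else share one.
        first = domain[:1]
        return first if "a" <= first <= "z" else "special"

    def is_blocked(self, url: str) -> bool:
        # 1. The (small) regex list is consulted first.
        if any(r.search(url) for r in self.regexes):
            return True
        # 2. Otherwise look up the host and each parent domain in its bucket,
        #    so "shop.example.com" is caught by a literal "example.com".
        labels = (urlsplit(url).hostname or "").lower().split(".")
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            if candidate in self.buckets.get(self.bucket_key(candidate), set()):
                return True
        return False

bl = TwoTierBlacklist(
    regex_entries=[r"(?:(?<=//)|(?<=\.))way\.com"],
    literal_domains=["bootsluxury.com", "4uaf.com"],
)
for url in ["http://www.bootsluxury.com/x", "http://shop.4uaf.com/",
            "http://way.com/", "http://www.con-way.com/"]:
    print(url, bl.is_blocked(url))
```

In Python a single set would already give constant-time lookups, so the per-letter buckets here mainly mirror the proposed split into smaller, easier-to-edit subpages; the speed gain comes from keeping the regex list short.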
Cleanup done - thanks seth! - I've logged the removals. Regards, --dferg ☎ talk 08:18, 8 September 2010 (UTC)
Hi!
Well, the cleanup is not done yet. I only deleted the exact/literal duplicate entries, not all redundant entries ("foobar" is redundant if "foo" is already blacklisted). I'll do that maybe next week.
However, the main problem (the large size of the SBL) is not yet solved. -- seth 09:08, 8 September 2010 (UTC)
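seth's "foobar"/"foo" rule, and the earlier idea of warning when a redundant entry is added, can be sketched in Python for the simplest case. The sketch deliberately handles only unanchored literal entries. Most of the list is written as \b...\b, and there containment is not enough (\bfoo\b does not cover \bfoobar\b), so anchored entries would need a separate, more careful check. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: only "plain" entries are analysed: letters, digits, "_", "-"
# and escaped dots ("\."), with no \b anchors or other regex syntax.
PLAIN = re.compile(r"^(?:[A-Za-z0-9_-]|\\\.)+$")

def literal(entry: str) -> str:
    """Text a plain entry matches: each escaped dot becomes a dot."""
    return entry.replace("\\.", ".")

def redundant_entries(entries: list[str]) -> dict[str, str]:
    """Map each redundant plain entry to an entry that already covers it.

    An unanchored literal pattern matches wherever its text occurs, so if
    literal(other) is a substring of literal(entry), every URL matched by
    entry is also matched by other.
    """
    plain = [e for e in entries if PLAIN.match(e)]
    found = {}
    for entry in plain:
        for other in plain:
            if other != entry and literal(other) in literal(entry):
                found[entry] = other
                break
    return found

def covering_entry(new_entry: str, entries: list[str]) -> Optional[str]:
    """Warning hook for a new addition: an existing entry covering it, if any."""
    return redundant_entries(list(entries) + [new_entry]).get(new_entry)

print(redundant_entries(["foo", "foobar", r"bar\.com", r"xbar\.com\.cn", r"\bway\.com\b"]))
print(covering_entry("foobarbaz", ["foo"]))
```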

Many thanks! - I've just logged the removals of the duplicates found by Barek (if you've removed other domains too, feel free to log them too)

The cleanup of redundant entries looks OK too. However, you are correct that the main issue, the size of this list, is not resolved. I could easily imagine the extension working with subpages like Spam blacklist, Spam blacklist/2010, Spam blacklist/2011 or something like that. A special page for adding entries directly to the database would be awesome too, but we cannot forget that this list is being used outside WMF too. Regards, --dferg ☎ talk 18:32, 8 September 2010 (UTC)

Archives

Please do not forget to archive requests after processing them. Otherwise this page gets unusable because of the size, etc. I've done a cleanup today, where processed requests from July were still here. Thanks and regards, --dferg ☎ talk 18:19, 5 September 2010 (UTC)

Works for you?

I've recently tried to blacklist some domains using SBHandler. When I clicked on "edit changes", my computer froze and I had to restart Firefox. The same thing happens every time I try. Does it happen to you too? Thanks for your comments. --dferg ☎ talk 11:54, 9 September 2010 (UTC)

Bump, anyone? --dferg ☎ talk 13:52, 4 October 2010 (UTC)
Works fine for me, I'm using Firefox as well. When exactly in the process does your computer freeze? Finn Rindahl 14:02, 4 October 2010 (UTC)
When I click on "Edit changes", it starts loading & then Firefox crashes fatally. --dferg ☎ talk 18:50, 4 October 2010 (UTC)
I've had something like this blacklisting that friggin' Knockoff spam that MER-C is working on (there were two big lists at that time). I think then it had something to do with the size of the page. --Dirk Beetstra T C (en: U, T) 14:55, 4 October 2010 (UTC)
Isn't that a bit strange? The size of the talk page shouldn't affect editing the page itself (which is what we do with "edit changes"), and even if those lists are long that doesn't significantly influence the size of the page (the Spam blacklist)? Finn Rindahl 15:31, 4 October 2010 (UTC)
I think it is that the script has a time-out on saving the talk .. --Dirk Beetstra T C (en: U, T) 10:48, 5 October 2010 (UTC)