Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
The associated page is used by the MediaWiki Spam Blacklist extension and lists regular expressions; URLs that match them cannot be used in any page in Wikimedia Foundation projects (as well as many external wikis). Any meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis and prior blacklisting on at least one. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report; there could be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2014/12.


snippet for logging
{{sbl-log|10804646#{{subst:anchorencode:SectionNameHere}}}}


Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

prolevelweightloss.com










In addition to the IPs listed above, see [3]. Pure spam, no use on WMF sites. MER-C (talk) 12:31, 21 December 2014 (UTC)

Added -- Revi 13:06, 21 December 2014 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as the bot may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).
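The staleness rule in the parenthesis above can be read as a simple three-part test. The following is only a sketch of that reading, not COIBot's actual code: the `Report` class, its field names and the `is_stale` function are hypothetical, and "not edited in the last 7 days" is assumed to mean "last edited at least 7 days ago".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    # Hypothetical fields; the real report pages are wiki pages, not objects.
    links_reported: int
    last_edited: datetime
    last_editor: str


def is_stale(report: Report, now: datetime) -> bool:
    """Return True when all three conditions quoted in the text hold."""
    return (
        report.links_reported < 5                            # fewer than 5 links
        and now - report.last_edited >= timedelta(days=7)    # untouched for 7 days
        and report.last_editor == "COIBot"                   # no human has edited it
    )
```

A report that a sysop has commented on has a different last editor, so under this reading it is never archived automatically.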

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
COIBot's currently open XWiki reports
List Last update By Site IP R Last user Last link addition User Link User - Link User - Link - Wikis Link - Wikis
arbabehekmat.org 2014-12-22 09:09:43 COIBot 216.158.77.102 Baran92 2014-12-22 08:46:16 181 151 144 0 2
asuszenfoneblog.com 2014-12-22 05:50:31 COIBot 216.239.38.21 Ariagussuwanto 2014-12-22 05:43:27 12 10 10 0 3
avkom.com 2014-12-22 09:24:10 COIBot 5.10.80.84 Hasanb34, 78.188.179.41, 88.248.141.22 2014-12-22 09:05:11 21 2
egyptsites.co.uk 2014-12-22 05:28:07 COIBot 185.26.230.131 Atn20112222, Belizarie, Fixer88, Hiyotchi, Mn-imhotep, 北梅田, 94.65.184.89 2014-12-21 11:48:24 18 9
hhblife.com 2014-12-21 17:31:57 COIBot 66.147.244.52 Voyad M 2014-12-21 14:05:01 19 20 19 0 20
jaguaresdecordoba.com.co 2014-12-21 19:54:08 COIBot 66.147.242.95 Joluar7 2014-12-21 17:29:23 25 19 19 0 6
janchetnabooks.org 2014-12-22 07:19:56 COIBot 192.169.53.234 122.177.160.158 2014-12-22 06:58:33 5 5 5 2 2
lenouveaurecueil.fr 2014-12-22 02:25:40 COIBot 62.210.16.13 82.123.176.254, 82.123.49.72, 82.123.49.90, 83.194.185.112, 86.197.125.113, 86.213.228.78 2014-12-21 13:52:56 14 2
morozov.com.ua 2014-12-21 19:46:56 COIBot 89.188.104.7 Sergienkod 2014-12-21 19:46:49 4 57 1 0 12
mumag.de 2014-12-22 02:59:53 COIBot 217.6.184.207 Mumag 2014-12-21 21:24:37 16 15 15 0 5
nationale7.me 2014-12-22 02:03:02 COIBot 66.155.11.238 Halphen 2014-12-21 23:03:12 6 6 6 0 2
pegida.de 2014-12-22 03:09:47 COIBot 81.169.145.163 WvdC 2014-12-22 09:16:17 4 15 1 0 5
profitacademybonus.com 2014-12-22 09:29:39 COIBot 192.232.251.32 JenelVane85, Vanemula58 2014-12-22 09:28:41 6 3
reisenexclusiv.com 2014-12-22 09:28:21 COIBot 188.93.11.127 Frank230403 2014-12-22 09:20:50 65 52 51 0 2
rigiflex.com 2014-12-22 08:43:03 COIBot 192.220.74.186 115.111.33.137, 115.111.33.141, 115.111.33.153 2014-12-22 08:29:47 17 2
ryogame.com 2014-12-22 03:04:44 COIBot 104.28.30.7 105.156.147.194, 105.158.117.78, 41.140.107.60 2014-12-21 15:19:12 14 2
salex-lcc.com.ua 2014-12-22 02:30:11 COIBot 91.206.200.251 212.90.61.16, 91.209.51.116, 93.74.48.67, 93.75.11.211 2014-12-21 13:36:22 26 2
saturnonotizie.it 2014-12-21 17:58:42 COIBot X 77.42.124.138, 79.43.120.145 2014-12-21 17:48:46 5 2
tiarasandtrianon.files.wordpress.com 2014-12-22 06:14:57 COIBot 192.0.72.3 82.132.232.30, 82.132.244.35, 82.48.227.112 2014-12-21 20:56:10 8 2

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests please take your discussion to the pertinent wiki, where such requests would be made at Mediawiki talk:Spam-blacklist at that wiki. Search spamlists — remember to enter any relevant language code

cais-soas.com



I want to refer to the following link: www.cais-soas.com/News/2001/October2001/22-10.htm

This is for the page en:David_Neil_MacKenzie (the current link to the obituary is broken). I don't see the reason why this domain is blocked, as it seems to be an academic source of information. בוקי סריקי (talk) 09:04, 26 November 2014 (UTC)

It seems to have been blocked following a request from enWP. @Dominic: do you have an opinion about the domain now with time having passed?  — billinghurst sDrewth 11:24, 26 November 2014 (UTC)

This was blacklisted due to spamming in combination with copyright infringement reasons. The situation may have changed, but Talk:Spam_blacklist/Archives/2010-02#cais-soas.com <- this discussion from 2010 sums it up quite well. At that time, it was deemed of about the same quality as Wikipedia itself, it was not an academic source of information, their inclusion standards were far below what we would need for a reliable source (and that there was spamming involved in the additions only strengthens that conclusion).

Unless the situation on the site has drastically changed, I would leave it on the blacklist, and request whitelisting for the few really needed links, like this one. --Dirk Beetstra T C (en: U, T) 03:32, 27 November 2014 (UTC)

Two questions:

  1. Is this a copy of the original, or an independent report of the same info? If the former, is it properly attributed?
  2. Is a copy of the original available from one of the archiving sites?

Thanks. --Dirk Beetstra T C (en: U, T) 06:18, 27 November 2014 (UTC)

syriadirect.org



This site offers good independent information about the Syrian civil war and it doesn't contain spam or anything like that. Honestly, I don't even know why it was put on the list in the first place, but I would like to see it cleared so that people can use it as a reference in their articles about this conflict.

Seems to be collateral damage for a regex response to spam. We can probably do a lookbehind regex fix for this.  — billinghurst sDrewth 09:01, 18 December 2014 (UTC)

netzsch-thermal-analysis.com







Blocked as a false-positive by this [4] 2011 entry. de:User:Julia Kelbler tried to reinsert the link in the article de:Netzsch-Gruppe after she had accidentally removed it, see her section on dewiki's request-a-sysop page. (I think spamming of thermal-analysis.com might have stopped in the meantime.) Thanks, --MBq (talk) 12:19, 17 December 2014 (UTC)

Maybe you can add domain to local whitelist? — Revi 12:27, 17 December 2014 (UTC)
OK, I'll try this, thank you --MBq (talk) 13:29, 17 December 2014 (UTC)
This is indeed something that the local whitelist should solve, add '\bnetzsch-thermal-analysis\.com\b' to that list. The spamming of the links that precipitated the blacklisting obviously stopped, the sites are blacklisted. --Dirk Beetstra T C (en: U, T) 04:55, 18 December 2014 (UTC)
As it is collateral damage, it is again something that we should look to amend the regex statement, and I would think pretty easily.  — billinghurst sDrewth 09:03, 18 December 2014 (UTC)
The problem with that is that regexes then may become really complex, and some of the exclusions to the blacklist rule are very specific on only a very few wikis anyway (I doubt that the subject here will go much further than one or two pages on de and maybe en-wiki). If it would be one specific one which is of general use on many pages on many wikis the situation is different. What we really need would be a global whitelist, which I think is also an easy solution (and adaptations to the blacklist-system in the software have been requested eons ago, and only recently one of the requested features is patched into the current software). --Dirk Beetstra T C (en: U, T) 14:39, 19 December 2014 (UTC)
There's a non-complicated solution for this kind of problem: adding (?<=//|\.) in front of the regexp. I'll do that now.
Done -- seth (talk) 16:21, 20 December 2014 (UTC)
That allows 'http://spammy-thermal-analysis.com', similar to the link it is supposed to block. I am going to undo, please follow the whitelisting as suggested above, or, if it is really of wider use, exclude only the 'netzsch-'-prefix. --Dirk Beetstra T C (en: U, T) 18:55, 20 December 2014 (UTC)
I am nowiki-ing the demonstration in my previous edit to this page. --Dirk Beetstra T C (en: U, T) 18:57, 20 December 2014 (UTC)
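The hole described in the previous two comments can be reproduced with a short Python sketch. Two assumptions: the entry is taken to be the lookbehind prefix followed by `thermal-analysis\.com`, and because Python's `re` module rejects a lookbehind whose alternatives differ in width (PCRE, which the wiki software uses, accepts it), the prefix is written as two separate lookbehinds with the same meaning. The function name `is_blocked` is made up for the demonstration.

```python
import re

# Python-compatible rewrite of the PCRE entry  (?<=//|\.)thermal-analysis\.com
# (each lookbehind alternative has its own fixed width).
ENTRY = re.compile(r"(?:(?<=//)|(?<=\.))thermal-analysis\.com", re.IGNORECASE)


def is_blocked(url: str) -> bool:
    """True if the modified entry would match somewhere in the URL."""
    return ENTRY.search(url) is not None


for url in (
    "http://thermal-analysis.com",           # original spam target: blocked
    "http://www.thermal-analysis.com",       # subdomain: blocked
    "http://netzsch-thermal-analysis.com",   # intended exception: allowed
    "http://spammy-thermal-analysis.com",    # unintended: also allowed
):
    print(url, "blocked" if is_blocked(url) else "allowed")
```

Any hostname that merely ends in "-thermal-analysis.com" slips through, which is the objection raised above.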
Hi!
The reason for blacklisting was the single domain thermal-analysis.com, see [5]. Is there any evidence for spamming with another domain that contains that string? There's no reason for local whitelisting, if we can just make the blacklist entry more precise. -- seth (talk) 19:45, 20 December 2014 (UTC)
The one you want to allow is one domain which is very likely only going to be used on one (maaaaybe two) wiki(s) on one article. That is exactly why local wikis have a whitelist. I've been around in spam-fighting on Wikipedia for a very, very long time and I have seen what spammers do to get domains into our pages; making holes in the blacklist is sometimes just what they are waiting for.
Finding evidence is going to be either a difficult and lengthy task (digging through blacklist hits, which have only been logged for a short time and are unsearchable), or a matter of waiting for damage to occur. If you want some extension to the evidence, the company that owns 'thermal-analysis.com' owns 'simultaneous-thermal-analysis.com' as well, a domain that would be allowed after your rule change. Also, global 'whitelisting through making a hole in a blacklist' would allow a company that was let through the hole for notability reasons on one single wiki to spam on other wikis (even though the use on that one wiki is legitimate, a company can be notable enough for an article on one wiki and not on the rest of the almost 800). After all, if they notice they are notable on am.wikipedia, they are also notable enough for all other wikis, right? That is again something for the local whitelist: a) it does not allow the link cross-wiki, and b) it can be specific enough to open only a single page on the site for linking, like an index.htm, an about-page or a specific document, so that the possibility to start spamming on that specific wiki is limited. --Dirk Beetstra T C (en: U, T) 03:32, 21 December 2014 (UTC)
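The cross-wiki argument for local whitelisting can be illustrated with a deliberately simplified model: a link is rejected when it matches a global blacklist entry, unless the wiki's own whitelist matches it. This is a sketch only. The real extension's matching logic may differ in detail, the global entry shown is assumed (the thread does not quote the original rule), and the wiki names and `link_allowed` function are hypothetical.

```python
import re

# Assumed form of the global entry; not quoted from the actual blacklist.
GLOBAL_BLACKLIST = [re.compile(r"\bthermal-analysis\.com\b", re.IGNORECASE)]

# Hypothetical per-wiki whitelists; only dewiki carries the suggested entry.
LOCAL_WHITELISTS = {
    "dewiki": [re.compile(r"\bnetzsch-thermal-analysis\.com\b", re.IGNORECASE)],
    "enwiki": [],
}


def link_allowed(wiki: str, url: str) -> bool:
    """Simplified check: a local whitelist hit overrides the global blacklist."""
    if any(p.search(url) for p in LOCAL_WHITELISTS.get(wiki, [])):
        return True
    return not any(p.search(url) for p in GLOBAL_BLACKLIST)
```

Under this model the netzsch link can be saved on dewiki but stays blocked on every other wiki, and the global rule itself is left untouched.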
Hi!
In general:
False positives, i.e. innocent domains that are blocked in several hundred wikis, do more damage than temporary spamming. Useful (large) edits can be lost through those false positives, because users (especially newbies) don't know what to do if they can't save a page, or don't even realize that their change was not saved. The logs (of the sbl and the edit filter) indicate that this happens quite often.
That's why we should explicitly block only those domains that have been spammed, at the latest once we learn of false positives. Of course it's reasonable to put general (implicit) entries like "buycheap" (even without word boundaries '\b') on the sbl. There the better way is to formulate the exceptions (or whitelist entries) explicitly.
If only one explicit domain is used for spamming and not a group of similar domains, then it's more reasonable to block just this domain once we learn that other domains are useful, not spammed, but blocked.
In this specific case:
Anyway, now you mentioned the additional domain simultaneous-thermal-analysis.com, such that we now know three domains containing the string "thermal-analysis.com" and one of them is no spam. So in this case I've to admit that it probably doesn't matter much, whether we use
(?<=//|\.|simultaneous-)thermal-analysis\.com\b or
(?<!netzsch-)\bthermal-analysis\.com\b
As you, Beetstra, preferred the latter, I'll change the entry to this now. -- seth (talk) 10:45, 21 December 2014 (UTC)
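For comparison, here is a small Python sketch of how the two candidate entries treat the three domains known so far, plus one made-up lookalike. The first pattern is rewritten with separate lookbehinds because Python's `re` does not accept alternatives of different widths inside one lookbehind; the second pattern is used exactly as written above. `spammy-thermal-analysis.com` is the hypothetical example from earlier in the thread, and the `verdicts` helper is invented for this illustration.

```python
import re

# Candidate 1, PCRE form: (?<=//|\.|simultaneous-)thermal-analysis\.com\b
ALLOW_LIST_STYLE = re.compile(
    r"(?:(?<=//)|(?<=\.)|(?<=simultaneous-))thermal-analysis\.com\b"
)
# Candidate 2, usable unchanged in Python.
EXCLUDE_STYLE = re.compile(r"(?<!netzsch-)\bthermal-analysis\.com\b")


def verdicts(url: str) -> tuple[bool, bool]:
    """Return (blocked by candidate 1, blocked by candidate 2)."""
    return (
        ALLOW_LIST_STYLE.search(url) is not None,
        EXCLUDE_STYLE.search(url) is not None,
    )


for host in (
    "thermal-analysis.com",
    "simultaneous-thermal-analysis.com",
    "netzsch-thermal-analysis.com",
    "spammy-thermal-analysis.com",
):
    print(host, verdicts("http://" + host + "/"))
```

The two entries agree on the three known domains and differ only on new lookalikes: the first lets unknown prefixes through, the second blocks everything except the netzsch- prefix.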
I did indeed suggest that as a solution, with the caveat that now rules will become more and more complex as a result of this, and when editors do get blocked they will not understand anything anymore, nor be able to find what rule actually blocked their edit. And I still think that you do not have consensus for this edit. --Dirk Beetstra T C (en: U, T) 13:30, 21 December 2014 (UTC)
I thought this new regexp was consensual enough, because I just put your suggestion into practice. The sbl entries are unreadable for most people anyway (for example, the entry "\beepurl\.com" won't block "beepurl.com", because "\b" is a word boundary and not the letter b). The sbl is a black box for most users. They can just write their unblocking requests. It's the admins' task to decide and to find/add/change the right regexp entry. And the search tool helps them find the right entry and the corresponding discussions. -- seth (talk) 23:24, 21 December 2014 (UTC)
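The "\beepurl\.com" example is easy to verify. In the sketch below, the first pattern is the entry as quoted; the second, with a doubled b, is only an illustration of what an entry for "beepurl.com" would have to look like and is not taken from the blacklist.

```python
import re

entry = re.compile(r"\beepurl\.com")       # \b = word boundary, then "eepurl.com"
with_letter_b = re.compile(r"\bbeepurl\.com")  # boundary, then "beepurl.com"

# The quoted entry matches eepurl.com, where "e" follows a non-word character.
eepurl_hit = entry.search("http://eepurl.com") is not None
# It does not match beepurl.com: between "b" and "e" there is no word boundary.
beepurl_hit = entry.search("http://beepurl.com") is not None
# Only the pattern with an explicit letter b matches beepurl.com.
beepurl_hit_explicit = with_letter_b.search("http://beepurl.com") is not None

print(eepurl_hit, beepurl_hit, beepurl_hit_explicit)
```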

xf.cz



Apparently added 2006-05-07 by Amgine with rationale "lots of URLs". I stumbled over it on c:Commons:YouTube_files#Chrome, and mentioned the issue on c:MediaWiki_talk:Spam-whitelist: "Insufficient documentation", and digging through thousands of entries in the edit history to identify the addition is excessively annoying. –Be..anyone (talk) 08:16, 18 December 2014 (UTC)

It looks to me like the storage part of a free webhost (so your domain becomes '<yourdomain>.xf.cz'). Likely that was part of the reason it was blacklisted: someone created random domains and spammed them (it is too long ago to see). The question is how much use this is in the content namespaces of Wikipedia; this looks like an isolated case on a talk page for now.
On the other hand .. this is 8 years ago .. maybe take the leap of faith? --Dirk Beetstra T C (en: U, T) 14:47, 19 December 2014 (UTC)
The various links above unsurprisingly find nothing; after eight years of blocking, spammers of course won't try this anymore. I can't tell how "spammy" the domain is today (a comparison with SURBL could help, if SURBL still exists), I can't tell why it was added in the first place, but I'm rather confident that blacklist entries seriously older than 8 weeks (not years) are crap for e-mail spam fighting purposes. A manually edited list is antique, and an entry with no identifiable reason is bogus. –Be..anyone (talk) 23:25, 19 December 2014 (UTC)
On en.wikipedia there are several companies that are still actively promoting their business 5 years after spamming - but for this site, where it could very well be that it was not the site owner itself that was spamming but someone abusing the free webhost (what I find on the server certainly looks like a free webhost), the actual spammer may have only been deterred by this and moved on. That is why I am considering that this is 8 years ago and hence could be moot (it may even have been added after a local situation in the time before local blacklists existed). Note that this list is not for 'e-mail spam fighting purposes'; it is for blocking promotional edits containing working external links, a different kind of spam from the one you are referring to (and as I said, 8 weeks is nothing, we have editors coming back days after their rules are removed and spamming again). I am going to give this a try, it can always be put back if there is a problem with the additions. --Dirk Beetstra T C (en: U, T) 11:05, 20 December 2014 (UTC)
Removed Dirk Beetstra T C (en: U, T) 11:05, 20 December 2014 (UTC)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

SBHandler broken

SBHandler seems to be broken: both Glaisher and I found that it stops after closing the thread on this page, but before the actual blacklisting. Is there someone knowledgeable who can look into why this does not work? --Dirk Beetstra T C (en: U, T) 04:08, 30 April 2014 (UTC)

User:Erwin - pinging you as the developer. --Dirk Beetstra T C (en: U, T) 04:16, 30 April 2014 (UTC)

FYI when you created this section with the name "SBHandler", you prevented SBHandler from being loaded at all (see MediaWiki:Gadget-SBHandler.js "Guard against double inclusions"). Of course, changing the heading won't fix the original issue you mentioned. But at least it will load now. PiRSquared17 (talk) 15:30, 18 June 2014 (UTC)
Another issue is that there's a bogus "undefined" edit summary when editing the SBL log. The customization of the script via our monobooks looks also broken. Thanks. — M 10:57, 06 December 2014 (UTC)

Discussion

This section is for discussion of Spam blacklist issues among other users.