Talk:Spam blacklist

Shortcuts: WM:SPAM, WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists strings of text that may not be used in URLs in any page in Wikimedia Foundation projects (as well as many external wikis). Any meta administrator can edit the spam blacklist. There is also a more aggressive way to block spamming through direct use of $wgSpamRegex. Only system administrators can make changes to $wgSpamRegex, and its use is to be avoided whenever possible. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report; there could be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.

Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived (search) quickly. Additions and removals are logged.

Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.

snippet for logging
{{sbl-log|1748079#{{subst:anchorencode:SectionNameHere}}}}

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

co.cc again



MZMcBride removed this entry. Wikipedia claims that the domain is not a real TLD and is used for URL redirectors. On that basis, I think it should be re-added with a preceding dot: \.co\.cc\b  -- Mike.lifeguard | @en.wb 19:43, 11 November 2009 (UTC)

Having had to deal with a lot of the spam links, I strongly endorse restoring this link. No offense, MZM, but this one should have been discussed first before removal. --Ckatz 10:05, 13 November 2009 (UTC)
It's clearly not just being used for URL redirection. As the Wikipedia article notes, it can be used as a real DNS. .com is capable of URL redirection and brings in a lot more spam. I don't think blacklisting an entire TLD or ccTLD (real or not) is a good idea, though I can understand it's the simplest solution. Is there a complementary global whitelist? Do we have any idea how many false positives this addition to the blacklist will cause? --MZMcBride 10:42, 13 November 2009 (UTC)
Do you have any evidence supporting your reason for removal? Discussion when working with others is critical, and your flippant response to my query is worrying.
As to the substantive issue: Yes, there will be candidates for whitelisting, that was acknowledged and addressed from the initial request for blacklisting. I haven't seen that the rate is unacceptable, which you simply take as a premise, and we have helped users to request whitelisting where necessary, and will continue to do so.  — Mike.lifeguard | @en.wb 14:56, 13 November 2009 (UTC)
Flippant? You've globally blacklisted an entire ccTLD, which has broad implications on 700+ projects, plus an unknown number of sites that also use this list. This entry in particular is creating an unknown (and possibly high) number of false positives (I'm only here because there was a local problem at en.wiki regarding what appears to be an entirely valid URL and it was baffling how the URL could be blacklisted). Here's the diff of you broadening the regex—where was the discussion for doing this? I don't see anything in the log, though admittedly the log is nearly impossible to navigate. (If there is no discussion, what was the rationale? Is there supporting data to suggest that the only possible approach here is to block the entire ccTLD, an obviously extreme tactic?) --MZMcBride 16:33, 13 November 2009 (UTC)
I think you missed this.  — Mike.lifeguard | @en.wb 16:36, 13 November 2009 (UTC)
Are discussions on this talk page archived anywhere? I checked the log (silly me, I know). Reading the old discussion, I'm still baffled about the rationale here. It can be used for URL redirection. So can literally any other domain (top-level or otherwise). That's not an argument to ban any and all uses of it. If there's evidence that this domain is unmanageable and won't result in an excessive number of false positives, I don't have an issue with including such a broad regex. But I'd like there to be some specific data to point to, not just "can be used for URL redirection," which I consider a non-argument. --MZMcBride 16:42, 13 November 2009 (UTC)
Not "can" -- "is" (well, "was" until you removed it :D). You can see User:COIBot/XWiki/co.cc for a small taste (too many results to generate the large taste) - or the original request. Anecdotally, yes, we know it was abused cross-wiki; that's why I added it when JzG brought the request here - if not, it would have been "add to XLinkBot for enwiki, and we'll attempt to monitor on other wikis with COIBot".  -- Mike.lifeguard | @en.wb 16:50, 13 November 2009 (UTC)

(unindent) A question about User:COIBot/LinkReports/co.cc. How is the false positive ratio determined? It looks like the bot finds all instances of the domain (or part of a domain string) being added to a page, but are there any numbers regarding how many of these additions were legitimate? (There are legitimate uses of this ccTLD, right?) --MZMcBride 09:57, 15 November 2009 (UTC)

I'm not sure what you mean by "false positive" in this context -- the bot cannot decide whether a link addition is appropriate or not since it's a bot.  — Mike.lifeguard | @en.wb 05:21, 2 December 2009 (UTC)
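
As a rough illustration of the `\.co\.cc\b` entry proposed at the top of this thread, the following Python sketch applies that pattern to a few URLs. The sample URLs are hypothetical, and Python's `re` module stands in for the PHP/PCRE engine the blacklist actually runs on, so this is only an approximation of how the entry would behave.

```python
import re

# Entry as proposed in the thread; the leading "\." is part of the pattern.
CO_CC = re.compile(r"\.co\.cc\b", re.IGNORECASE)

# Hypothetical sample URLs, for illustration only.
SAMPLES = [
    "http://example.co.cc/page",    # a name registered under co.cc
    "http://co.cc/",                # the bare domain, no preceding dot
    "http://example.co.cctv.com/",  # ".co.cc" followed by more letters
]

def is_blocked(url: str) -> bool:
    """Return True if the proposed entry occurs anywhere in the URL."""
    return CO_CC.search(url) is not None

for url in SAMPLES:
    print(url, is_blocked(url))
```

Under these assumptions the preceding dot limits the match to names registered under co.cc, and the trailing `\b` keeps longer labels that merely start with "cc" from matching. How many of the matched links are legitimate is a separate question that the pattern cannot answer.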

Strange url shortener



I do not know what it is, but it works =): to/wiki → ru.wikipedia.org Track13 0_o 02:21, 22 November 2009 (UTC)

  • some browsers set .com as a default domain, use to./wiki Track13 0_o 14:17, 22 November 2009 (UTC)
Added Huib talk 14:18, 22 November 2009 (UTC)
  • This regex is incorrect. It blocks any link with (any boundary symbol)to(any boundary symbol), towards-to-taiwan.ru for example. Please fix it, or remove this entry from the blacklist. Track13 0_o 14:46, 25 November 2009 (UTC)
Removed, so that a correct regexp can be made. --Dferg (disputatio) 15:01, 25 November 2009 (UTC)
Thanks Dferg for the fix, sorry for the mistake. Huib talk 15:22, 25 November 2009 (UTC)
Hi! I don't have much time for testing it, so I won't add the entry, but anybody who has some time could try this one:
(?<=://)to\.?/
i.e. 'any "to./" or "to/" that is preceded by "://"'.
(?<=://) is a zero-width positive look-behind assertion (see php or perl manual), and it's needed here, because otherwise the whole tld ".to" and all "...-to" domains would be blocked. -- seth 21:10, 25 November 2009 (UTC)
I tested this in the local blacklist; at first look it works. Track13 0_o 21:51, 25 November 2009 (UTC)
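
To compare the two behaviours described in this thread, here is a minimal Python sketch. The broad pattern `\bto\b` is only an assumption reconstructed from Track13's description ("any boundary symbol"); the entry that was actually added is not quoted above. The narrow pattern is seth's proposal. The sample URLs other than towards-to-taiwan.ru are illustrative, and Python's `re` module is used in place of PHP/PCRE, so the result is indicative only.

```python
import re

# Assumed reconstruction of the over-broad entry (not quoted in the thread).
BROAD = re.compile(r"\bto\b")
# seth's proposal: "to/" or "to./" only when directly preceded by "://".
NARROW = re.compile(r"(?<=://)to\.?/")

URLS = [
    "http://to/wiki",                # the shortener itself
    "http://to./wiki",               # variant with a trailing dot
    "http://towards-to-taiwan.ru/",  # false positive reported by Track13
    "http://example.to/",            # hypothetical site under the .to TLD
]

def matches(pattern: re.Pattern, url: str) -> bool:
    """Return True if the pattern occurs anywhere in the URL."""
    return pattern.search(url) is not None

for url in URLS:
    print(f"{url:32} broad={matches(BROAD, url)} narrow={matches(NARROW, url)}")
```

The look-behind consumes no characters, so the match itself is only "to/" or "to./", while the requirement that "://" comes immediately before it rules out both the "-to-" hostnames and the whole .to TLD.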

I'm unclear what's going on here - this doesn't seem to be related to any browser "helpfulness":

mikelifeguard@arbour:~$ wget -S to
--2009-12-02 01:22:21--  http://to/
Resolving to... 216.74.32.103
Connecting to to|216.74.32.103|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 02 Dec 2009 05:22:22 GMT
  Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux) mod_perl/1.26
  Connection: close
  Content-Type: text/html; charset=ISO-8859-1
Length: unspecified [text/html]
Saving to: `index.html'

    [ <=>                                   ] 727         --.-K/s   in 0s      

2009-12-02 01:22:21 (50.6 MB/s) - `index.html' saved [727]

mikelifeguard@arbour:~$ cat index.html 
<!DOCTYPE html
	PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
	 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>TO. -- Get Shorty URL</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
<form method="post" action="/" enctype="multipart/form-data">
<table><tr><td>Enter a long URL:</td> <td><input type="text" name="url"  size="50" /></td></tr><tr><td>Enter an optional name:</td> <td><input type="text" name="name"  size="20" /></td></tr><tr><td>&nbsp</td> <td><input type="submit" name="'Witz that URL!" value="'Witz that URL!" /></td></tr></table></form>
</body>

 — Mike.lifeguard | @en.wb 05:25, 2 December 2009 (UTC)

Maybe it's DNS server helpfulness?
$ wget -S to
--2009-12-02 10:09:29--  http://to/
Resolving to... failed: Name or service not known.
wget: unable to resolve host address “to”
As you see it doesn't work for me. In any case, I wouldn't blacklist it for now. Wait and see if any problems arise and then start worrying about blacklisting. --Erwin 09:12, 2 December 2009 (UTC)

studyguide.com.vn






plus 3 others

Spammed on en and vi with the creation of multiple pages solely to house links and citations to studyguide.com.vn. See WikiProject Spam item. MER-C 13:58, 23 November 2009 (UTC)

Added. --Finn Rindahl 14:18, 23 November 2009 (UTC)

Two more URL shorteners

Moved from the #Discussion section. Format adapted. —Dferg (disputatio) 17:52, 24 November 2009 (UTC)




I doubt these two are in wide use (or whether they offer reliable service), but suggest to blacklist nonetheless. If you need help with Hebrew, let me know (preferably on HE WP). Odedee 01:10, 24 November 2009 (UTC)

Added Huib talk 14:26, 26 November 2009 (UTC)

gratisweb.com



It was reported today on the Spanish Wikipedia Technical Village Pump that this site may contain viruses, trojans and other kinds of malware capable of compromising the security of the computer. I checked on various sites (Google, Symantec and Malwaredomainlist) and got the same security warnings on all of them. As a preventive step I have blacklisted the whole domain on es.wikipedia pending further investigation. I request a review of this site to consider whether global blacklisting is possible (I know that malware IS a reason for global blacklisting; I just need some outside views on this case). Thank you for your attention. --Dferg (disputatio) 19:14, 26 November 2009 (UTC)

Added Huib talk 11:04, 27 November 2009 (UTC)

xlurl.de



url shortener
see e.g. w:de:WP:SBL#xlurl.de.2C_pokerstrategy.com. Added -- seth 14:33, 28 November 2009 (UTC)

pege.org



I'm not sure about this one. Massive link additions at en and de talk pages and in some articles. Afaics mostly en:user:Pege.founder (de:user:Pege.founder) added the links. Not all of the links seem to be contrary to WP:EL.
discussions: w:en:user_talk:Pege.founder#Spam, w:de:user talk:Pege.founder#Werbung.2Fexterne_Links -- seth 11:55, 29 November 2009 (UTC)

I don't have enough time to walk through all global link additions, so I'll block the domain at de-wiki now. -- seth 14:06, 29 November 2009 (UTC)

abqo.com



URL shortener. MER-C 11:51, 30 November 2009 (UTC)

Added Huib talk 17:20, 30 November 2009 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as the bot may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.



Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.

zoophilia.co.cc



Just wanted to add this as my personal website on my MediaWiki profiles. I've no idea why it's blocked; I've not long registered here or there. A friend of mine said it might be because the last owner might have used spam, but I haven't even got enough bandwidth to do that lol. Thanks, James (The preceding unsigned comment was added by James D Smith (talk • contribs).)

Comment - a user page isn't meant for promoting your own site. Huib talk 14:28, 26 November 2009 (UTC)

cevennes-tourisme.fr



We've got a request on the fr-wp's admin requests page to remove this site from the blacklist... I don't really understand why it was blocked in the first place, as it is an official site belonging to the local chamber of commerce (per whois.net: "contact: CHAMBRE COMMERCE ET D INDUSTRIE D ALES"), so a link is pertinent on fr:Cévennes and fr:Chambre de commerce et d'industrie d'Alès Cévennes, and the articles for the towns and local structures in the area... -Ash Crow 12:59, 26 November 2009 (UTC)

Removed Huib talk 14:24, 26 November 2009 (UTC)

www.newskentei.jp



This is the official website for the article ja:ニュース時事能力検定.

We got a request at ja:MediaWiki talk:Spam-blacklist#www.newskentei.jp to remove this site from the blacklist. It is the official site of "The society for Testing News Proficiency" in Tokyo, and was listed here because several IP users went on spamming in 2007 when this service was about to launch. Today this service has become well known, so I think this site may be de-listed now. And if spamming should begin again, we can now block it on our local Spam-blacklist. --miya 01:49, 29 November 2009 (UTC)
Removed -- seth 11:33, 29 November 2009 (UTC)
Thank you.--miya 15:31, 29 November 2009 (UTC)

www.thesportsinterview.com



www.thesportsinterview.com is a maintained web site that features audio recordings of interviews, but thesportsinterview.com (without www) has been linkfarmed, which keeps me from citing the former at q:Michael Jackson. Can the list be modified to distinguish between the two? 05:05, 2 December 2009 (UTC)

Those are the same domain.  — Mike.lifeguard | @en.wb 05:18, 2 December 2009 (UTC)
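
One way to read the reply above: an entry written against the registered domain matches the host with or without the `www.` prefix. The sketch below uses an assumed pattern, since the thread does not quote the actual blacklist line, and Python's `re` module rather than the PHP/PCRE engine the blacklist uses.

```python
import re

# Assumed entry; the real blacklist line is not quoted in this thread.
ENTRY = re.compile(r"\bthesportsinterview\.com\b", re.IGNORECASE)

def is_blocked(url: str) -> bool:
    """Return True if the assumed entry occurs anywhere in the URL."""
    return ENTRY.search(url) is not None

print(is_blocked("http://www.thesportsinterview.com/"))
print(is_blocked("http://thesportsinterview.com/"))
```

Under that assumption both forms are caught by the same entry, so citing a specific page would more likely call for a whitelist request on the local wiki, as the introduction to this page suggests, than for a change to the global entry.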

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

None currently

Discussion

This section is for discussion of Spam blacklist issues among other users.

wmf4.me / enwn.net

Ran into Mike_lifeguard during a #wikimedia-strategy; he asked me to pop an email off to info-en-l explaining what wmf4.me is and why it shouldn't be blacklisted. They in turn told me to post it here. Hopefully the rat has found its cheese...

So first, domain: http://wmf4.me/ ( http://enwn.net/ is an alias, the original name, in the process of changing all the titles).

Short and sweet version: It is a URL shortening service (w:URL Shortening http://wmf4.me/42EEb ). Anyone can create a shortened link, but it is somewhat user-unfriendly at the moment. There are only 3 methods. #1 is the automated RSS->Twitter process, #2 is the bookmark ( http://wmf4.me/bookmark.php ), #3 is a gadget we made for English Wikinews ( mentioned here: http://enwn.net/5e231 ). Method #4, the venerable "web form", is in development.

Most importantly, it is designed for Foundation sites only. It is set up with a whitelist of domains that are allowed to be shortened (e.g. Wikipedia.org, Wikinews.org, Mediawiki.org, etc). For anything outside of the Foundation URLs, the shortener will throw an error. --ShakataGaNai ^_^ 18:36, 20 November 2009 (UTC)
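
The whitelist behaviour described in the post above could look roughly like the Python sketch below. This is not the wmf4.me code, which is not shown on this page; the function name and the matching rule (exact host or subdomain of a listed domain) are assumptions, and the set holds only the three domains named in the post.

```python
from urllib.parse import urlsplit

# Domains named in the post above; the real whitelist is longer ("etc").
ALLOWED_DOMAINS = {"wikipedia.org", "wikinews.org", "mediawiki.org"}

def may_shorten(url: str) -> bool:
    """Hypothetical check: accept only hosts on, or under, an allowed domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

print(may_shorten("https://en.wikipedia.org/wiki/URL_shortening"))
print(may_shorten("http://example.com/"))
```

Comparing on the parsed hostname, rather than searching the whole URL string, is what keeps a host such as wikipedia.org.example.com from slipping through in this sketch. A check like this only constrains links created through the normal interface; it says nothing about entries added directly to the database.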

How will it be used? I don't understand the need for URL shortening. Why don't you just add the real URL, which will tell you something about the site you're linking to? In the case of WMF projects you can also just use an internal link. --Erwin 09:56, 22 November 2009 (UTC)
Initially this started for use on Twitter. I don't see any use for this in main space usage, as you said, use the real URL. This has its uses outside of the main space. For example we use them at n:WN:REPORTS because we write the report on wiki then email it out. --ShakataGaNai ^_^ 03:30, 23 November 2009 (UTC)
Shortened URLs are evil, but for use in confined spaces (like twitter) the evil can maybe be justified. However, the reason we blacklist URL shorteners is different: the possibility of abuse. However, this one allows links only to WMF-owned domains. Ordinarily, I would say that's acceptable, even if actually using the shortened URLs is a bad idea. Keep in mind that on-wiki is not a confined space, and shortened URLs should never be used on-wiki. However, it seems that ShakataGaNai can and did (deliberately [1]) add some which evade blacklist rules. That, I think, is unacceptable.  — Mike.lifeguard | @en.wb 16:23, 22 November 2009 (UTC)
I evaded the rules (being the only person that can, magical DB power and all), for a valid cause. Linking to that site was blacklisted a long while ago; ok, they spammed. I was linking to them NOT TO SPAM, but to help facilitate a proposal. We had no idea how long it would take this group to get around to removing fh.net from the blocklist. ::shrugs:: I don't know what you want me to say. --ShakataGaNai ^_^ 03:30, 23 November 2009 (UTC)
You could have locally whitelisted the URL. I don't think it's necessary to blacklist wmf4.me if and only if the links can only point to Foundation projects. Clearly that was not the case in the past. Will it be in the future? Meaning that any existing redirects to non-WMF sites will be blacklisted and such redirects can never be created in the future. --Erwin 09:08, 23 November 2009 (UTC)
It isn't that they can't be added; ShakataGaNai does and will have access to do it for the foreseeable future. I assume he won't be circumventing more blacklist rules...  -- Mike.lifeguard | @en.wb 05:19, 2 December 2009 (UTC)

Mailinglist

Hi, maybe a silly idea, but would it be an idea to also create a mailing list for "co-ordination of activities related to maintenance of the blacklist"? Since not everybody uses IRC, mail could be the second-best option to reach all the users active here.

Best regards, Huib talk 10:04, 23 November 2009 (UTC)

Well, it is sort of a problem that the activities related to maintenance of the blacklist are to a large extent coordinated via IRC - some users don't want to use IRC. But I do not see the need for a mailing list, as there's practically nothing in my experience (no privacy issues etc) that can't be discussed onwiki if necessary. This is more a question of us #wikimedia-external-links regulars being mindful that info/discussion where it's important to reach all should be posted onwiki and not just shared between whoever of us is on the channel at any given time. Finn Rindahl 12:15, 23 November 2009 (UTC)
I think talk pages on-wiki work just fine. --MZMcBride 19:08, 23 November 2009 (UTC)
Yeah, I don't see any need for a mailing list.  — Mike.lifeguard | @en.wb 05:20, 2 December 2009 (UTC)