Talk:Spam blacklist

The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs on any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis and prior blacklisting on at least one. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, as there may be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (e.g. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for coordination of activities related to maintenance of the blacklist.
Whitelists - There is no global whitelist, so if you are seeking the whitelisting of a URL at a wiki, please address such matters via the respective MediaWiki talk:Spam-whitelist page at that wiki, and consider using the template {{edit protected}} or its local equivalent to draw attention to your edit.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2016/12.


snippet for logging
{{sbl-log|16125079#{{subst:anchorencode:SectionNameHere}}}}


Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

netflix spammer



  • facebook.com/Netflixsteraming100

It may even be worth considering 'steraming' typo varieties. Ping MER-C. --Dirk Beetstra T C (en: U, T) 03:43, 13 March 2016 (UTC)

I don't particularly care about spam on Facebook, even if it appears on Wikimedia sites. Please report it if you have an account (I don't), as links to pirated TV shows are likely against their TOS. MER-C (talk) 08:35, 14 March 2016 (UTC)
@MER-C: What is on Facebook I don't care about either (I do have an account) .. but if links to Facebook get spammed here, like with this link, maybe the whole thing could go on the blacklist. I pinged you at first because of the 'typosquatting': steraming .. is it worth just blocking that whole word and taking out more than just this? --Dirk Beetstra T C (en: U, T) 08:39, 14 March 2016 (UTC)

Some IPs that XLinkBot caught. --Dirk Beetstra T C (en: U, T) 08:43, 14 March 2016 (UTC)

Hmm .. also permalinks into facebook .. https://en.wikipedia.org/w/index.php?title=Ex_on_the_Beach_(series_4)&diff=prev&oldid=709006958 .. difficult to weed out and block all of those. --Dirk Beetstra T C (en: U, T) 08:47, 14 March 2016 (UTC)

images.google.(tld)/imgres?



I'm suggesting to add the preview pages from Google Image Search, as being similar to the redirector for their normal web searches. Surprisingly, it seems this has not been discussed yet, though there are at least 800 such links on dewiki, 600 on enwiki and more than 1000 on Commons. --Nenntmichruhigip (talk) 12:53, 15 September 2016 (UTC)

@Nenntmichruhigip: if there are that many links on those wikis, and the sites are neither blacklisted nor the links removed, then it is not the place of Meta to impose its own opinion. If you believe that we should continue the discussion, then please raise the issue at enWP, deWP and Commons and point the users here to discuss.  — billinghurst sDrewth 16:26, 16 September 2016 (UTC)
I've been directed here from dewiki, and don't know where a suitable place on enwiki and commons would be. --Nenntmichruhigip (talk) 13:36, 17 September 2016 (UTC)
The occurrences on dewiki are cleaned up by now. On Commons there has been an abuse filter catching new additions since two months ago, but afaict no ongoing cleanup of existing ones, despite quite a few copyvios. --Nenntmichruhigip (talk) 07:33, 27 September 2016 (UTC)
Until these other wikis address the matter here, it may be best to pursue a local blacklist addition at deWP.  — billinghurst sDrewth 10:19, 27 September 2016 (UTC)
Hi!
Those links can be used as sbl-circumvention.
As we normally blacklist all possible sbl-circumventions globally here at meta, in my opinion the first place to discuss blacklisting of that domain is here. -- seth (talk) 22:04, 27 September 2016 (UTC)
Agree, though we are not meant to be setting the overarching link policy without direct consultation. To further assist, I have put in a bot cleanup request at Commons. I have also copied the filter from Commons over to enWP to monitor the new additions, and maybe start a conversation there.  — billinghurst sDrewth 23:11, 10 October 2016 (UTC)

google.(tld)/amp/s/



Yet another Google service, which can probably be used for block circumvention. (If anyone wonders what Google uses it for: tricking users into staying on Google pages only, under the claim of making the AMP-enabled page load faster and giving it a higher page rank. AMP means "Accelerated Mobile Pages".) --Nenntmichruhigip (talk) 05:54, 20 October 2016 (UTC)

Spambot girlchan.net/



Spambot. --Dirk Beetstra T C (en: U, T) 17:35, 22 October 2016 (UTC)

Unsure if it is a spambot; I need to have a second look later. --Dirk Beetstra T C (en: U, T) 18:29, 22 October 2016 (UTC)
GUC only shows the two edits at that IP address, both at enWP and seven minutes apart. So it doesn't particularly look like a spambot; I think that it can be managed at enWP at this point in time.  — billinghurst sDrewth 06:35, 24 October 2016 (UTC)

g.co

See w:en:MediaWiki_talk:Spam-blacklist#google.co.in_shortener, reported by User:Ravensfire



This is ugly; this is one of Google's link shorteners. It is used across our projects a lot. However, it is prone to abuse, as usual for shorteners. It is currently used in w:en:Akasa_Singh, where it redirects to a plain search result. If that is possible.... --Dirk Beetstra T C (en: U, T) 13:52, 29 November 2016 (UTC)

Standard linksearch does not give a lot .. but is informative. --Dirk Beetstra T C (en: U, T) 13:54, 29 November 2016 (UTC)
It appears that the url will indicate which google page/service the link is for. g.co/maps/ for maps, doodle for doodle, etc. g.co/kgs/ appears to be for searches, so if there is a need to keep g.co available for some services, blocking just the search may be useful. Ravensfire (talk) 15:37, 29 November 2016 (UTC)
For such a change of a commonly used url, I would have the expectation that (some|the) wikis would block it first, and we would follow. I do not see it blocked.  — billinghurst sDrewth 10:58, 3 December 2016 (UTC)
It is a URL shortener .. they get blanket blacklisted on Meta, not through local communities. We declined this earlier because people were arguing that it could only be used for Google Maps. It turns out that that is not the case; it has been (ab)used to link to search engine results (which is discouraged on some wikis, to say the least). --Dirk Beetstra T C (en: U, T) 03:32, 4 December 2016 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the links thoroughly, as the bot may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, none edited in the last 7 days, and with COIBot as the last editor).

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria (a rough illustrative sketch follows the list):

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
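A minimal Python sketch of that heuristic, purely illustrative - COIBot's real field names and the thresholds behind "mainly" and "not been used too much" are not documented here, so every name and cut-off below is an assumption:

from dataclasses import dataclass

@dataclass
class Addition:
    user: str    # account name or IP
    wiki: str    # e.g. 'enwiki'
    is_ip: bool

def should_report(additions, user, total_link_uses):
    # Illustrative reconstruction of criteria 1 and 3 above; the per-server
    # and IP-range criteria (2 and 4) would follow the same shape.
    mine = [a for a in additions if a.user == user]
    mainly_this_user = len(mine) > len(additions) / 2   # "user mainly adds this link"
    rarely_used = total_link_uses < 50                  # invented cut-off for "not used too much"
    if mainly_this_user and rarely_used and len({a.wiki for a in mine}) > 2:
        return True
    if additions and all(a.is_ip for a in additions) and len({a.wiki for a in additions}) > 1:
        return True
    return False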
COIBot's currently open XWiki reports
padangbay.com · last update 2016-11-30 12:36:42 by COIBot · site IP 104.27.167.136 · last users 36.83.189.94 and 36.85.107.6 · last link addition 2016-11-30 12:17:31 · counts: 7, 2

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests, please take your discussion to the pertinent wiki, where such requests would be made at MediaWiki talk:Spam-blacklist on that wiki. Search spamlists — remember to enter any relevant language code

tampabaylightning.ru



This site was included in the blacklist by mistake in 2009 and has been there since. Now that I have remembered this fact, I am asking to remove it from the blacklist. Here is where the story originates: https://meta.wikimedia.org/wiki/User:COIBot/XWiki/newyorkislanders.ru Someone (a user with IP 91.76.29.237) on March 1, 2009 spammed links to the websites of NHL teams in the .ru zone across Wiki pages in several languages. For Tampa Bay Lightning that was the domain tampabay.ru. All those sites ran on the same engine, were registered on the same day, had the same owner in Whois, etc. But tampabaylightning.ru was also included in the blacklist, even though it was created 8 years before this, was working (and is still working now), and had a different engine, a different owner, etc. The link to this website was already on the Russian Tampa Bay Lightning Wiki page on the day the link to tampabay.ru was added (this can be checked by comparing versions; the link to tampabaylightning.ru was added in 2007, when those other sites did not yet exist). I tried to prove that it was a mistake back then in 2009, but the request was declined, because no one ever really checked my arguments about the different Whois data, the fact that all those links were added in 2009 while tampabaylightning.ru was added in 2007, etc. I had to forget about it. But today I remembered, and here is my new request to exclude tampabaylightning.ru from the blacklist, with an additional argument on top of all those I gave in 2009: as of now, none of the websites that were blacklisted in this case exists anymore, except tampabaylightning.ru, which has never stopped working since its creation in 2002. I insist that it was included in the blacklist by mistake and ask you to remove it. --Chelya (talk) 12:04, 27 September 2016 (UTC) PS. Here is the original discussion with my arguments that the site was blacklisted by mistake: https://meta.wikimedia.org/wiki/Talk:Spam_blacklist/Archives/2009-09#tampabaylightning.ru

@Chelya: Removed from Spam blacklist.  — billinghurst sDrewth 10:42, 3 December 2016 (UTC)

lookchem.com



Hi,

on Talk:Spam blacklist/Archives/2010-09#Lookchem.com, this site was spam-blacklisted because "This commercial chemical search site seems to be spammed quite a lot across projects" - without example or reference, in 2010! It is now 2016, and that "seems to be" seems to me a little light a reason to block the site, which is very useful for finding CAS numbers or building isomer pages, and I have never found spamming on those pages. So I urge that this site be taken off the spam blacklist. Regards --Titou (talk) 08:49, 1 October 2016 (UTC)

@The Titou: Miaoxiao258 is a globally locked account that was spamming this (and there are more accounts and IPs). It is a bit of linkfarming; we have many authoritative sites (dozens) for chemicals, and we do not need inadvertent additions of purely commercial sites on top of those, which are often already in the article. --Dirk Beetstra T C (en: U, T) 07:40, 2 October 2016 (UTC)
@Beetstra: We are talking about something which happened in 2010, and Miaoxiao258 is blocked - I think six years are enough to forget this historical spamming and to take this website off the spam blacklist. I must say this site is useful: given a chemical formula, it gives a long list of isomers of that formula, with name, CAS number and picture. Before, I used the more useful and academic http://cactvs.cit.nih.gov/cgi-bin/lookup/search but it appears to be offline now (?) ... --Titou (talk) 10:01, 2 October 2016 (UTC)
@The Titou: Special:Search with C12H22O11. 6 years is nothing, and there are many alternatives (e.g. our internal search). --Dirk Beetstra T C (en: U, T) 10:36, 2 October 2016 (UTC)
@Beetstra: are you joking? The internal Wikipedia search gives only limited results, and only already-created articles. See for example fr:C6H12Cl2 and try to reproduce that list with internal search! See also the result at lookchem.com/Molecular-Formula/C6H12Cl2.htm, which is still more complete. Obviously we're not talking about the same thing (we aren't on the same wavelength) ^^. If you don't want to lift the blocking of this site, it's really a shame - but anyway, I'll continue to use it - bye --Titou (talk) 13:24, 2 October 2016 (UTC)
@The Titou: No, I am not joking - if you want results on Wikipedia, then Wikipedia search is where you want to be; if you want more results on other sites, you go to the other sites - even Google will give you more results than Wikipedia. But we are not linking to this site for the molecular formula search (we should not link to search aggregates, which are never the same), we would in mainspace only link to it for linking to specific CAS numbers. For that there are better results. --Dirk Beetstra T C (en: U, T) 05:20, 3 October 2016 (UTC)
And note that I did not decline it yet, I am only (still) not convinced that Wikipedia should link to this. --Dirk Beetstra T C (en: U, T) 05:21, 3 October 2016 (UTC)
I don't see encyclopaedic value in linking to a search result, and I believe that such a link will fail the guidance on both external links and citing sources. If we can demonstrate that there is worthwhile citable information then we could look at that, though I would think that we would initially look to have a language WP whitelist the domain name and see how that progresses. FWIW I am always willing to give a domain name another chance, though time alone on our blacklist is not a factor nor an indicator of current good or changed behaviour.  — billinghurst sDrewth 05:35, 5 October 2016 (UTC)
@The Titou: Declined - stale request.  — billinghurst sDrewth 10:39, 3 December 2016 (UTC)

shrinktheweb.com



The domain was included in the blacklist in 2009 ( https://meta.wikimedia.org/wiki/User:COIBot/XWiki/learn.shrinktheweb.com ) just because six links were added to thematic articles without a Wikipedia account, probably by mistake or because the content quality of the page was not good enough (but it was not spam - see how the linked page looked at that time: http://web.archive.org/web/20090228194132/http://learn.shrinktheweb(dot)com/ ).

There are many admin replies here with questions about how the domain's unblocking would be helpful for Wikipedia, so here are my answers.

The first reason: ShrinkTheWeb is the service provider for a MediaWiki extension - https://www.mediawiki.org/wiki/Extension:ShrinkTheWeb (and these services are absolutely free for most users - up to 5000 screenshot captures per month). So it is not normal for the service-provider domain of such an extension to be blacklisted just because of several links posted 8 years ago.

The second reason: ShrinkTheWeb now has a very strong position in the website-screenshot API niche. There are now lots of independent links about the company. I prepared the article draft https://en.wikipedia.org/wiki/User:Oleg_Sergeykin/ShrinkTheWeb because some similar companies already have such articles (see the examples in my profile https://en.wikipedia.org/wiki/User:Oleg_Sergeykin ).

So, please, remove the domain from the blacklist.—The preceding unsigned comment was added by Oleg Sergeykin (talk) 07:05, 11 October 2016

@Oleg Sergeykin: I see that you declare a conflict of interest with ShrinkTheWeb on your en userpage. Albeit that it is used in an extension, that does not necessarily mean that that extension is going to be used on these wikis (en.wikipedia may not install it, e.g.); question: if this is going to be enabled, will it then link to the screenshot? Secondly, there is not yet an article on this subject in Wikipedia (it is in draft stage), and for this specific article the whitelisting of one page would be enough. Thirdly, I don't think that this is going to be widely used on Wikipedia.
Linking to screenshots is a bit tricky - one could link to material that should not be linked to, or link to (a copy of) material that is blacklisted, in violation of the copyright of the original, etc. etc.
I'd like to hear more thoughts on this. --Dirk Beetstra T C (en: U, T) 06:10, 11 October 2016 (UTC)
@Beetstra:
Thanks for your answer. Yes, I have declared a COI in my profile. I am responsible for some web content for the company (the company's other two sites are snapito.com and web-capture.net ).
Please also consider that there was really no reason to add the domain to the blacklist. It was probably blocked just by mistake, not because of some precaution about screenshot misuse.
Regarding your notes about screenshot misuse:
https://en.wikipedia.org/wiki/Wikipedia:Image_use_policy does not contain limitations on such screenshot usage.
STW uses a separate domain, images.shrinktheweb.com, for uploading autogenerated screenshots, so you do not need to block the entire second-level domain shrinktheweb.com if you just want to keep your users from using screenshots in the future (should you adopt a more restrictive policy regarding screenshots). And it is really hard to abuse the service in the way you described, because free accounts can produce screenshots only up to 320x240px in size, and only of the front pages of websites. Full-size API screenshots and screenshots of inside pages are available only for paid accounts.
I guess screenshot usage is really not covered well enough in Wikipedia policies. Here is our short article about the screenshot copyright issue, with references to legal cases regarding the issue. Technically, STW prevents these problems by forbidding POST variables in API requests: https://www.reddit.com/r/ShrinkTheWeb/comments/4dc9dc/is_it_legal_to_display_screenshots_of_web_pages/
And, really, I cannot submit the article, because I cannot put a link to the blacklisted domain in the article.—The preceding unsigned comment was added by Oleg Sergeykin (talk)
What I am mainly worried about, is whether people would screen-capture a website that is blacklisted on Wikipedia, or which is a copyvio in itself or a copyvio by making a screen-capture, and then use that image. And whether we need links to this site in the first place.
The links were blacklisted because IPs were (repeatedly) adding these links in places where they should not be - on en.wikipedia they violate the inclusion standard; there is no reason to link to a screen-capture website on a page that explains what a screenshot is. And that was done cross-wiki. This had nothing to do with what was actually linked to, whether that was spam or not; it had to do with the fact that editors were inadvertently adding links to places where they were not of use.
You don't need to link to the website to submit a draft, it can be added later when the draft is accepted (even if that needs to wait for whitelisting, which, in case it is one link for one page, is preferred over delisting). --Dirk Beetstra T C (en: U, T) 08:51, 11 October 2016 (UTC)

I think that whitelisting the main domain of the site is reasonable, because it can be used by Oleg to prepare his draft, and maybe by other editors to link to the service in present and future relevant articles. But all subdomains should remain blocked, as I can think of no legitimate use for them at Wikipedia. --Felipe (talk) 12:16, 12 October 2016 (UTC)

Hi, Dirk @Beetstra:! It has already been a month since our discussion, but the decision on this has not been given yet, so I decided to inquire about it again.

It does not seem quite logical to me to completely block a normal website just because of a few such links from over 7 years ago - by this approach, any useful website could be blocked on Wiki simply by someone posting several not-quite-relevant links, so the approach is not quite correct. And relevance is difficult to estimate in the case of screenshots, at least because Wiki policies regarding screenshots are quite vague, as I pointed out earlier.

I also want to draw your attention to the fact that the linkwatcher in this case flagged only the third-level domain (learn.shrinktheweb.com), and not the second-level domain. The second-level domain was included in the blacklist (instead of learn.shrinktheweb.com) just because of an administrator's opinion, not because of the linkwatcher.

Regarding your concerns about the incorrect usage of screen captures on Wiki - all these captures are stored on images.shrinktheweb.com, so there is no need to block the entire second-level domain to prevent their use. But even that will only prevent auto-generation and auto-refreshing of these screenshots via the STW API. Even with the entire domain blocked on Wiki, users still have the opportunity to use generated screenshots - by posting a captured image manually on any image hosting service.

Considering all of the above, I beg you to unblock the shrinktheweb.com domain in the global blacklist (or at least unblock just the second-level domain, leaving its subdomains blocked - as Felipe wrote). Oleg Sergeykin (talk) 10:03, 15 November 2016 (UTC)

@Oleg Sergeykin: - to submit a draft, you don't need a working link to be in the article. And as that draft stands now, I do not see any really independent references, so notability would be a question. Please submit the draft, and see whether it sticks as an article. Then I would likely consider whitelisting, or adaptation of the rule here (however, the article being on one wiki while the domain becomes linkable on the other 700+ MediaWiki wikis and many outside .. that is a weak argument for global delisting). @Felipe: as I said, for submitting a draft the link is not necessary, and if other articles need to link to the article they should use the wiki-link, not link out.
We are aware of Joe jobbing and time-arguments. I am generally not impressed by time-based arguments.
Regarding the re-posting of images, that is indeed a problem - and editors that insist will indeed do that. It is however not really an argument to then open the door completely. This is akin to leaving the front door of your house open vs. having a bad lock .. neither stops a thief, but while the former is wide open for burglary, the latter still needs someone to realize that the lock is bad.
So, mainly, I would like to see whether the article sticks; evaluation of the draft is based on content, and whether the external links are there is a minor part of that. I still have concerns regarding linking to content that evades blacklisting, something that happens with all sites and that is a continuous problem. I don't really see why we should open more doors (though I agree we could block off only the subdomains, which I think is the best solution if it can be shown that the article sticks). --Dirk Beetstra T C (en: U, T) 10:33, 15 November 2016 (UTC)
reping, I made a typo: @Oleg Sergeykin:. --Dirk Beetstra T C (en: U, T) 10:34, 15 November 2016 (UTC)
> to submit a draft, you don't need a working link to be in the article.
I avoided submitting such an article draft without the link because it would definitely cause a biased attitude towards the draft. People would think primarily not about the article's quality and notability, but about the assumption that there were some actual reasons to include the entire domain in the global blacklist. I was just waiting for those "more thoughts" from administrators which you asked for in your 06:10, 11 October 2016 (UTC) message. I thought then that they would really provide some thoughts about this and would not ignore this discussion.
> I do not see any really independent references, so a question would be notability.
Well, there are a dozen links from independent, authoritative sites in the 'References' section of the draft.
> Please submit the draft, and see whether it sticks to become an article.
Previously, I had no such direct guidance about choosing exactly this path (without the link). But after these remarks of yours, I have no choice but to submit the article draft without the link. I guess I also need to put a link to our discussion about the blacklisting in the discussion about the article, so the assessors can evaluate not only the article (notability etc.) but also the adequacy of the reasons for blacklisting, and provide those "more thoughts". Otherwise, the assessors would probably build some unrealistic assumptions about the reasons for blacklisting. Oleg Sergeykin (talk) 12:03, 15 November 2016 (UTC)
  • Drafts are regularly submitted without the external links - it will attract some extra scrutiny, but editors there do understand that it is about the content in the first place.
  • There are not many references that write about the subject: one is a comparison of 11 different similar services, which is not specifically about it - they likely just took 11 services that appear high in search results; several (most) others are site stats (not things written about the subject) or short posts (makeuseof). No reviews by reputable websites.
  • I don't think that reviewers will have unrealistic assumptions - I think they see the same as what I see. --Dirk Beetstra T C (en: U, T) 13:08, 15 November 2016 (UTC)
@Oleg Sergeykin: Declined due to its potential for abuse. As noted by another contributor, you should seek whitelisting at the requisite wikis to enable your linking to the base domain (which we cannot do globally).  — billinghurst sDrewth 10:38, 3 December 2016 (UTC)
@Billinghurst:, well, I do not agree with you regarding the actual reasons. For example, archive.is is not in your blacklist, and no one uses it in the way you and @Beetstra: are talking about. The delisting was evidently declined now just because you guys decided to clean up all the old discussions from this list; you just made several declines/removals in a row. Please do not delete this discussion; the subject of the article draft was really accepted when I submitted it the first time - the draft was not marked as non-notable. The assessor just asked to make the article more neutral and more encyclopedic. I am now making significant changes to the draft, have added several very respectable references, and will submit it again next week. I plan to add even more respectable references to the draft before submitting. I cannot finish all of this right now because it is a weekend and tomorrow is my birthday. Oleg Sergeykin (talk) 21:12, 3 December 2016 (UTC)
@Oleg Sergeykin:
  1. I made several decisions, some removed urls, some declined, on a process called review. That is the scope of the role for this page. I also separately added urls, so I am not sure of your point otherwise. It was declined at a point in time at the seeming closure of the discussion.
  2. Discussions on this page will be archived to subpages by the archiving bot. Whenever it does that, you will always find all our discussions there. We rarely archive manually, and only when the bot cannot work it out. Nothing has been done to promote or demote the significance of any conversation.
  3. People have commented that the site has capacity for abuse, and we regularly see many weird and wonderful, and sometimes successful, means of redirect/indirect spam. You can point to archive.is as one example where it hasn't been abused; that domain itself went through a discussion to be blacklisted, however it survived for different reasons. (please see archive)
  4. It has been the practice of this forum to recommend to users that they progress through seeking consensus here to remove, or seeking whitelisting in full or in part at local wikis, either as a complete solution, or as a means to prove that a url is not being abused, and as a step process for removal from the blacklist. If you cannot successfully argue at wikis for an addition to a whitelist, that would seemingly strengthen the case to retain the domain on the blacklist.
 — billinghurst sDrewth 00:11, 4 December 2016 (UTC)

cs.com.cn



Not sure why the official website of en:China Securities Journal (source: entry in related links at the Shanghai Stock Exchange (in Chinese)) was in the blacklist; is that because www.cs.com was banned? Interestingly, the alternative URL zqb.cn, which seems to point to the same site, is not banned. I know the website has set up HTTPS very badly, but I have no idea why it was banned. Matthew hk (talk) 17:01, 7 November 2016 (UTC)

www.zqb.cn and www.cs.com.cn ping to the same IP. Matthew hk (talk) 17:06, 7 November 2016 (UTC)
@Matthew hk: Removed - it was caught by a block on the compuserve.com shorturl  — billinghurst sDrewth 10:53, 3 December 2016 (UTC)

iqoption.com



This is the official website of the widely popular company IQ Option, which is regulated by CySEC and registered with a number of reputable regulatory bodies. I searched for its domain in the blacklists and managed to find a similar domain (biqoption.com) here: https://meta.wikimedia.org/wiki/Spam_blacklist. This domain is 100% not related to IQ Option (and I wonder if it actually exists). Could you please make the necessary amendments so that I can use the official website for the article on IQ Option?—The preceding unsigned comment was added by Rrusl u (talk)

@Rrusl u: what you are referring to on the blacklist is indeed the rule that is blacklisting this site, and intended for iqoption.com. It was blacklisted by User:Vituzzu, as logged here. I'll leave it to him to explain. --Dirk Beetstra T C (en: U, T) 07:13, 15 November 2016 (UTC)
@Beetstra: Thank you! I'd like to know how we could fix this as soon as possible; otherwise I can't improve this article and protect it from being deleted (links to the official website are somewhat essential).
@Beetstra: Hello, Beetstra! I'm wondering if it is necessary to wait for Vituzzu's reasoning? Can we get this domain unlisted a little sooner, and how?
@Rrusl u: A page will not get deleted just because there is a blacklisted link on it. Yes, I prefer to wait for User:Vituzzu on this one; I have the feeling there was referral spam involved here. If that is the case, local whitelisting may be the way forward. --Dirk Beetstra T C (en: U, T) 03:47, 17 November 2016 (UTC)

www.wga.hu/html/t/tintoret/8/18portwo.html



Suddenly this common URL could not be saved, though it was not added by me --Oursana (talk) 15:29, 15 November 2016 (UTC)

@Oursana: it is the specific path up to tintoret, see Talk:Spam_blacklist/Archives/2007-03#wga.hu. I am minded to remove this (spammed, but small scale, a long time ago, widely used); can you elaborate a bit on what this site and this path are? (It likely did not want to save because in your edit it changed into a clickable link; it wasn't one before.) --Dirk Beetstra T C (en: U, T) 03:58, 16 November 2016 (UTC)
I gave you the correct difflink for my edit, and I changed nothing. It was a clickable link even before. And indeed other wga links work. Oursana (talk) 12:09, 18 November 2016 (UTC)
@Oursana: you have changed the location of the url within the template and presumably that is being seen as a new addition. After nine years, we can remove IMO. 13:19, 21 November 2016 (UTC) — billinghurst sDrewth
@Oursana: Removed from Spam blacklist.  — billinghurst sDrewth 03:12, 2 December 2016 (UTC)

RoteFahne.eu



This is the link to a legally registered newspaper in Germany, the blocking of the link is politically motivated. —The preceding unsigned comment was added by 2A02:908:180:3300:351C:D609:1255:578D (talk)

It was requested by members of the community for spamming. @Lustiger seth, Codc: Would you please comment on this request for removal. Thanks.  — billinghurst sDrewth 13:14, 21 November 2016 (UTC)
Die Rote Fahne is a historical newspaper, and the owner of the domain rotefahne.eu has long been trying to spam his website cross-wiki into all articles about the historical newspaper. The problem is that Die Rote Fahne and rotefahne.eu have no relationship except the name. In the opinion of the owner of rotefahne.eu, the website is a continuation of the historical newspaper, but there is no evidence for that. So in my eyes it is linkspam, aimed at getting more visitors and more attention via Wikipedia. Rote Fahne was a newspaper founded by Rosa Luxemburg and Karl Liebknecht in 1918. During the Nazi regime it was illegal but still printed. After the end of the Third Reich it was discontinued. Since about 1968 there have been some communist groups that publish newspapers called Rote Fahne or similar. Since the year 2000 there has been this website under the domain rotefahne.eu. Some information about this case is in the German Wikipedia article [1]. I think it is a legal website, but it is neither a newspaper nor a successor of the historical newspaper, and so there is no reason for linking to it in the articles about Die Rote Fahne. --Codc (talk) 21:30, 21 November 2016 (UTC)
I don't think I can add anything to this. -- seth (talk) 21:54, 21 November 2016 (UTC)

That was also the impression that I had. The existing article (cross-wiki) and the link do not have the relationship that the name implies. --Dirk Beetstra T C (en: U, T) 03:24, 22 November 2016 (UTC)

Declined - no clear reason to remove the block; there is evidence that the link has been abused previously, and no reason to expect that things have changed in that regard.  — billinghurst sDrewth 10:36, 22 November 2016 (UTC)
These allegations are false and part of politically motivated disinformation. "Die Rote Fahne", founded in 1918 by Karl Liebknecht and Rosa Luxemburg, was reissued in 1992 by the Central Committee of the KPD/Initiative, including its party constitutions, DKP, KPD/East, KPF/PDS and USPD, today Spartakus.—The preceding unsigned comment was added by 2a02:908:180:3300:5162:5483:d9f9:8594‎ (talk)
That is an interesting response. It would also seem that you have a vested interest in the site, and that should be declared in this conversation.

I see no allegations made by the respondents, and there is no evidence of political motivation. I know the work of one of the contributors who expressed an opinion, and I see their opinion as mostly neutral, and generally informative of the approach that Wikimedia volunteers would expect.

I do not see a case to remove the domain from the blacklist at this point of time. As such our response in such cases is please seek whitelisting at individual wikis, and then we can see how that progresses. Alternatively raise this as an issue at the wiki where the community has an opinion, and refer them here to continue the discussion, to which a consensus can be attributed.  — billinghurst sDrewth 02:45, 2 December 2016 (UTC)

readysteadygirls.eu/#/nita-rossi/4523569554



This website is a main source on Nita Rossi. I don't experience any spam whatsoever. I'm puzzled about the block of this excellent website. 83.85.143.141 22:45, 1 December 2016 (UTC)

We will need to get a summary of the edits made. It was blocked back in 2008 due to one person's contributions; are you related to the account "Grahamwelch"? I am not aware of the site; would it be considered a WP:reliable source as per the WP definition?  — billinghurst sDrewth 02:35, 2 December 2016 (UTC)


cuevadelviento.net



This site is apparently blacklisted, but it is the official website for a legitimate tourist attraction. It was added to the list in January 2011. –StellarD (talk)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

derefer.unbubble.eu deblock

This authority is used 24,923 times in the main space of dewiki! It is used to clean up Special:Linksearch from known dead links, by redirecting them via this authority. It is hard to find a better solution for this task. --Boshomi (talk) 16:38, 24 July 2015 (UTC) Ping: User:Billinghurst Boshomi (talk) 16:49, 24 July 2015 (UTC)

Please note Phab:T89586; while it is not fixed, it is not possible to find the links with the standard Special:LinkSearch. On dewiki we can use giftbot/Weblinksuche instead. --Boshomi (talk) 18:04, 24 July 2015 (UTC)
afaics derefer.unbubble.eu could be used to circumvent the SBL, is that correct? -- seth (talk) 21:30, 24 July 2015 (UTC)
I don't think so; the redirected URL is unchanged, so the SBL works as it does with archive URLs to the Internet Archive. --Boshomi (talk) 07:44, 25 July 2015 (UTC)
It is not a stored/archived page at archive.org, it is a redirect service as clearly stated at the URL and in that it obfuscates links. To describe it in any other way misrepresents the case, whether deWP uses it for good or not. We prevent abuseable redirects from other services due to the potential for abuse. You can consider whitelisting the URL in w:de:MediaWiki:spam-whitelist if it is a specific issue for your wiki.  — billinghurst sDrewth 10:09, 25 July 2015 (UTC)
What I wanted to say is that the SBL mechanism works in the same way as with web.archive.org/web: a blocked URL will still be blocked when the unbubble prefix is prepended to it. --Boshomi (talk) 12:54, 25 July 2015 (UTC)

Unblocking YouTube's redirection and nocookie domains

  • Note past closure - a vandal kept posting pictures and linking to videos of ceiling fans - that's why the bottom two are blocked. Kernosky (talk) 13:58, 24 April 2016 (UTC)

Apparently youtu(dot)be and youtube-nocookie(dot)com, both of which are official YouTube domains owned by Google, are on this blacklist. For over ten years, the SpamBlacklist MediaWiki extension has loaded this blacklist on third-party wikis, big and small. This is quite an issue for third-party sites such as ShoutWiki, a wiki farm, since SpamBlacklist doesn't currently have the concept of "shared" whitelists — blacklists can be shared (loaded from a remote wiki), whitelists cannot. Given that the main YouTube domain isn't blocked, and also that YouTube itself hands out youtu(dot)be links, I don't think that "but it's a redirecting service" is a valid argument against it, and therefore I'd like to propose removing these two entries from the blacklist. --Jack Phoenix (Contact) 23:17, 29 August 2015 (UTC)

There are several YouTube links blacklisted here on Meta, as well as many, many on local wikis. YouTube has videos that get spammed, and there are videos that should simply not be linked to. Leaving the redirects open means that not only the youtube.com link needs to be blacklisted, but also all redirects to those links. That gives either extra work to the blacklisting editors, or leaves an easy back door open. On wikis it leaves more material to check. Add to that that redirect services are simply never needed - there is always an alternative. Additionally, Wikipedia has its built-in redirect service which also works (I mean templates, like {{youtube}}).
That there is no Meta analogue of the whitelist is a good argument to push through that years-old request to revamp the spam-blacklist system and have the developers focus on features that the community wants - and certainly not an argument for me to consider not blacklisting something. Moreover, I do not think that the argument that it hampers third-party wikis holds either - they choose to use this blacklist, and they could alternatively set up their own 'meta blacklist' that they use, copy-pasting this blacklist and removing what they do not want/need.
The problem exists internally as well: certain of our wikifarms do allow certain spam which is inappropriate on the rest of the wikifarms, and on the majority by far (in wiki volume) of the wikis. That also needs a rewriting of the spam-blacklist system, which is crude and too difficult. A lightweight edit-filter variety specialised for this would be far more suitable. --Dirk Beetstra T C (en: U, T) 04:05, 30 August 2015 (UTC)
  • Oppose unblocking for the reasons given above. Stifle (talk) 08:32, 21 October 2015 (UTC)
Declined  — billinghurst sDrewth 06:22, 24 January 2016 (UTC)
youtu.be can only be used for youtube.com, so it is not a general redirecting service; remove it from the blacklist. If you need to block a certain YT video (which, by the way, I consider a little stupid), just update that system and include the youtube.com form as well as the youtu.be form - that's it.
Djamana (talk) 20:08, 2 February 2016 (UTC)
@Djamana: Why do you consider that blocking of a specific YouTube video a little stupid? --Dirk Beetstra T C (en: U, T) 04:11, 3 February 2016 (UTC)
@Djamana: I did not see the above list earlier - of the 5 on the Meta spam blacklist there are still 3 active videos. Those 5 were abused, and for one of the cases where I was involved (still active), it was pretty persistent promotion. I doubt that these need to be removed. The two that are specifically not there anymore could indeed be removed (or maybe they need to be corrected ..), still leaving 3. Moreover, these are not the only rules blocking YouTube: many individual wikis also have specific YouTube videos blacklisted (and YouTube can be used to earn money - such links are known to circumvent the blacklist, even regulars do it! - and there is information there that simply should NEVER be linked to ..). Again Declined. --Dirk Beetstra T C (en: U, T) 06:28, 14 April 2016 (UTC)

Partial matches: <change.org> blocks <time-to-change.org.uk>

I tried to add a link to <time-to-change.org.uk>, and was told that I couldn't add the link, as <change.org> was blacklisted. Is this partial-match blacklisting (based, I guess, on an incorrect interpretation of URL specifications) a known bug? Cheers. --YodinT 15:46, 21 October 2015 (UTC)

This is more of a limitation of the regex: we tend to blacklist '\bchange\.org\b', but a '-' also counts as a 'word end' (the \b). I'll see if I can adapt the rule. --Dirk Beetstra T C (en: U, T) 07:46, 22 October 2015 (UTC)
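A quick Python illustration of both the false positive and the lookbehind fix mentioned further down this thread (MediaWiki uses PCRE, but \b behaves the same way in this case):

import re

rule = re.compile(r'\bchange\.org\b')
print(bool(rule.search('http://www.change.org/petitions')))    # True: the intended hit
print(bool(rule.search('http://time-to-change.org.uk/')))      # True: false positive, '-' creates a \b boundary

fixed = re.compile(r'(?<!-)\bchange\.org\b')
print(bool(fixed.search('http://time-to-change.org.uk/')))     # False: the lookbehind rejects a preceding '-'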
change.org is not here, it is on en.wikipedia. That needs to be requested locally and then resolved there. --Dirk Beetstra T C (en: U, T) 07:48, 22 October 2015 (UTC)
Thanks for looking into this; is it worth replacing the regexes globally to fit URL specs? I'm sure I'm not the only one who will ever be/have been affected. --YodinT 11:27, 22 October 2015 (UTC)
@Yodin: Sorry, but there are no global regexes to replace, change.org is only blacklisted on en.wikipedia. You'll have to request a change on en:MediaWiki talk:Spam-blacklist (so there is a local request to do the change, then I or another en.wikipedia admin will implement it there). --Dirk Beetstra T C (en: U, T) 11:38, 22 October 2015 (UTC)
Thanks Dirk; just read this (sorry for the repeat on regexes there!). Isn't the main blacklist here also using '\bexample\.com\b'? I can come up with the general case regex if you like! --YodinT 11:44, 22 October 2015 (UTC)
You mean for every rule to exclude the '<prefix>-' case (i.e. put '(?<!-)' before every rule in the list)? Well, some of them are meant to catch all '<blah>-something.com' sites, so that is difficult. And then there are other combinations which sometimes catch as well. It is practically impossible to rule out every false positive. --Dirk Beetstra T C (en: U, T) 12:01, 22 October 2015 (UTC)
I see... much more complicated in practice than I thought. My idea was to apply it to a wider class of false positives, including the '<prefix>-' rule and more, by replacing "\b" with a regex rule which covers all and only the unreserved URI characters (upper & lowercase letters, decimal digits, hyphen, underscore, and tilde; with "dots" used in practice as delimiters). But this wouldn't cover the '<blah>-something.com' examples you gave, and having read some of the maintenance thread below which covers false positives, I won't try to press the issue! Maybe one day? Until then, I hope this goes well! Cheers for your work! --YodinT 12:26, 22 October 2015 (UTC)
@Yodin: If the foundation finally decides that it is time to solve some old bugzilla requests (over other developments which sometimes find fierce opposition), and among those the ones regarding overhaul of the spam-blacklist system, then this would be nice 'feature requests' of that overhaul. In a way, stripping down the edit-filter to pure regex matching 'per rule', with some other options added (having a regex being applied to one page or set of pages; having the regex being excluded on one page only, having the whitelist requests being added to the blacklist rule they affect, whitelisting on one page or set of pages, etc. etc.) would be a great improvement to this system. --Dirk Beetstra T C (en: U, T) 14:25, 23 October 2015 (UTC)
Closed - nothing to do; a block at enWP, nothing global.  — billinghurst sDrewth 09:45, 22 November 2015 (UTC)

non-ascii are not blocked?



I saw \bказино-форум\.рф\b in the page, so it's supposed to be blocked. However, I can link it: http://казино-форум.рф It seems like all non-ASCII links are able to avoid blocking.

In Thai Wikipedia (where I am an admin), there are a lot of Thai URLs that we want to put in the local blacklist, but we can't for the very same reason. --Nullzero (talk) 17:42, 18 February 2016 (UTC)

This should go to Phab: quickly - that is a real issue. --Dirk Beetstra T C (en: U, T) 05:52, 21 February 2016 (UTC)
@Beetstra: Please see Phab:T28332. It seems that you need to put \xd0\xba\xd0\xb0\xd0\xb7\xd0\xb8\xd0\xbd\xd0\xbe-\xd1\x84\xd0\xbe\xd1\x80\xd1\x83\xd0\xbc\.\xd1\x80\xd1\x84 (without \b) instead of \bказино-форум\.рф\b --Nullzero (talk) 20:00, 21 February 2016 (UTC)
*sigh* somehow the workaround doesn't work with Thai characters, so I don't know if \xd0\xba\xd0\xb0\xd0\xb7\xd0\xb8\xd0\xbd\xd0\xbe-\xd1\x84\xd0\xbe\xd1\x80\xd1\x83\xd0\xbc\.\xd1\x80\xd1\x84 will actually work or not. Please try it anyway... --Nullzero (talk) 20:24, 21 February 2016 (UTC)
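For what it's worth, the failure can be reproduced at the byte level; a minimal Python sketch (PHP's PCRE differs in detail, but the issue is analogous: the bytes of a UTF-8 Cyrillic or Thai character are not ASCII 'word' characters, so \b never finds a boundary next to them):

import re

url = 'http://казино-форум.рф'.encode('utf-8')
# the escaped-bytes pattern from the Phab workaround above
body = rb'\xd0\xba\xd0\xb0\xd0\xb7\xd0\xb8\xd0\xbd\xd0\xbe-\xd1\x84\xd0\xbe\xd1\x80\xd1\x83\xd0\xbc\.\xd1\x80\xd1\x84'

print(re.search(rb'\b' + body + rb'\b', url))  # None: '/' and 0xd0 are both non-word bytes, so \b never matches
print(re.search(body, url))                    # a match once the \b anchors are dropped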

Free domain names

.ml, .ga, .cf, and .gq offer free domain names [2]. I'm sick of playing whack-a-mole with the TV show spam; is there anything else we can do? MER-C (talk) 13:38, 8 April 2016 (UTC)

@MER-C: could easily be blacklisted, provided that there is not too much regular material that needs to be linked on those sites. What countries do these belong to? --Dirk Beetstra T C (en: U, T) 06:38, 20 April 2016 (UTC)
For .ml we have 1581 links on en.wikipedia. It looks like the majority of those are .org.ml and .gov.ml and similar (many used as references). --Dirk Beetstra T C (en: U, T) 06:41, 20 April 2016 (UTC)
.gq looks like low hanging fruit: 21 links on en.wp, only half of which are in mainspace. MER-C (talk) 12:29, 2 May 2016 (UTC)
Less than half, I would say; however, some of those are 'official' (I see the registrar itself, and a university). Moreover, this blocks more than only en.wikipedia (though a quick check on other wikis does not enlarge the set of genuine links much). If we write the rules so that the (large majority of the) currently used 'good' subdomains (on, say, the 5 major wikis) are excluded, I'll pull the trigger. --Dirk Beetstra T C (en: U, T) 13:11, 2 May 2016 (UTC)
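One shape such a rule could take - a sketch only, with the exempted subdomains invented for illustration; the real exemption list would have to come from the link searches mentioned above:

import re

# Block any .ml host except the known-good second-level domains gov.ml and org.ml.
rule = re.compile(r'[a-z0-9.-]+\.ml\b(?<!\bgov\.ml)(?<!\borg\.ml)')

print(bool(rule.search('http://freespam.ml/show')))    # True: would be caught
print(bool(rule.search('http://stats.gov.ml/page')))   # False: exempted subdomain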
@MER-C: are we getting much spam outside of enWP? If the spam is centred on enWP, can we try the local blacklist there initially?  — billinghurst sDrewth 05:11, 23 May 2016 (UTC)
The situation is mostly under control on enWP -- we now have a private abuse filter in front of the spam blacklist which works fairly well. They've now turned to spamming via facebook, which is something I struggle to care about. Blocking these domains isn't necessary at this moment, but one sees parallels with the .tk and .co.nr situation. MER-C (talk) 07:10, 27 May 2016 (UTC)

google and springer together



I do not understand regular expressions at all. On the Finnish Wikipedia, the following link was blacklisted:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0ahUKEwj2y_u_p4zMAhWHKCwKHXfCDAgQFggkMAE
&url=http://www.springer.com/cda/content/document/cda_downloaddocument/9781441983893-c1.pdf
?SGWID%3D0-0-45-1149139-p174086675&usg=AFQjCNHKA4W5amgbAZGZXCgD5ZQy5tplQw&cad=rja 

Any ideas why? The IP in question has reported the problem on our local admin noticeboard, but I cannot help them. --Pxos (talk) 20:17, 13 April 2016 (UTC)

Pxos (talk · contribs), they should link directly to the Springer website. Google redirect links include link tracking. John Vandenberg (talk) 10:11, 19 April 2016 (UTC)
@Pxos: - use http://www.springer.com/cda/content/document/cda_downloaddocument/9781441983893-c1.pdf - this link is copied from the search-result page of google, it is not the actual link to the document. --Dirk Beetstra T C (en: U, T) 06:37, 20 April 2016 (UTC)
Closed - direct links usable, redirecting links blocked.  — billinghurst sDrewth 15:08, 23 May 2016 (UTC)
@Pxos: at dewiki, in the blacklisting message de:MediaWiki:Spamprotectiontext, we included a link to https://tools.wmflabs.org/url-converter which simply converts such a Google URL to the original URL. Maybe this could be an option for you, too? -- seth (talk) 19:23, 14 September 2016 (UTC)
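The conversion itself is straightforward - the real target sits in the url= query parameter of the Google redirect. A sketch of the idea (this is not the url-converter tool's actual code):

from urllib.parse import urlparse, parse_qs

google_url = ('https://www.google.com/url?sa=t&rct=j&url='
              'http://www.springer.com/cda/content/document/'
              'cda_downloaddocument/9781441983893-c1.pdf')
target = parse_qs(urlparse(google_url).query)['url'][0]
print(target)  # the direct springer.com link, which is what should be cited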

Escaping dot in myrtlebeach.com regex



  • [a-z]myrtlebeach.com\b

Should the . at the beginning of .com be escaped? – JonathanCross (talk) 15:01, 25 April 2016 (UTC)
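A quick Python check of what the unescaped dot does - it matches any character, so the rule is slightly wider than written (probably harmless here, but escaping is cleaner):

import re

print(bool(re.search(r'[a-z]myrtlebeach.com\b', 'visitmyrtlebeach.com')))    # True, as intended
print(bool(re.search(r'[a-z]myrtlebeach.com\b', 'visitmyrtlebeachxcom')))    # also True: '.' matches the 'x'
print(bool(re.search(r'[a-z]myrtlebeach\.com\b', 'visitmyrtlebeachxcom')))   # False once the dot is escaped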

@Billinghurst: looks like you added the regex in this revision. It was based on User:COIBot/XWiki/websitedesigninmyrtlebeach.com which suggests escaping the dot. – JonathanCross (talk) 14:49, 14 May 2016 (UTC)

The regex has been removed. Not sure there is any ongoing issue unless we get spam again.  — billinghurst sDrewth 03:45, 15 May 2016 (UTC)
Ah, great, thanks! – JonathanCross (talk) 15:31, 15 May 2016 (UTC)

Are these global, or local?

Regarding my log of blocks: over the last days I have blocked almost 50 IPs whose only edits are hitting the blacklist (there is a related filter on en that gets some hits). It makes the logs unreadable. My questions: a) is this a global problem, and b) if so, can we have a bot that globally blocks these IPs on sight (with withdrawal of talkpage access) so we de-clutter the logs? I block these for 1 month at first, as soon as they attempt to use one of the typical domains/links or when they add links to typical pages they tend to try. Is it feasible to have a bot that gets access to these records and locks them globally? --Dirk Beetstra T C (en: U, T) 06:55, 19 October 2016 (UTC)

Note: seeing this (meta admin eyes only), it is global; there are many IPs with the same MO. So the request simplifies to: can we have a bot that globally checks for this and locks the IPs on sight, so we declutter the logs? For en.wikipedia, we have en:Template:spamblacklistblock as the block reason/talkpage template for them. --Dirk Beetstra T C (en: U, T) 06:58, 19 October 2016 (UTC)

Discussion

This section is for discussion of Spam blacklist issues among other users.

Expert maintenance

One (soon to be archived) rejected removal suggestion was about jxlalk.com being matched by a filter intended to block xlalk.com. One user suggested that this side effect might be as it should be, another user suggested that regular expressions are unable to distinguish these cases, and nobody has a clue when and why xlalk.com was blocked. I suggest finding an expert maintainer for this list, and removing all blocks older than 2010. The bots identifying abuse will restore still-needed ancient blocks soon enough, hopefully without any oogle-matching-google cases. –Be..anyone (talk) 00:50, 20 January 2015 (UTC)

No. Removing some of the old rules, from before 2010 or even before 2007, will result in further abuse; some of the rules are intentionally wide so as to stop a wide range of spamming behaviour, and, as I have argued as well, I have 2 cases on my en.wikipedia list where companies have been spamming for over 7 years, have some of their domains blacklisted, and are still actively spamming related domains. Every single removal should be considered on a case-by-case basis. --Dirk Beetstra T C (en: U, T) 03:42, 20 January 2015 (UTC)
Just to give an example of this: redirect sites have been, and are, actively abused to circumvent the blacklist. Some of those were added before the arbitrary date of 2010. We are not going to remove those under the blanket of 'having been added before 2010'; they will stay blacklisted. Some other domains are of such gravity that they should never be removed. How are you, reasonably, going to filter out the rules that should never be removed? --Dirk Beetstra T C (en: U, T) 03:52, 20 January 2015 (UTC)
By the way, you say ".. intended to block xlalk.com .." .. how do you know? --Dirk Beetstra T C (en: U, T) 03:46, 20 January 2015 (UTC)
I know that nobody would block icrosoft.com if what they mean is microsoft.com, or vice versa. It's no shame to have no clue about regular expressions, a deficit we apparently share. :tongue: –Be..anyone (talk) 06:14, 20 January 2015 (UTC)
I am not sure what you are referring to - I am not a native speaker of regex, but proficient enough. The rule was added to block, at least, xlale.com and xlalu.com (if it were ONLY these two, \bxlal(u|e)\.com\b or \bxlal[ue]\.com\b would have been sufficient), but it is impossible to find out this far back what else was spammed; possibly xlali.com, xlalabc.com and abcxlale.com were also abused by these proxy-spammers. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
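A quick check of those candidate rules against the domains mentioned in this thread (illustrative only - the rule that was actually in the list at the time is hard to reconstruct):

import re

narrow = re.compile(r'\bxlal[ue]\.com\b')
for host in ('xlale.com', 'xlalu.com', 'xlalk.com', 'jxlalk.com'):
    print(host, bool(narrow.search('http://' + host)))
# xlale.com and xlalu.com match; xlalk.com and jxlalk.com do not.
# Only an unanchored, broader rule would also catch jxlalk.com:
print(bool(re.search(r'xlal', 'http://jxlalk.com')))  # True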
xlalk.com may have been one of the cases, but one rule that was blacklisted before this blanket rule was imposed was 'xlale.com' (the xlale.com rule was removed in a cleanout session after the blanket rule was added). --Dirk Beetstra T C (en: U, T) 04:45, 20 January 2015 (UTC)
The dots in administrative domains and DNS mean something: notably, foo.bar.example is typically related to an administrative bar.example domain (ignoring well-known exceptions like co.uk etc.; Mozilla+SURBL have lists for these), while foobar.example has nothing to do with bar.example. –Be..anyone (talk) 06:23, 20 January 2015 (UTC)
I know, but I am not sure how this relates to this suggested cleanup. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
If your suggested clean-ups at some point no longer match jxlalk.com, the request by a Chinese user would be satisfied. As noted, all I found out is a VirusTotal "clean"; it could still be a spam site, if it ever was a spam site.
The regexp could begin with "optionally any string ending with a dot" or similar before xlalk. There are "host name" RFCs (LDH: letter digit hyphen) up to IDNAbis (i18n domains); they might contain recipes. –Be..anyone (talk) 16:56, 20 January 2015 (UTC)
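To make the word-boundary behaviour concrete, a toy comparison (Python's re is used here purely for illustration; MediaWiki applies the list as PCRE, so details may differ slightly):

<syntaxhighlight lang="python">
import re

# \b before the 'x' means 'jxlale.com' does NOT match (j and x are both
# word characters, so there is no boundary between them), while a dot IS
# a boundary, so the subdomain 'foo.xlale.com' still matches - the
# DNS-aware behaviour described above.
rule = re.compile(r"\bxlal[ue]\.com\b")

for url in ["http://xlale.com/x", "http://foo.xlale.com/x",
            "http://jxlale.com/x", "http://jxlalk.com/x"]:
    print(url, "->", "blocked" if rule.search(url) else "allowed")
# blocked: xlale.com, foo.xlale.com; allowed: jxlale.com, jxlalk.com
</syntaxhighlight>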
What suggested cleanups? I am not suggesting any cleanup or blanket removal of old rules. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
  • I have supported delisting above, having researched the history, posted at Talk:Spam_blacklist/About#Old_blacklisting_with_scanty_history. If it is desired to keep xlale.com and xlalu.com on the blacklist (though it's useless at this point), the shotgun regex could be replaced with two listings, easy peasy. --Abd (talk) 01:42, 21 January 2015 (UTC)
    As I said earlier, are you sure that it is only xlale and xlalu? Those were the two I found quickly; there may have been more. I do AGF that the admin who added the rule had reason to blanket it like this. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
Of course I'm not sure. There is no issue of bad faith. He had reason to use regex, for two sites, and possibly suspected additional minor changes would be made. But he only cited two sites. One of the pages was deleted, and has IP evidence on it, apparently, which might lead to other evidence from other pages, including cross-wiki. But the blacklistings themselves were clearly based on enwiki spam and nothing else was mentioned. This blacklist was the enwiki blacklist at that time. After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings. This is really old and likely obsolete stuff. --Abd (talk) 20:07, 21 January 2015 (UTC)
Three at least. And we do not have to present a full case for blacklisting (we often don't, per en:WP:BEANS and sometimes privacy concerns); we have to show sufficient abuse that needs to be stopped. And if that deleted page was mentioned, then certainly there was reason to believe that there were cross-wiki concerns.
Obsolete? How do you know? Did you go through the cross-wiki logs of what was attempted to be spammed? Do you know how often some of the people active here are still blacklisting spambots using open proxies? Please stop with these sweeping statements until you have fully searched for all the evidence. 'After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings' - no, that is not what happened. --Dirk Beetstra T C (en: U, T) 03:16, 22 January 2015 (UTC)
Hi!
I searched all the logs (Special:Log/spamblacklist) of several wikis using the regexp entry /xlal[0-9a-z-]*\.com/.
There were almost no hits:
w:ca: 0
w:ceb: 0
w:de: 0
w:en: 1: 20131030185954, xlalliance.com
w:es: 1: 20140917232510, xlalibre.com
w:fr: 0
w:it: 0
w:ja: 0
w:nl: 0
w:no: 0
w:pl: 0
w:pt: 0
w:ru: 0
w:sv: 0
w:uk: 0
w:vi: 0
w:war: 0
w:zh: 1: 20150107083744, www.jxlalk.com
So there was just one single hit at w:en (not even in the main namespace, but in the user namespace), one at w:es, and one at w:zh (probably a false positive). So I agree with user:Abd that removing this entry from the SBL would be the best solution. -- seth (talk) 18:47, 21 February 2015 (UTC)
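For the record, a sketch of how such a cross-wiki count could be reproduced; it assumes the same admin-only log access, and both the wiki subset and the matching of the rule against the stringified log parameters are simplifications:

<syntaxhighlight lang="python">
import re
import requests

# Count spam-blacklist log entries whose parameters match a given rule,
# per wiki. Requires the 'spamblacklistlog' right; pagination and error
# handling are omitted for brevity.
RULE = re.compile(r"xlal[0-9a-z-]*\.com")
LANGS = ["ca", "de", "en", "es", "zh"]  # subset of the wikis listed above

for lang in LANGS:
    api = f"https://{lang}.wikipedia.org/w/api.php"
    data = requests.get(api, params={
        "action": "query", "list": "logevents", "letype": "spamblacklist",
        "lelimit": "max", "format": "json",
    }).json()
    events = data.get("query", {}).get("logevents", [])
    hits = sum(1 for e in events if RULE.search(str(e.get("params", ""))))
    print(f"w:{lang}: {hits}")
</syntaxhighlight>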
Finally an argument based on evidence (these logs should be public, not admin-only - can we have something like this in a search engine? It may come in handy in some cases!). Consider it removed. --Dirk Beetstra T C (en: U, T) 06:59, 22 February 2015 (UTC)
By the way, Seth, this is actually zero hits - all three you show here are collateral. Thanks for this evidence; this information would be useful on more occasions to make an informed decision (also, vide infra). --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
I am not sure that we want the Special page to be public, though I can see some value in having something at Tool Labs to run queries against, or something that could be run through Quarry.  — billinghurst sDrewth 10:57, 22 February 2015 (UTC)
Why not public? There is no reason to hide this; it is not BLP- or COPYVIO-sensitive information in 99.99% of the hits. The chance that this is non-public information is just as big as the chance that certain blocks are BLP violations (and those are visible) ... --Dirk Beetstra T C (en: U, T) 04:40, 23 February 2015 (UTC)

Now restarting the original debate[edit]

As the blacklist is long, and likely contains rules that cast too wide a net or are so old that they are utterly obsolete (or may even be causing collateral damage on a regular basis), can we see whether we can set up some criteria (that can be 'bot-tested'):

  1. Rule added > 5 years ago.
  2. All hits (determined on a significant number of wikis) over the last 2 years (for now: since the beginning of the log = ~1.5 years) are collateral damage - NO real hits.
  3. Site is not a redirect site (those should not be removed, even if not abused), is not a known phishing/malware site (to protect others), and is not a true copyright-violating site. (This is hard to bot-test; we may need someone to look over the list and take out the obvious ones.)

We can afford some mistakes on old rules that are not being abused (i.e., remove some that actually fail #3) - if they become a nuisance/problem again, we will see them again, and they can be speedily re-added .. thoughts? --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
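A rough sketch of how criteria 1 and 2 could be bot-tested; the input tables here are hypothetical stand-ins for the addition log and a per-rule hit table such as the one discussed below, and criterion 3 remains the manual step:

<syntaxhighlight lang="python">
from datetime import datetime, timedelta

# Hypothetical stand-ins: when each rule was added (from the SBL log)
# and how many non-collateral hits it had since hit-logging began.
added = {r"\bexample-spam\.com\b": datetime(2008, 5, 1),
         r"\bbit\.ly\b": datetime(2009, 2, 3)}
real_hits = {r"\bexample-spam\.com\b": 0, r"\bbit\.ly\b": 17}
NEVER_REMOVE = {r"\bbit\.ly\b"}  # redirect/shortener rules, criterion 3

def removal_candidates(now=None):
    now = now or datetime.utcnow()
    for rule, when in added.items():
        old_enough = (now - when) > timedelta(days=5 * 365)  # criterion 1
        no_real_hits = real_hits.get(rule, 0) == 0           # criterion 2
        if old_enough and no_real_hits and rule not in NEVER_REMOVE:
            yield rule  # still needs a manual look before actual removal

print(list(removal_candidates()))  # -> ['\\bexample-spam\\.com\\b']
</syntaxhighlight>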

@Hoo man: you have worked on clean-up before; some of your thoughts would be welcomed.  — billinghurst sDrewth 10:53, 22 February 2015 (UTC)
Doing this kind of clean-up is rather hard to automate. What might work better for starters would be removing rules that haven't matched anything since we started logging hits. That would presumably cut down the whole blacklist considerably. After that we could re-evaluate the rest of the blacklist, maybe following the steps outlined above. - Hoo man (talk) 13:33, 22 February 2015 (UTC)
Removing rules just for not hitting anything is dangerous .. there are likely some somewhat obscure redirect sites on the list which nobody has yet attempted to abuse (though those, too, could be re-added). But we could do test runs easily - just save a cleaned-up copy of the blacklist elsewhere, diff it against the current list, and see what would get removed.
Man, I want this showing up in the RC feeds; then LiWa3 could store the hits in the database (and follow redirects to show what people wanted to link to ..). --Dirk Beetstra T C (en: U, T) 03:30, 23 February 2015 (UTC)
Hi!
I created a table of hits of blocked link additions. Maybe it's of use for the discussion: User:lustiger_seth/sbl_log_stats (1.8 MB wiki table).
I'd appreciate it if we deleted old entries. -- seth (talk) 22:12, 26 February 2015 (UTC)
Hi, thank you for this, it gives a reasonable idea. Do you know if the rule hits were all 'correct' (for those that do show hits) or mainly/all false positives? (If a rule is hitting false positives, we could on that basis also decide to tighten it to avoid them.) Rules with all zeroes (can you include a 'total' column?) would certainly be candidates for removal (though we should still first determine whether they are 'old' and/or no-no sites before removal). I am also concerned that this does not include other wikifarms - some sites may be problematic on other wikifarms, or hit a large number of smaller wikis (which have less control due to low admin numbers). --Dirk Beetstra T C (en: U, T) 03:36, 8 March 2015 (UTC)
Hi!
We probably can't get information about false positives automatically. I added a 'sum' column.
Small wikis: If you give me a list of the relevant ones, I can create another list. -- seth (talk) 10:57, 8 March 2015 (UTC)
Thanks for the sum column. Regarding the false positives, it would be nice to be able to quickly see what actually got blocked by a certain rule. I agree that that then needs manual inspection, but the actual number of rules with zero hits on the stuff they were intended to block is likely way bigger than what we see.
How would you define the relevant small wikis - does that depend on the link that was spammed? Probably the best approach is to parse all ~750 wikis, make a list of rules with 0 hits and a separate list of rules with <10 hits (including the links that were blocked), and exclude everything above that. Then these resulting rules should be filtered down to those which were added >5 years ago. That narrows down the list for now, and after a check for obvious no-no links, those could almost be blanket-removed (just excluding the ones with real hits, the obvious redirect sites and others - which needs a manual check). --Dirk Beetstra T C (en: U, T) 06:59, 9 March 2015 (UTC)
Hi!
At User:Lustiger_seth/sbl_log_stats/all_wikis_no_hits there's a list containing ~10k entries that never triggered the SBL anywhere between 2013-09 and 2015-02 (if my algorithm is correct).
If you want to get all entries older than 5 years, it should be sufficient to use only the entries in that list up to (and including) \bbudgetgardening\.co\.uk\b.
So we could delete ~5766 entries. What do you think? Shall we give it a try? -- seth (talk) 17:06, 18 April 2015 (UTC)
The question is how many of those are still-existing redirect sites etc. Checking 5800 is quite a job. On the other hand, with LiWa3/COIBot detecting them, it is quite easy to re-add them. --Dirk Beetstra T C (en: U, T) 19:28, 21 April 2015 (UTC)
As per the last few lines above, I've now removed 124 kB of non-hitting entries. I did not remove all of them, because some were URL shorteners, and I guess that they are a special case, even if not abused yet. -- seth (talk) 22:25, 16 September 2015 (UTC)

Blacklisting spam URLs used in references[edit]

Looks like a site is using the "references" section as a spam farm. If a site is added to this list, will the blacklist also block the spam site when it is used in references? Raysonho (talk) 17:45, 5 September 2015 (UTC)

Yes, it will.--AldNonymousBicara? 21:56, 5 September 2015 (UTC)
Thanks, Aldnonymous! Raysonho (talk) 00:07, 6 September 2015 (UTC)

URL shorteners[edit]

Hi!
IMHO the URL shorteners should be grouped in one section, because they are a special group of URLs that need special treatment. A URL shortener should not be removed from the SBL unless the domain is dead, even if it has not been used for spamming, right? -- seth (talk) 22:11, 28 September 2015 (UTC)

It would be beneficial to have them in a section. The problem is that most of them are added by script, and hence just put at the bottom. --Dirk Beetstra T C (en: U, T) 04:51, 4 October 2015 (UTC)
Maybe it would be preferable to have "spam blacklist" be a compilation file, made up of several files, one of which would be "spam blacklist.shorteners".  — billinghurst sDrewth 12:15, 24 December 2015 (UTC)
This seems like a nice idea. It would certainly help with cleaning it up (which we don't do nowadays). IIRC it is technically possible to have different spam blacklist pages, so this is doable; it just needs agreement among us and someone to do it. --Glaisher (talk) 12:17, 24 December 2015 (UTC)
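From the consumer side, the compilation could be as simple as fetching and concatenating the component pages. A minimal sketch, in which the "Spam_blacklist/shorteners" subpage name is purely hypothetical (the kind of split proposed above):

<syntaxhighlight lang="python">
import requests

# Hypothetical component pages; "Spam_blacklist/shorteners" does not
# exist - it illustrates the proposed split.
PARTS = ["Spam_blacklist", "Spam_blacklist/shorteners"]
INDEX = "https://meta.wikimedia.org/w/index.php"

def combined_blacklist():
    """Concatenate the raw component lists, dropping comments/blank lines."""
    entries = []
    for page in PARTS:
        text = requests.get(INDEX, params={"title": page, "action": "raw"}).text
        entries += [line for line in text.splitlines()
                    if line.strip() and not line.lstrip().startswith("#")]
    return entries

print(len(combined_blacklist()), "regex entries in the combined list")
</syntaxhighlight>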

@Beetstra, Lustiger seth, Glaisher, Vituzzu, MarcoAurelio, Hoo man, Legoktm: and others. What are your thoughts on a concatenation of files as described above? If we have a level of agreement, then we can work out the means to an outcome.  — billinghurst sDrewth 12:39, 25 January 2016 (UTC)

  • I am somewhat in favour of this - split the list into a couple of sublists: one for URL shorteners, one for 'general terms' (mainly at the top of the list currently), and the regular list. It would however need an adaptation of the blacklist script (I've done something similar for en.wikipedia - a choice of blacklisting or revertlisting for each link - and could give that hack a try here, time permitting). AFAIK the extension in the software is capable of handling this. It would also be beneficial for the clean-out work if the blacklist itself were 'sectioned' into years. Although being 8 years old is by no means a reason to expect that the spammers are not here anymore (I have two cases on en.wikipedia that are older than that), we do tend to be more lenient with the old stuff. (On the other hand .. why bother .. the benefits are mostly on our side, so we don't accidentally remove stuff that should be solved by other means.) --Dirk Beetstra T C (en: U, T) 13:05, 25 January 2016 (UTC)
Is it really possible to have different spam blacklist pages? What would happen to the sites that use this very list to block unwanted spam? —MarcoAurelio 14:23, 25 January 2016 (UTC)
It is technically possible. But this would mean that if we move all the URL shortener entries to a new page, all sites currently using the list would have to update the extension or explicitly add the new blacklist to their config, or these links would be allowed on their sites (and notifying all these wikis about this breaking change is next to impossible). Another issue I see is that a new blacklist file means a separate network request on cache miss, so there might be a little delay in page saves (but I'm not sure whether it would be noticeable). --Glaisher (talk) 15:38, 25 January 2016 (UTC)
Hi!
Before we activate such a feature, we should update some scripts that don't know anything about SBL subpages yet.
Apart from that, I don't think that sectioning by year would be of much use; one can use the (manual) log for this. A subject-oriented sectioning could be of more use, but it would also be more difficult for us. -- seth (talk) 20:49, 27 January 2016 (UTC)

Unreadable[edit]

Why is the list not alphabetical, so I can look up whether a certain site is listed and then also look up when it was added? --Corriebertus (talk) 08:55, 21 October 2015 (UTC)

Hi!
There are advantages and disadvantages to an alphabetical list. For example, it would be very helpful to group all URL shorteners in one place (see the discussion thread above), while sometimes it's better to have a chronological list. Additionally, regexes can't really be sorted domain-alphabetically.
If you want to search the blacklist, you can use a tool like https://tools.wmflabs.org/searchsbl/. -- seth (talk) 17:16, 30 October 2015 (UTC)
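A tool like that essentially fetches the raw list and tests a URL against each entry. A simplified sketch follows (the real extension combines the lines into larger PCRE batches and anchors them to the URL's host part, which this ignores; the queried URL is just an example):

<syntaxhighlight lang="python">
import re
import requests

RAW_URL = ("https://meta.wikimedia.org/w/index.php"
           "?title=Spam_blacklist&action=raw")

def matching_entries(url):
    """Yield blacklist lines whose regex matches the given URL."""
    for line in requests.get(RAW_URL).text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments/whitespace
        if not entry:
            continue
        try:
            if re.search(entry, url):
                yield entry
        except re.error:
            pass  # skip PCRE constructs Python's re doesn't accept

for entry in matching_entries("http://example-spam.com/page"):  # example URL
    print("listed via:", entry)
</syntaxhighlight>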
Because no one has done it. It is not something that I will spend my time doing.  — billinghurst sDrewth 10:59, 3 December 2016 (UTC)