Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
Latest comment: 8 years ago by Beetstra in topic Troubleshooting and problems
Yodin (talk | contribs)
→‎Troubleshooting and problems: new section: Partial matches: <change.org> blocks <time-to-change.org.uk>

Revision as of 07:46, 22 October 2015

Shortcut:
WM:SPAM
WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs in any page in Wikimedia Foundation projects (as well as many external wikis). Any meta administrator can edit the spam blacklist; either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis and prior blacklisting on at least one. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, there could be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for coordination of activities related to maintenance of the blacklist.
Whitelists - There is no global whitelist, so if you are seeking the whitelisting of a URL at a wiki, please raise the matter on the respective MediaWiki talk:Spam-whitelist page at that wiki, and consider using the template {{edit protected}} or its local equivalent to draw attention to your request.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2024/05.

snippet for logging
{{sbl-log|14247608#{{subst:anchorencode:SectionNameHere}}}}


Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

brths.pt



URL redirect; blocked on sight AFAIK. It Is Me Here t / c 19:26, 20 September 2015 (UTC)

edcoatescollection.com



Link is located on the w:SURBL list. Take note: SURBL Blacklist lookup and WOT Scorepage. Anarchyte 10:28, 7 July 2015 (UTC)

Copied from enWP blacklist requests: site is linked from multiple projects. JzG (talk) 12:32, 16 July 2015 (UTC)
I don't quite understand. At enwiki there are 80 instances of the domain being linked, 17 at dewiki, 11 at frwiki and 15 at Commons [1]. If it is an undesired domain, why is it still linked on the wiki this request comes from and on other major projects? —MarcoAurelio 11:04, 9 August 2015 (UTC)
@MarcoAurelio: - cleanup does not need to run in parallel with blacklisting. If a site is spammed or damaging to the reader, it is sometimes prudent to blacklist first and clean up afterwards (and sometimes the spamming/abuse keeps pace with the removal efforts). Also, besides being spammed, a site sometimes has good uses on a wiki, such as specific documents or the official link on the subject's page - those then need to be handled with whitelisting. (No specific judgement on the site itself.) --Dirk Beetstra T C (en: U, T) 03:33, 26 August 2015 (UTC)

truecrimebookreviews.com



This is an attack site with malware embedded. I just removed about 23 links to it on en.wiki.
⋙–Berean–Hunter—► ((⊕)) 00:51, 21 October 2015 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as they may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).
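The archiving rule for stale reports can be read as a single predicate. A minimal sketch, assuming a hypothetical `Report` record; the field names are illustrative and are not COIBot's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    # Hypothetical fields; COIBot's real report format is not described here.
    links_reported: int
    last_edited: datetime
    last_editor: str


def is_stale(report: Report, now: datetime) -> bool:
    """True when all three archiving conditions from the text hold."""
    return (
        report.links_reported < 5                        # fewer than 5 links reported
        and now - report.last_edited > timedelta(days=7)  # not edited in the last 7 days
        and report.last_editor == "COIBot"                # last editor is the bot itself
    )


now = datetime(2015, 10, 22)
print(is_stale(Report(3, datetime(2015, 10, 1), "COIBot"), now))
```

Any human edit to a report (a note below the HTML comment, for example) makes the last condition false, so the bot leaves it open.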

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
COIBot's currently open XWiki reports
List Last update By Site IP R Last user Last link addition User Link User - Link User - Link - Wikis Link - Wikis
vrsystems.ru 2023-06-27 15:51:16 COIBot 195.24.68.17 192.36.57.94
193.46.56.178
194.71.126.227
93.99.104.93
2070-01-01 05:00:00 4 4

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests please take your discussion to the pertinent wiki, where such requests would be made at MediaWiki talk:Spam-blacklist at that wiki. Search spamlists - remember to enter any relevant language code

syriadirect.org



This site offers good, independent information about the Syrian civil war and it doesn't contain spam or anything like that. Honestly, I don't even know why it was put on the list in the first place, but I would like to see it cleared so people can use it as a reference in their articles about this conflict.

Seems to be collateral damage from a regex response to spam. We can probably do a lookbehind regex fix for this.  — billinghurst sDrewth 09:01, 18 December 2014 (UTC)

bergspider.net



Sir, three months ago my website was blacklisted. I didn't know much about wiki policies, which is why I made a mistake. Now I know the wiki policies. Kindly help me remove my site from the blacklist. My site's name is Bergspider.net. — The preceding unsigned comment was added by Thomasfan11 (talk)

Nothing to do: not globally blacklisted. This was only blacklisted at English Wikipedia; you will need to take your request to w:MediaWiki talk:Spam-blacklist and ask there.  — billinghurst sDrewth 13:08, 10 August 2015 (UTC)


fluoridealert.org



This site offers good, independent information about fluorine and fluoride additives to water supplies. It was blacklisted 5 years ago and it was not logged properly. I do not have any conflict of interest. It does not contain spam. It supplies unbiased references and primary sources for information not otherwise obtainable in English-speaking countries. Please remove it from the blacklist, thank you. — The preceding unsigned comment was added by RibsNY (talk)

Not here, Not done. Please see the local blacklist on the wiki where you are coming from. --Dirk Beetstra T C (en: U, T) 10:01, 16 August 2015 (UTC)
Declined  — billinghurst sDrewth 05:14, 8 September 2015 (UTC)

aviatorsale.com



Wanted to use a PDF on this site to reference the number of cycles in de:Entführung des Flugzeugs „Landshut“. To my understanding, the site has been blacklisted because six years ago somebody was spamming pictures from that site which most likely won't happen again anytime soon. --Studmult (talk) 17:49, 16 August 2015 (UTC)

Declined 'a PDF' - if it is only one page then that is something that needs to be resolved using local whitelisting. Dirk Beetstra T C (en: U, T) 03:29, 26 August 2015 (UTC)

dot.tk



Registry for the .tk TLD. Linked to on its Wikipedia page. Undoubtedly, the nature of the .tk TLD causes it to be used primarily for nefarious purposes, but the TLD's registry should probably not be included in this list. 198.200.98.26 09:42, 25 August 2015 (UTC)

Removed in a fashion: I have tightened it to domain.dot.tk.  — billinghurst sDrewth 14:11, 25 August 2015 (UTC)

squidoo.com



We acquired the Squidoo.com domain over a year ago and it's no longer hosting any content. We have aggressively cleaned up remnants such as this and would appreciate it being removed from the blacklist. We will ensure that activities with this domain aren't associated with spamming in the future. I work for the corporation that now owns the domain. — The preceding unsigned comment was added by Pauledmondson (talk)

It turns out the new owner is... Hubpages, which is blacklisted on en.wikipedia for exactly the same reason Squidoo is blacklisted here. MER-C (talk) 02:47, 26 August 2015 (UTC)
@MER-C: while that is the case, is there an impediment to removal of the domain, and putting it onto COIBot's monitor list? That way we can re-add pretty quickly if it becomes a problem  — billinghurst sDrewth 03:14, 26 August 2015 (UTC)
@MER-C and SDrewth: .. while wasting volunteers' time and leading them into frustration at having to remove this stuff? This was blacklisted for a reason; I would say that a criterion must therefore be that the focus of the site will change drastically, not just that the site is cleaned up. This is not spam from the company side of the spectrum, it is spam by many editors abusing its content.
@Pauledmondson: - is the focus of the site now completely different from the old squidoo.com, and completely different from hubpages.com? --Dirk Beetstra T C (en: U, T) 03:26, 26 August 2015 (UTC)

We have no plans to host content on Squidoo.com at this time. Also, if you're not familiar with the evolution of HubPages, there is now pretty extensive editorial oversight (it's no longer like YouTube) and the overall site is being fact-checked and edited by a team of in-house professional editors. We wouldn't make the request for Squidoo if we felt there was a risk that spammy content could return to it. The last thing we want is a return to the spam list. It's been over a year since we acquired the domain. We waited to submit this request until all the content was either unpublished or moved. I can certainly appreciate that the mods at Wikipedia don't want to deal with spam. We have a considerable amount of our resources dedicated to spam fighting as well, and it's frustrating to deal with spam day in and day out. I can also appreciate the necessity of blacklists, given how challenging it is to keep a large site like Wikipedia, which is constantly under attack, free of spam - I think we all strive to do as well as Wikipedia. Thanks again for the consideration. (I apologize if I didn't follow the correct syntax for replying)

Removed AGF  — billinghurst sDrewth 05:21, 8 September 2015 (UTC)
@Pauledmondson: Hmm, the domain is already being used to spam: while there are no links left, the spammers don't know, or don't care. I am going to turn up the monitoring and see what alternatives there may be for managing this outside of the blacklist.  — billinghurst sDrewth 15:01, 9 September 2015 (UTC)

lulu.com



This is just a book publishing website. I don't see why I can't link to a book from Lulu. --WandaRichards 14:17, 27 August 2015 (UTC)

Declined @WandaRichards: It is not globally blacklisted, though I note that it is locally blacklisted at English Wikipedia. You are going to need to take your query to w:en:Mediawiki talk:Spam-blacklist  — billinghurst sDrewth 05:24, 8 September 2015 (UTC)

check-and-secure.com





We are the owners of the website check-and-secure.com and we can't link to it, due to a regex entry "d-secure.com" in Wikimedia's Meta blacklist. Please add our domain to a whitelist or rewrite the regex pattern to something less broad. — The preceding unsigned comment was added by 87.155.46.93 (talk)

Done addition  — billinghurst sDrewth 05:11, 8 September 2015 (UTC)
@Billinghurst: - maybe the addition should be '(?!-an)' instead of \b .. IIRC, there was a reason why there was not a \b there. --Dirk Beetstra T C (en: U, T) 05:54, 8 September 2015 (UTC)
hmm .. maybe not, I see that there were many additions done at the same time without the initial \b .. maybe we should just throw off that whole list. --Dirk Beetstra T C (en: U, T) 06:01, 8 September 2015 (UTC)
It was a low-doc addition, so I went on the plain vanilla issue resolution (mixed metaphors?), and will follow that process if it recurs. I didn't see the need to be more or less responsive though have complete trust in any of your actions.  — billinghurst sDrewth 06:45, 8 September 2015 (UTC)
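The alternative to `\b` presumably means a negative lookbehind, `(?<!-an)`: as typed, `(?!-an)` is a lookahead, which inspects the text after the current position and so can never see a preceding "-an". The comparison below uses Python's `re` on bare host names, not the SpamBlacklist extension itself; `sand-secure.com` is an invented host, included only to show where `\b` and the lookbehind differ.

```python
import re

# Hosts from this request, plus one invented variant ("sand-secure.com").
hosts = ["d-secure.com", "check-and-secure.com", "sand-secure.com"]

patterns = {
    "original": r"d-secure\.com",
    "word boundary": r"\bd-secure\.com",
    "lookbehind": r"(?<!-an)d-secure\.com",
    "lookahead as typed": r"(?!-an)d-secure\.com",
}


def blocked(pattern: str) -> list[str]:
    """Return the hosts in which the pattern matches anywhere."""
    return [h for h in hosts if re.search(pattern, h)]


for name, pattern in patterns.items():
    print(f"{name:20} {blocked(pattern)}")
```

`\b` excludes every host with a letter or digit glued to the front of "d-secure.com", while the lookbehind excludes only the "-an" case and keeps the wider net; the lookahead as typed changes nothing.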

uservoice.com



uservoice.com is blacklisted as a URL shortener; however, it is not a URL shortener but a feedback site for many software companies' products, especially Microsoft's. I believe it should be removed from the blacklist, as there is no point to this listing. Wolf GuySB (talk) 15:45, 5 September 2015 (UTC)

Removed, though note that if it is abused, or not used appropriately as a credible source on the sites, it may reappear.  — billinghurst sDrewth 05:26, 8 September 2015 (UTC)

infosecinstitute.com



I've found myself wanting to use this a couple of times but it's filtered, I can't see any reason for this. Deku-shrub (talk) 21:28, 5 September 2015 (UTC)

Declined, not here, please check local blacklisting on your wiki. --AldNonymousBicara? 22:00, 5 September 2015 (UTC)
Thanks Deku-shrub (talk) 09:03, 6 September 2015 (UTC)

panicattackaway.com



This site contains helpful information for people who need help with stress, panic attacks and anxiety, and it doesn't have anything spammy on it. — The preceding unsigned comment was added by 196.205.207.86 (talk)

Declined, not here, but on en.wikipedia per this. Dirk Beetstra T C (en: U, T) 16:47, 28 September 2015 (UTC)

letstalkpayments.com



The website does in-depth research on blockchain technology, and as a fintech lover I wanted to add some insights from it to the blockchain article. Its information about non-financial use cases of blockchain is unique to this website. However, whenever I try to cite this link, it is blacklisted. Can it be removed? — The preceding unsigned comment was added by 2601:646:1:b3f1:f15f:246f:d986:f35e (talk)

Declined, not here, but on en.wikipedia. Dirk Beetstra T C (en: U, T) 16:48, 28 September 2015 (UTC)

trevorcook.typepad.com - The Magazine of Jackson Wells Morris



I don't quite understand why the link for The Wells: Magazine of Jackson Wells Morris is being blocked/blacklisted. I need to use it as a reference, as it is one of the good sources I have besides the newsletter from trevorcook.typepad.com. — The preceding unsigned comment was added by Shazrinarmk2015 (talk)

Hmm, is this even blocked? http://trevorcook.typepad.com. --Dirk Beetstra T C (en: U, T) 08:46, 6 October 2015 (UTC)
Indeed, does not seem blocked. --Dirk Beetstra T C (en: U, T) 08:47, 6 October 2015 (UTC)

cosplay.de



cosplay.de and cosplay.com are on the spam blacklist, apparently because someone actually spammed these links. Sadly, this causes dcm-cosplay.de to be blocked as well; dcm-cosplay.de is the website of the German cosplay championship. Can cosplay.de be unblocked or dcm-cosplay.de be whitelisted? --Y-93 (talk) 21:30, 20 October 2015 (UTC)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

derefer.unbubble.eu deblock





This authority is used 24,923 times in the main namespace on dewiki. It is used to clean known dead links out of Special:LinkSearch by redirecting them through this authority. It is hard to find a better solution for this task. --Boshomi (talk) 16:38, 24 July 2015 (UTC) Ping: User:Billinghurst Boshomi (talk) 16:49, 24 July 2015 (UTC)

Please note Phab:T89586: until it is fixed, it is not possible to find these links with the standard Special:LinkSearch. On dewiki we can use giftbot/Weblinksuche instead. --Boshomi (talk) 18:04, 24 July 2015 (UTC)
afaics derefer.unbubble.eu could be used to circumvent the SBL, is that correct? -- seth (talk) 21:30, 24 July 2015 (UTC)
I don't think so: the redirected URL is unchanged, so the SBL works as it does for archive URLs at the Internet Archive. --Boshomi (talk) 07:44, 25 July 2015 (UTC)
It is not a stored/archived page at archive.org, it is a redirect service as clearly stated at the URL and in that it obfuscates links. To describe it in any other way misrepresents the case, whether deWP uses it for good or not. We prevent abuseable redirects from other services due to the potential for abuse. You can consider whitelisting the URL in w:de:MediaWiki:spam-whitelist if it is a specific issue for your wiki.  — billinghurst sDrewth 10:09, 25 July 2015 (UTC)
What I wanted to say is that the SBL mechanism works the same way as with web.archive.org/web: a blocked URL is still blocked when the unbubble prefix is added in front of it. --Boshomi (talk) 12:54, 25 July 2015 (UTC)
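Whether a redirector hides its target depends on how the target appears inside the redirect URL. The sketch below is a simplification under stated assumptions: a hypothetical wrapper that looks for a rule right after "://" and a run of host characters, a made-up target `spam.example`, and an illustrative `?u=` parameter that is not derefer.unbubble.eu's documented syntax. A target embedded verbatim is still matched; a percent-encoded one is not.

```python
import re
from urllib.parse import quote

# Hypothetical simplified wrapper: scheme, "://", host characters, then the rule.
# The real SpamBlacklist extension builds its own expression; this only mimics
# the idea of matching a rule against the host that follows "://".
RULE = r"spam\.example"
WRAPPED = re.compile(r"https?://[a-z0-9.-]*" + RULE, re.IGNORECASE)

target = "http://spam.example/page"
# Illustrative redirect format, assumed for this sketch only.
plain = "http://derefer.unbubble.eu/?u=" + target
encoded = "http://derefer.unbubble.eu/?u=" + quote(target, safe="")

print(bool(WRAPPED.search(plain)))    # target visible verbatim: rule still matches
print(bool(WRAPPED.search(encoded)))  # "://" became "%3A%2F%2F": rule misses it
```

This is why a prefix-only redirector behaves much like an archive URL for blacklisting purposes, while any redirect service that accepts an encoded target can be used to slip past the list.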

Unblocking YouTube's redirection and nocookie domains





Apparently youtu(dot)be and youtube-nocookie(dot)com, both of which are official YouTube domains owned by Google, are on this blacklist. For over ten years, the SpamBlacklist MediaWiki extension has loaded this blacklist on third-party wikis, big and small. This is quite an issue for third-party sites such as ShoutWiki, a wiki farm, since SpamBlacklist doesn't currently have the concept of "shared" whitelists — blacklists can be shared (loaded from a remote wiki), whitelists cannot. Given that the main YouTube domain isn't blocked, and also that YouTube itself hands out youtu(dot)be links, I don't think that "but it's a redirecting service" is a valid argument against it, and therefore I'd like to propose removing these two entries from the blacklist. --Jack Phoenix (Contact) 23:17, 29 August 2015 (UTC)

There are several YouTube links blacklisted here on Meta, as well as many, many on local wikis. YouTube has videos that get spammed, and there are videos that should simply not be linked to. Leaving the redirects open means that not only the youtube.com link needs to be blacklisted, but also every redirect to it. That either gives extra work to the blacklisting editors or leaves an easy back door open. On wikis it leaves more material to check. Combine that with the fact that redirect services are simply never needed: there is always an alternative. Additionally, Wikipedia has its own built-in redirect mechanism, which also works (I mean templates, like {{youtube}}).
That there is no Meta analogue of the whitelist is a good argument for pushing through the years-old request to revamp the spam-blacklist system and have the developers focus on features the community wants; it is certainly not an argument for me not to blacklist something. Moreover, I do not think that hampering third-party wikis is an argument either: they choose to use this blacklist, and they could instead set up their own 'meta blacklist', copying this blacklist and removing what they do not want or need.
The problem exists internally as well: some of our wikifarms allow certain spam that is inappropriate on the rest of the wikifarms and on the vast majority (by volume) of the wikis. That also calls for a rewrite of the spam-blacklist system, which is crude and too difficult to use. A lightweight edit-filter variant specialised for this would be far more suitable. --Dirk Beetstra T C (en: U, T) 04:05, 30 August 2015 (UTC)
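To illustrate the extra work: a rule aimed at one video on youtube.com does not catch the same video reached through youtu.be or youtube-nocookie.com, so each rule needs every form spelled out. The video ID below is a placeholder, and the two alternative URL shapes (`youtu.be/ID`, `youtube-nocookie.com/embed/ID`) are the common ones rather than an exhaustive list.

```python
import re

VIDEO_ID = "AAAAAAAAAAA"  # hypothetical 11-character ID of an unwanted video

# Rule covering only the main domain.
main_only = re.compile(r"youtube\.com/watch\?v=" + VIDEO_ID)

# Rule that also covers the short-link and no-cookie embed forms.
all_forms = re.compile(
    r"(?:youtube\.com/watch\?v=|youtu\.be/|youtube-nocookie\.com/embed/)" + VIDEO_ID
)

urls = [
    f"https://www.youtube.com/watch?v={VIDEO_ID}",
    f"https://youtu.be/{VIDEO_ID}",
    f"https://www.youtube-nocookie.com/embed/{VIDEO_ID}",
]

for url in urls:
    print(url, bool(main_only.search(url)), bool(all_forms.search(url)))
```

Every unblocked redirect form adds another alternative that has to be remembered for each such rule.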

Partial matches: <change.org> blocks <time-to-change.org.uk>





I tried to add a link to <time-to-change.org.uk>, and was told that I couldn't add the link, as <change.org> was blacklisted. Is this partial-match blacklisting (based, I guess, on an incorrect interpretation of URL specifications) a known bug? Cheers. --YodinT 15:46, 21 October 2015 (UTC)

This is more of a limitation of the regex: we tend to blacklist '\bchange\.org\b', but a '-' also counts as a word boundary (the \b). I'll see if I can adapt the rule. --Dirk Beetstra T C (en: U, T) 07:46, 22 October 2015 (UTC)
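A quick check with Python's `re` shows why the current rule over-matches: in time-to-change.org.uk, `\b` holds both between the hyphen and "change" and between "org" and ".uk". The tightened pattern below is a hypothetical alternative, not an existing blacklist entry, and it is tested against bare URLs only, not through the SpamBlacklist extension.

```python
import re

current = re.compile(r"\bchange\.org\b")
# Hypothetical tightened rule: the name may not be glued to a preceding
# letter, digit or hyphen, and may not continue into a longer host name.
tightened = re.compile(r"(?<![\w-])change\.org(?![\w.-])")

urls = {
    "https://www.change.org/p/example": True,    # should be blocked
    "https://change.org": True,                  # should be blocked
    "http://www.time-to-change.org.uk/": False,  # collateral under the current rule
}

for url, should_block in urls.items():
    print(url, bool(current.search(url)), bool(tightened.search(url)), should_block)
```

A preceding dot is still allowed, so subdomains such as www.change.org stay blocked; the trailing guard also rejects hosts such as change.org.uk, which a trailing `\b` alone lets through.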

Discussion

This section is for discussion of Spam blacklist issues among other users.

Expert maintenance

One soon-to-be-archived, rejected removal request was about jxlalk.com being matched by a filter intended to block xlalk.com. One user suggested that this side effect might be as it should be, another suggested that regular expressions are unable to distinguish these cases, and nobody has a clue when and why xlalk.com was blocked. I suggest finding an expert maintainer for this list and removing all blocks older than 2010. The bots that identify abuse will restore still-needed ancient blocks soon enough, hopefully without any cases of "oogle" matching "google". –Be..anyone (talk) 00:50, 20 January 2015 (UTC)

No, removing some of the old rules (from before 2010, or even before 2007) will result in further abuse. Some of the rules are intentionally wide so as to stop a wide range of spamming behaviour, and, as I have argued before, I have 2 cases on my en.wikipedia list where companies have been spamming for over 7 years, have had some of their domains blacklisted, and are still actively spamming related domains. Every single removal should be considered on a case-by-case basis. --Dirk Beetstra T C (en: U, T) 03:42, 20 January 2015 (UTC)
Just to give an example of this: redirect sites have been, and are, actively abused to circumvent the blacklist. Some of those were added before the arbitrary date of 2010. We are not going to remove those under the blanket of 'having been added before 2010'; they will stay blacklisted. Some other domains are of such gravity that they should never be removed. How, reasonably, are you going to filter out the rules that should never be removed? --Dirk Beetstra T C (en: U, T) 03:52, 20 January 2015 (UTC)
By the way, you say ".. intended to block xlalk.com .." .. how do you know? --Dirk Beetstra T C (en: U, T) 03:46, 20 January 2015 (UTC)
I know that nobody would block icrosoft.com if what they mean is microsoft.com, or vice versa. It's no shame to have no clue about regular expressions, a deficit we apparently share. –Be..anyone (talk) 06:14, 20 January 2015 (UTC)
I am not sure what you are referring to - I am not native in regex, but proficient enough. The rule was added to block, at least, xlale.com and xlalu.com (if it were ONLY these two, \bxlal(u|e)\.com\b or \bxlal[ue]\.com\b would have been sufficient), but it is impossible to find out this far back everything that was spammed; possibly xlali.com, xlalabc.com and abcxlale.com were abused by these proxy-spammers. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
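The two narrow forms quoted above are equivalent: a one-character alternation `(u|e)` and the character class `[ue]` match the same strings (the class just avoids a capture group). Neither matches jxlalk.com, whereas a wide pattern of the form `xlal[0-9a-z-]*\.com`, the expression later used to search the logs, catches every host in this list. The host names come from this thread; the check uses Python's `re`.

```python
import re

narrow_alternation = re.compile(r"\bxlal(u|e)\.com\b")
narrow_class = re.compile(r"\bxlal[ue]\.com\b")
wide = re.compile(r"xlal[0-9a-z-]*\.com")  # same form as the log-search expression

hosts = ["xlale.com", "xlalu.com", "xlalk.com", "www.jxlalk.com", "xlalliance.com"]

for host in hosts:
    print(host,
          bool(narrow_alternation.search(host)),
          bool(narrow_class.search(host)),
          bool(wide.search(host)))
```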
xlalk.com may have been one of the cases, but one rule that was blacklisted before this blanket rule was imposed was 'xlale.com' (the xlale.com rule was removed in a clean-out session after the blanket rule was added). --Dirk Beetstra T C (en: U, T) 04:45, 20 January 2015 (UTC)
The dots in administrative domains and DNS mean something, notably foo.bar.example is typically related to an administrative bar.example domain (ignoring well-known exceptions like co.uk etc., Mozilla+SURBL have lists for this), while foobar.example has nothing to do with bar.example. –Be..anyone (talk) 06:23, 20 January 2015 (UTC)
I know, but I am not sure how this relates to this suggested cleanup. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
If your suggested clean-ups at some point stop matching jxlalk.com, the request by a Chinese user would be satisfied. As noted, all I found out is a VirusTotal "clean"; it could still be a spam site, if it ever was one.
The regexp could begin with "optionally any string ending with a dot" or similar before xlalk. There are "host name" RFCs (LDH: letter, digit, hyphen) up to IDNAbis (internationalised domain names); they might contain recipes. –Be..anyone (talk) 16:56, 20 January 2015 (UTC)
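One way to express "optionally any string ending with a dot" is to require that the rule not be preceded by a letter, digit or hyphen (the LDH characters of a host label), so it may start the host, follow a dot, or follow the "//" of a URL. A minimal sketch in Python's `re`; the pattern is hypothetical, not an existing blacklist entry.

```python
import re

# The rule must start at a host-label boundary: no LDH character right before it.
# A preceding dot (subdomain), slash (after "//") or nothing at all is allowed.
label_anchored = re.compile(r"(?<![a-z0-9-])xlal[a-z0-9-]*\.com", re.IGNORECASE)

hosts = ["xlale.com", "www.xlale.com", "http://xlalabc.com/", "www.jxlalk.com", "abcxlale.com"]

for host in hosts:
    print(host, bool(label_anchored.search(host)))
```

The trade-off is that prefixed variants such as abcxlale.com, named earlier in this thread as possible spam domains, are no longer caught.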
What suggested cleanups? I am not suggesting any cleanup or blanket removal of old rules. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
Of course I'm not sure. There is no issue of bad faith. He had reason to use regex for two sites, and possibly suspected that additional minor variations would be made. But he only cited two sites. One of the pages was deleted and apparently has IP evidence on it, which might lead to other evidence from other pages, including cross-wiki. But the blacklistings themselves were clearly based on enwiki spam and nothing else was mentioned. This blacklist was the enwiki blacklist at that time. After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings. This is really old and likely obsolete stuff. --Abd (talk) 20:07, 21 January 2015 (UTC)
3 at least. And we do not have to present a full case for blacklisting (we often don't, per en:WP:BEANS and sometimes privacy concerns); we have to show sufficient abuse that needs to be stopped. And if that deleted page was mentioned, then certainly there was reason to believe that there were cross-wiki concerns.
Obsolete? How do you know? Did you go through the cross-wiki logs of what was attempted to be spammed? Do you know how often some of the people active here are still blacklisting spambots using open proxies? Please stop with these sweeping statements until you have fully searched for all evidence. 'After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings.' - no, that was not what happened. --Dirk Beetstra T C (en: U, T) 03:16, 22 January 2015 (UTC)
Hi!
I searched all the logs (Special:Log/spamblacklist) of several wikis using the regexp entry /xlal[0-9a-z-]*\.com/.
There were almost no hits:
w:ca: 0
w:ceb: 0
w:de: 0
w:en: 1: 20131030185954, xlalliance.com
w:es: 1: 20140917232510, xlalibre.com
w:fr: 0
w:it: 0
w:ja: 0
w:nl: 0
w:no: 0
w:pl: 0
w:pt: 0
w:ru: 0
w:sv: 0
w:uk: 0
w:vi: 0
w:war: 0
w:zh: 1: 20150107083744, www.jxlalk.com
So there was just one single hit at w:en (not even in the main namespace, but in the user namespace), one in w:es, and one in w:zh (probably a false positive). So I agree with user:Abd that removing this entry from the SBL would be the best solution. -- seth (talk) 18:47, 21 February 2015 (UTC)
Finally an argument based on evidence (these logs should be public, not admin-only; can we have something like this in a search engine? It may come in handy in some cases!). Consider it removed. --Dirk Beetstra T C (en: U, T) 06:59, 22 February 2015 (UTC)
By the way, Seth, these are actually no hits: all three you show here are collateral. Thanks for this evidence; this information would be useful on more occasions to make an informed decision (also, vide infra). --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
I am not sure that we want the Special page to be public, though I can see some value in having something at ToolLabs available to run queries, or something that can be run through Quarry.  — billinghurst sDrewth 10:57, 22 February 2015 (UTC)
Why not public? There is no reason to hide this; it is not BLP- or COPYVIO-sensitive information in 99.99% of the hits. The chance that this is non-public information is just as big as the chance that certain blocks are BLP violations (and those are visible) ... --Dirk Beetstra T C (en: U, T) 04:40, 23 February 2015 (UTC)

Now restarting the original debate

As the blacklist is long, and likely contains rules that are too wide a net and which are so old that they are utterly obsolete (or even, may be giving collateral damage on a regular basis), can we see whether we can set up some criteria (that can be 'bot tested'):

  1. Rule added > 5 years ago.
  2. All hits (determined on a significant number of wikis), over the last 2 years (for now: since the beginning of the log = ~1.5 years) are collateral damage - NO real hits.
  3. Site is not a redirect site (should not be removed, even if not abused), is not a known phishing/malware site (to protect others), and is not a true copyright-violating site. (This is hard to bot-test; we may need someone to look over the list and take out the obvious ones.)

We can make some mistakes on old rules if they are not abused (remove some that actually fail #3) - if they become a nuisance/problem again, we will see them again, and they can be speedily re-added .. thoughts? --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
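The three criteria can be turned into a first-pass filter whose output is a review list, not an automatic removal. A sketch under stated assumptions: the `Rule` record and its fields are hypothetical, "more than 5 years" is approximated as more than 5 × 365 days, and criterion 3 is modelled as a manual flag, since the proposal itself says it is hard to bot-test.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Rule:
    # Hypothetical record assembled from the blacklist and its hit logs.
    pattern: str
    added: date
    real_hits: int           # hits judged to be actual spam (criterion 2)
    collateral_hits: int     # hits judged to be false positives
    protected: bool = False  # redirect, phishing/malware or copyvio site (criterion 3, set by hand)


def removal_candidate(rule: Rule, today: date) -> bool:
    """Apply the three proposed criteria; True means 'list for human review'."""
    old_enough = (today - rule.added).days > 5 * 365  # criterion 1, leap days ignored
    only_collateral = rule.real_hits == 0             # criterion 2
    return old_enough and only_collateral and not rule.protected


today = date(2015, 2, 22)
print(removal_candidate(Rule(r"xlal[0-9a-z-]*\.com", date(2008, 1, 1), 0, 3), today))
```

A rule with no hits at all also passes criterion 2 as written, so zero-hit rules would need the same human check for redirect and malware sites.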

@Hoo man: you have worked on clean-up before; your thoughts would be welcome.  — billinghurst sDrewth 10:53, 22 February 2015 (UTC)
Doing this kind of clean-up is rather hard to automate. What might work better for starters is removing rules that haven't matched anything since we started logging hits. That would presumably cut down the whole blacklist considerably. After that we could re-evaluate the rest of the blacklist, maybe following the steps outlined above. - Hoo man (talk) 13:33, 22 February 2015 (UTC)
Removing rules just because they hit nothing is risky: there are likely some somewhat obscure redirect sites on the list that nobody has tried to abuse yet (though those could also be re-added). But we could easily do test runs: just save a cleaned-up copy of the blacklist elsewhere, diff it against the current list, and see what would get removed.
Man, I want this showing up in the RC feeds; then LiWa3 could store them in the database (and follow redirects to show what people wanted to link to ...). --Dirk Beetstra T C (en: U, T) 03:30, 23 February 2015 (UTC)
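The test run described above (save a cleaned-up copy, diff it against the current list) could look like the following Python sketch. It assumes the blacklist layout of one regex per line with `#` starting a comment, and compares the two versions as sets of rules, so merely reordering or re-commenting a line is not reported as a change. The function names are hypothetical.

```python
def parse_rules(text):
    """Return the set of rules in a blacklist text, ignoring comments and blank lines."""
    rules = set()
    for line in text.splitlines():
        rule = line.split("#", 1)[0].strip()  # assumption: everything after "#" is a comment
        if rule:
            rules.add(rule)
    return rules


def compare_blacklists(current_text, cleaned_text):
    """Return (removed, added): sorted rules that differ between the current and cleaned copies."""
    current, cleaned = parse_rules(current_text), parse_rules(cleaned_text)
    return sorted(current - cleaned), sorted(cleaned - current)
```

Anything showing up in `added` would point to a mistake in the cleaned copy, since a clean-up should only remove rules.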
Hi!
I created a table of hits of blocked link additions. Maybe it's of use for the discussion: User:lustiger_seth/sbl_log_stats (1.8 MB wiki table).
I'd appreciate it if we deleted old entries. -- seth (talk) 22:12, 26 February 2015 (UTC)
Hi, thank you for this; it gives a reasonable idea. Do you know whether the rule hits were all 'correct' (for those that do show hits) or mainly/all false positives? (If a rule is hitting false positives, we could also decide on this basis to tighten it to avoid them.) Rules with all zeros (can you include a 'total' score?) would certainly be candidates for removal, though we should still first determine whether they are 'old' and/or no-no sites before removing them. I am also concerned that this does not include other wikifarms: some sites may be problematic on other wikifarms, or may be hitting a large number of smaller wikis (which have less control due to low admin numbers). --Dirk Beetstra T C (en: U, T) 03:36, 8 March 2015 (UTC)
Hi!
We probably can't get information on false positives automatically. I added a 'sum' column.
Small wikis: if you give me a list of the relevant ones, I can create another list. -- seth (talk) 10:57, 8 March 2015 (UTC)
Thanks for the sum column. Regarding the false positives, it would be nice to be able to quickly see what actually got blocked by a certain rule. I agree that this then needs a manual inspection, but the number of rules with zero hits on what they were meant to block is likely much bigger than what we see.
How would you define the relevant small wikis? That depends on the link that was spammed. Probably the best approach is to parse all ~750 wikis, make a list of rules with 0 hits and a separate list of rules with <10 hits (including the links that were blocked), and exclude everything above that. The resulting rules should then be filtered to those added >5 years ago. That narrows down the list for now, and after a check for obvious no-no links, those could almost be blanket-removed (excluding only the ones with real hits, the obvious redirect sites and others, which needs a manual check). --Dirk Beetstra T C (en: U, T) 06:59, 9 March 2015 (UTC)
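The narrowing-down step proposed above (count hits per rule over all wikis, list the rules with 0 hits and, separately, those with fewer than 10 hits together with the links they blocked) can be sketched in Python as below. The `(wiki, rule, blocked_url)` record shape is a hypothetical simplification of the real hit logs; the age filter and the manual check for redirect and other no-no sites would come afterwards.

```python
from collections import Counter, defaultdict


def bucket_rules(all_rules, hit_log, few=10):
    """Split rules into zero-hit rules and rules with 1..few-1 hits.

    all_rules: list of rule patterns on the blacklist.
    hit_log:   iterable of (wiki, rule, blocked_url) records from every wiki (hypothetical shape).
    Returns (zero_hit_rules, few_hit_rules); the latter maps each rule to the URLs it
    blocked, for manual inspection. Rules with `few` or more hits are left out.
    """
    counts = Counter()
    urls = defaultdict(list)
    for _wiki, rule, url in hit_log:
        counts[rule] += 1
        urls[rule].append(url)
    zero_hit = [r for r in all_rules if counts[r] == 0]  # Counter returns 0 for unseen rules
    few_hit = {r: urls[r] for r in all_rules if 0 < counts[r] < few}
    return zero_hit, few_hit
```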
Hi!
At User:Lustiger_seth/sbl_log_stats/all_wikis_no_hits there's a list containing ~10k entries that never triggered the SBL anywhere between September 2013 and February 2015 (if my algorithm is correct).
If you want to get all entries older than 5 years, then it should be sufficient to use only the entries in that list up to (and including) \bbudgetgardening\.co\.uk\b.
So we could delete ~5766 entries. What do you think? Shall we give it a try? -- seth (talk) 17:06, 18 April 2015 (UTC)
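Taking the entries "up to (and including)" a given rule only selects the older ones if the no-hit list preserves the blacklist's order, with the oldest additions first; the Python sketch below makes that assumption explicit. The function name is hypothetical.

```python
def entries_up_to(no_hit_rules, last_old_rule):
    """Return the entries up to and including last_old_rule.

    Assumes no_hit_rules is in blacklist order (oldest additions first), so this
    prefix holds the rules added no later than last_old_rule.
    Raises ValueError if last_old_rule is not in the list.
    """
    end = no_hit_rules.index(last_old_rule)
    return no_hit_rules[: end + 1]
```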
The question is how many of those are still-existing redirect sites etc. Checking 5,800 is quite a job. On the other hand, with LiWa3/COIBot detecting them, it is quite easy to re-add them. --Dirk Beetstra T C (en: U, T) 19:28, 21 April 2015 (UTC)
Following the last few comments, I've now removed 124 kB of non-hitting entries. I did not remove all of them, because some were URL shorteners, and I guess they are a special case, even if not used yet. -- seth (talk) 22:25, 16 September 2015 (UTC)

Blacklisting spam URLs used in references

It looks like a site is using the "references" section as a spam farm. If a site is added to this list, can the blacklist block the spam site? Raysonho (talk) 17:45, 5 September 2015 (UTC)

Yes, it can.--AldNonymousBicara? 21:56, 5 September 2015 (UTC)
Thanks, Aldnonymous! Raysonho (talk) 00:07, 6 September 2015 (UTC)
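To illustrate why a link in a references section is caught like any other: blacklist rules are matched against the external links in the page text wherever they appear, including inside `<ref>` tags. The Python sketch below is a simplification under that assumption; the URL pattern is a rough approximation, case-insensitive matching is assumed, and it does not reproduce how the SpamBlacklist extension actually extracts or compares links.

```python
import re

# Rough approximation of a bare or bracketed external URL in wikitext.
URL_RE = re.compile(r"https?://[^\s\[\]<>|]+")


def blocked_links(wikitext, blacklist):
    """Return every URL in wikitext (references included) that matches a blacklist rule."""
    compiled = [re.compile(rule, re.IGNORECASE) for rule in blacklist]
    return [
        url for url in URL_RE.findall(wikitext)
        if any(rx.search(url) for rx in compiled)
    ]
```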

url shorteners

Hi!
IMHO the URL shorteners should be grouped in one section, because they are a special group of URLs that need special treatment. A URL shortener should not be removed from the SBL unless the domain is dead, even if it has not been used for spamming, right? -- seth (talk) 22:11, 28 September 2015 (UTC)

It would be beneficial to have them in a section. The problem is that most of them are added by script, and are hence just put at the bottom. --Dirk Beetstra T C (en: U, T) 04:51, 4 October 2015 (UTC)
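If the adding script were changed to file shorteners under their own section instead of at the bottom, the insertion could work roughly like this Python sketch. The section header `# URL shorteners` and the function name are hypothetical, and a section is assumed to run from its `#` header to the next blank line or `#` line.

```python
def add_to_section(blacklist_text, header, rule):
    """Insert rule at the end of the section that starts at the comment line `header`.

    Assumption: a section runs until the next blank line or "#" line.
    If the header is missing, fall back to appending at the bottom.
    """
    lines = blacklist_text.splitlines()
    if header in lines:
        i = lines.index(header) + 1
        # Skip over the rules already in this section.
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("#"):
            i += 1
        lines.insert(i, rule)
    else:
        lines.append(rule)
    return "\n".join(lines) + "\n"
```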

Unreadable

Why is the list not alphabetical, so that I can look up whether a certain site is listed and then also look up when it was added? --Corriebertus (talk) 08:55, 21 October 2015 (UTC)
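Since each entry is a regular expression rather than a plain domain name, sorting alone would not reliably answer "is this site listed?". A lookup that tests a URL against every rule and reports the line number (which can then be traced in the page history to find when the rule was added) might look like this Python sketch. Case-insensitive matching is assumed, and Python's `re` is not identical to the PCRE dialect used by MediaWiki, so rules that fail to compile are skipped. This is a hypothetical helper, not an existing tool.

```python
import re


def matching_rules(blacklist_text, url):
    """Return (line_number, rule) for every blacklist line whose regex matches url."""
    matches = []
    for lineno, raw in enumerate(blacklist_text.splitlines(), start=1):
        rule = raw.split("#", 1)[0].strip()  # assumption: "#" starts a comment
        if not rule:
            continue
        try:
            if re.search(rule, url, re.IGNORECASE):
                matches.append((lineno, rule))
        except re.error:
            continue  # PCRE syntax that Python's re rejects; skip instead of crashing
    return matches
```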