Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by RockMFR (talk | contribs) at 23:06, 3 January 2010 (→‎Proposed removals: +tumblr.com). It may differ significantly from the current version.

Shortcut:
WM:SPAM
WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists strings of text that may not be used in URLs in any page in Wikimedia Foundation projects (as well as many external wikis). Any meta administrator can edit the spam blacklist. There is also a more aggressive way to block spamming through direct use of $wgSpamRegex. Only system administrators can make changes to $wgSpamRegex, and its use is to be avoided whenever possible. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report; there could be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.

Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived (search) quickly. Additions and removals are logged.

Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.

snippet for logging
{{sbl-log|1789445#{{subst:anchorencode:SectionNameHere}}}}

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

co.cc again



MZMcBride removed this entry. Wikipedia claims that the domain is not a real TLD and is used for URL redirectors. On that basis, I think it should be re-added with a preceding dot: \.co\.cc\b  — Mike.lifeguard | @en.wb 19:43, 11 November 2009 (UTC)

Having had to deal with a lot of the spam links, I strongly endorse restoring this link. No offense, MZM, but this one should have been discussed first before removal. --Ckatz 10:05, 13 November 2009 (UTC)
It's clearly not just being used for URL redirection. As the Wikipedia article notes, it can be used as a real DNS. .com is capable of URL redirection and brings in a lot more spam. I don't think blacklisting an entire TLD or ccTLD (real or not) is a good idea, though I can understand it's the simplest solution. Is there a complementary global whitelist? Do we have any idea how many false positives this addition to the blacklist will cause? --MZMcBride 10:42, 13 November 2009 (UTC)
Do you have any evidence supporting your reason for removal? Discussion when working with others is critical, and your flippant response to my query is worrying.
As to the substantive issue: Yes, there will be candidates for whitelisting, that was acknowledged and addressed from the initial request for blacklisting. I haven't seen that the rate is unacceptable, which you simply take as a premise, and we have helped users to request whitelisting where necessary, and will continue to do so.  — Mike.lifeguard | @en.wb 14:56, 13 November 2009 (UTC)
Flippant? You've globally blacklisted an entire ccTLD, which has broad implications on 700+ projects, plus an unknown number of sites that also use this list. This entry in particular is creating an unknown (and possibly high) number of false positives (I'm only here because there was a local problem at en.wiki regarding what appears to be an entirely valid URL and it was baffling how the URL could be blacklisted). Here's the diff of you broadening the regex—where was the discussion for doing this? I don't see anything in the log, though admittedly the log is nearly impossible to navigate. (If there is no discussion, what was the rationale? Is there supporting data to suggest that the only possible approach here is to block the entire ccTLD, an obviously extreme tactic?) --MZMcBride 16:33, 13 November 2009 (UTC)
I think you missed this.  — Mike.lifeguard | @en.wb 16:36, 13 November 2009 (UTC)
Are discussions on this talk page archived anywhere? I checked the log (silly me, I know). Reading the old discussion, I'm still baffled about the rationale here. It can be used for URL redirection. So can literally any other domain (top-level or otherwise). That's not an argument to ban any and all uses of it. If there's evidence that this domain is unmanageable and won't result in an excessive number of false positives, I don't have an issue with including such a broad regex. But I'd like there to be some specific data to point to, not just "can be used for URL redirection," which I consider a non-argument. --MZMcBride 16:42, 13 November 2009 (UTC)
Not "can" -- "is" (well, "was" until you removed it :D). You can see User:COIBot/XWiki/co.cc for a small taste (too many results to generate the large taste) - or the original request. Anecdotally, yes, we know it was abused cross-wiki; that's why I added it when JzG brought the request here - if not it would have been "add to XLinkBot for enwiki, and we'll attempt to monitor on other wikis with COIBot."  — Mike.lifeguard | @en.wb 16:50, 13 November 2009 (UTC)

(unindent) A question about User:COIBot/LinkReports/co.cc. How is the false positive ratio determined? It looks like the bot finds all instances of the domain (or part of a domain string) being added to a page, but are there any numbers regarding how many of these additions were legitimate? (There are legitimate uses of this ccTLD, right?) --MZMcBride 09:57, 15 November 2009 (UTC)

I'm not sure what you mean by "false positive" in this context -- the bot cannot decide whether a link addition is appropriate or not since it's a bot.  — Mike.lifeguard | @en.wb 05:21, 2 December 2009 (UTC)
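
For readers following the regex discussion in this thread, here is a minimal Python sketch of how the proposed pattern \.co\.cc\b behaves on a few URLs. It uses Python's re module rather than the extension's own matching code, so it only approximates how a blacklist entry is evaluated. The helper name is_blocked and the sample URLs are made up for illustration.

```python
import re

# The entry proposed in this thread, with the preceding dot.
# Assumption: the pattern is searched case-insensitively anywhere in the URL.
CO_CC = re.compile(r"\.co\.cc\b", re.IGNORECASE)


def is_blocked(url: str) -> bool:
    """Return True if the URL contains a match for the proposed entry."""
    return CO_CC.search(url) is not None


# Hypothetical sample URLs, not taken from any bot report.
samples = [
    "http://spam-site.co.cc/page",  # subdomain of co.cc
    "http://co.cc/",                # bare domain, no preceding dot
    "http://www.disco.cc/",         # different domain that ends in "co.cc"
    "http://example.co.ccc/",       # no word boundary after "cc"
]

for url in samples:
    print(url, is_blocked(url))
```

Under these assumptions only the first URL matches: the preceding dot excludes both the bare domain and unrelated domains such as disco.cc, and the trailing \b excludes longer endings such as .ccc.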

urlpass.com



url shortener Track13 0_o 22:36, 3 January 2010 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as the bot may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).
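
The staleness rule in the parenthesis above can be read as three conditions that must all hold. The following is a rough Python sketch of that reading, not the bot's actual code: the Report class and its field names are hypothetical, and treating "not edited in the last 7 days" as "last edit is more than 7 days old" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    # Hypothetical fields; the bot's real data model is not shown on this page.
    links_reported: int
    last_edited: datetime
    last_editor: str


def is_stale(report: Report, now: datetime) -> bool:
    """Apply the three archiving conditions described on this page."""
    return (
        report.links_reported < 5                          # fewer than 5 links
        and now - report.last_edited > timedelta(days=7)   # untouched for 7 days
        and report.last_editor == "COIBot"                 # nobody has responded
    )


# Illustrative values only.
now = datetime(2010, 1, 10)
print(is_stale(Report(3, datetime(2010, 1, 1), "COIBot"), now))
print(is_stale(Report(3, datetime(2010, 1, 8), "COIBot"), now))
```

A report that a human has edited, or that is still accumulating links, fails at least one condition and so stays open.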

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
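
As a rough illustration of the first and third criteria above, here is a Python sketch. It is not COIBot's implementation: the LinkStats fields are hypothetical, and because this page does not define "mainly" or "too much", the two thresholds are placeholders chosen only so the example runs. The second and fourth criteria are left out because they depend on definitions (per-server statistics, IP ranges) that are not given here.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    # Hypothetical summary of one link's additions; not COIBot's real schema.
    additions_by_user: int     # times the user in question added this link
    total_user_additions: int  # all link additions made by that user
    total_link_uses: int       # times the link was added by anyone
    wikis_by_user: int         # wikis where that user added the link
    wikis_total: int           # wikis where the link was added at all
    all_adders_are_ips: bool   # True if every addition came from an IP


# Placeholder thresholds (assumptions, not values from this page).
MAINLY_FRACTION = 0.5
MAX_LINK_USES = 100


def should_report(s: LinkStats) -> bool:
    """Return True if criterion 1 or criterion 3 is met."""
    user_mainly_adds = (
        s.total_user_additions > 0
        and s.additions_by_user / s.total_user_additions > MAINLY_FRACTION
    )
    # Criterion 1: user mainly adds this link, link is little used, > 2 wikis.
    rule_1 = (
        user_mainly_adds
        and s.total_link_uses <= MAX_LINK_USES
        and s.wikis_by_user > 2
    )
    # Criterion 3: all additions are by IPs and the link is on > 1 wiki.
    rule_3 = s.all_adders_are_ips and s.wikis_total > 1
    return rule_1 or rule_3


print(should_report(LinkStats(9, 10, 12, 3, 3, False)))
```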
COIBot's currently open XWiki reports
List Last update By Site IP R Last user Last link addition User Link User - Link User - Link - Wikis Link - Wikis
vrsystems.ru 2023-06-27 15:51:16 COIBot 195.24.68.17 192.36.57.94
193.46.56.178
194.71.126.227
93.99.104.93
2070-01-01 05:00:00 4 4

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.

estaticos.20minutos.es



I do not know why the domain estaticos.20minutos.es is on the spam list. This domain belongs to 20 minutos, a newspaper from Spain which is published under a free license, and its links are used especially on Catalan Wikinews. I do not understand what kind of abuse has been reported, but I think it should be usable for citing sources as external links on Catalan or Spanish Wikipedia. Can anyone explain it or solve this problem in order to increase the verifiability of Wikipedia? Thank you. --Bestiasonica 10:50, 22 December 2009 (UTC)

This was blacklisted per User:COIBot/XWiki/estaticos.20minutos.es. You can whitelist pages as needed for sourcing - though it seems most of the spamming was on eswiki. The site currently returns a 403, so I cannot evaluate this any further.  — Mike.lifeguard | @en.wb 21:34, 26 December 2009 (UTC)
I am not an expert, and I do not understand what a 403 is or how to whitelist the pages. Happy New Year! --Bestiasonica 23:27, 29 December 2009 (UTC)
I'd say that a removal here would not hurt. As Bestiasonica says, it is a Spanish newspaper which publishes under a free license and is used as a source on Wikipedia and Wikinews. I think that we can remove it and keep watching the additions via the bot reports... Thoughts? — Dferg (disputatio) 10:05, 30 December 2009 (UTC)
Agreed. This wasn't the only link that was added in the last reported edits and it seems it just happened to have a picture that someone wanted to spam. --Erwin 10:33, 30 December 2009 (UTC)
Thanks for your comments, Erwin. Given that there is no opposition to the proposal, I went ahead and Removed it. It can be useful to our projects. Link additions will be monitored through the bot report. Thanks, — Dferg (disputatio) 12:34, 2 January 2010 (UTC)

aerobaticteams.net



Please check this site and confirm that it is not a spam site. Many of the articles in the Military Aerobatic Teams section on Wikipedia use aerobaticteams.net as a source. I don't understand why this site is blacklisted. — The preceding unsigned comment was added by 94.155.239.130 (talk) diff — Dferg (disputatio) 12:07, 25 December 2009 (UTC)

That domain was added to this blacklist per this discussion back in October 2008. The domain also seems quite problematic on enwiki, where the same domain has been blacklisted locally (see [1], [2], [3]). It is also blacklisted at ar.wikipedia. I'd suggest not removing the domain here right now, but instead requesting local whitelisting on the relevant wikis where you want to add external links to that domain, as I did on es.wikipedia. Comments welcome. Best regards, — Dferg (disputatio) 12:20, 25 December 2009 (UTC)
Excessive multi-project abuse. Also see the discussion here
Mass multiple-project spamming, abuse, several declines, vandalism of reports related to aerobaticteams.net, and multiple attempts to circumvent blacklisting. Additionally, the requesting IP above was also used in spamming this site. --Hu12 17:23, 31 December 2009 (UTC)

Also, I recommend adding the following:







These are being used to circumvent the blacklisting. --Hu12 17:33, 31 December 2009 (UTC)

Done & request for de-blacklisting  declined — Dferg (disputatio) 12:53, 2 January 2010 (UTC)

globalflight.net



This is one of the most comprehensive sites for Frequent Flyer Programs and contains, among others, the only complete worldwide listing and deep links to all FFPs, plus other unique features, such as the "Who with whom?" application. We suspect that we were blacklisted by a competitor.

Sorry, we're all human here.
This was blacklisted per User:COIBot/XWiki/globalflight.net. If there's utility for our projects in linking to this domain, I suspect whitelisting for those cases will be sufficient.  — Mike.lifeguard | @en.wb 05:53, 28 December 2009 (UTC)
Agreed with Mike -  Declined — Dferg (disputatio) 12:38, 2 January 2010 (UTC)

tumblr.com



This was added as a url shortener - as far as I know, it is not. It is a blogging/microblogging platform. RockMFR 23:06, 3 January 2010 (UTC)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

None currently

Discussion

This section is for discussion of Spam blacklist issues among other users.

large log page

Hi!
The HTML-rendered Spam_blacklist/Log is very large, so it is rather tedious to view that page while waiting for the "edit" buttons.
I suggest that we

  1. move Spam blacklist/Log/Pre-2008 to Spam blacklist/Log/Archive (or Spam blacklist/Log/Old or something similar) and
  2. move the parts 2008 and 2009 of the present log Spam blacklist/Log to that archive.

Some of my scripts (and probably some scripts of some other users) refer to /Log/Pre-2008, so that page should remain a redirect. Any objections? -- seth 10:30, 3 January 2010 (UTC)

Done — Mike.lifeguard | @en.wb 17:53, 3 January 2010 (UTC)