Talk:Spam blacklist/Archives/2015-01

From Meta, a Wikimedia project coordination wiki

Proposed additions

This section is for completed requests that a website be blacklisted.

best-ghostwriter.com and several other websites

This is a "bigger" problem than I expected. An IP added links to several sites on the German Wikipedia:

All of these sites seem to be fraudulent offers; some of them have the same "thanks from other customers" (including the same spelling mistakes). Then I saw that the IP 93.72.148.237 (talk · contribs) had added several of these links to other wikis. All of the links on the German Wikipedia are useless, and I think the links on .en, .es, etc. are too. Please check this. (Sorry for my bad English.) --Schraubenbürschchen (talk) 08:12, 29 December 2014 (UTC)

Added: all appear to be inter-related, with cross-wiki additions and subsequent reversions. -- — billinghurst sDrewth 13:31, 1 January 2015 (UTC)

Spambot batch for today

Reporting here for tracking and blacklisting where appropriate. COIBot will generate the reports of the domains reported here. Format: domain/user/af_log.

{{linksummary|tumblr.com}} - maybe we don't want to blacklist this.

Will update as I discover more. -- M\A 11:22, 19 January 2015 (UTC)

Added. -- M\A 11:24, 19 January 2015 (UTC)

cte.li



URL shortener. MER-C (talk) 12:35, 20 January 2015 (UTC)

Added — Revi 14:12, 20 January 2015 (UTC)

ci8.de



URL shortener. --Glaisher (talk) 03:44, 26 January 2015 (UTC)

Added --Glaisher (talk) 03:45, 26 January 2015 (UTC)

Proposed removals

This section is for archiving proposals that a website be unlisted.

cfl-scrapbook.no-ip.org



It seems that the whole domain no-ip.org is blacklisted following the June 2013 discussion found here. Would it be possible to whitelist this site: cfl-scrapbook.no-ip.org? I have been using it to create articles about Canadian football on the French wiki but cannot link to it in references or external links. The site is informative, with a lot of information and statistics about players. It does not seem to me that blacklisting it is justified. Thanks, Cortomaltais (talk) 16:23, 30 December 2014 (UTC)

Please ask your local sysop to whitelist that specific domain. — Revi 16:31, 30 December 2014 (UTC)
Just to add to the commentary. @Cortomaltais: There is no ability to globally whitelist sites, hence why we have encouraged you to get a local whitelisting.  — billinghurst sDrewth 13:26, 1 January 2015 (UTC)
Declined: nothing to do at this time  — billinghurst sDrewth 13:26, 1 January 2015 (UTC)

myretrotv.com



This is the valid domain for the en:Retro Television Network, and should be removed from the blacklist. Scoty6776 (talk)

Declined: it is blocked under a local rule at English Wikipedia; there is no global block. You will need to discuss this at the blocking site, especially as they already have some whitelisting in place for the domain.  — billinghurst sDrewth 13:23, 1 January 2015 (UTC)

cais-soas.com



I want to refer to the following link: www.cais-soas.com/News/2001/October2001/22-10.htm

This is for the page en:David_Neil_MacKenzie (the current link to the obituary is broken). I don't see the reason why this domain is blocked, as it seems to be an academic source of information. בוקי סריקי (talk) 09:04, 26 November 2014 (UTC)

It seems to have been blocked following a request from enWP. @Dominic: do you have an opinion about the domain now with time having passed?  — billinghurst sDrewth 11:24, 26 November 2014 (UTC)

This was blacklisted due to spamming in combination with copyright infringement concerns. The situation may have changed, but Talk:Spam_blacklist/Archives/2010-02#cais-soas.com <- this discussion from 2010 sums it up quite well. At that time it was judged to be of about the same quality as Wikipedia itself: it was not an academic source of information, and its inclusion standards were far below what we would need for a reliable source (the fact that spamming was involved in the additions only strengthens that conclusion).

Unless the situation on the site has drastically changed, I would leave it on the blacklist, and request whitelisting for the few really needed links, like this one. --Dirk Beetstra T C (en: U, T) 03:32, 27 November 2014 (UTC)

Two questions:

  1. Is this a copy of the original, or an independent report of the same info? If the former, is it properly attributed?
  2. Is a copy of the original available from one of the archiving sites?

Thanks. --Dirk Beetstra T C (en: U, T) 06:18, 27 November 2014 (UTC)

@בוקי סריקי: ^^^^  — billinghurst sDrewth 11:32, 29 December 2014 (UTC)

Answers to the questions: this looks like a copy of the original obituary, which is attributed to the Independence newspaper. The link to the Independence is broken - whether the article is available elsewhere, I don't know. בוקי סריקי (talk) 16:08, 13 January 2015 (UTC)

I guess that whitelisting is the way to go, then; it does indeed seem to be the only online copy that I can find. --Dirk Beetstra T C (en: U, T) 07:10, 14 January 2015 (UTC)
Beetstra, the block request is seven years old. I think that it is worth the risk to unblock and add the domain to the monitor list. We can reimpose a block if needed, and whitelist at that point.  — billinghurst sDrewth 10:48, 15 January 2015 (UTC)
7 years is not a very long time - I have two companies on my list on en.wikipedia that have been spamming for that long, despite blacklisted domains. It earns them money. The problem here, however, was spammy / persistent linking to copyright violations, and by definition we should not link to those. On en.wikipedia I see sockpuppet investigations of editors adding links to this site, and a lengthy ANI discussion regarding one of the socks. That is why I asked above whether the situation has changed drastically since 2007. --Dirk Beetstra T C (en: U, T) 12:20, 15 January 2015 (UTC)
Declined: concerns still exist; please still request whitelisting at the site of interest.  — billinghurst sDrewth 11:25, 16 January 2015 (UTC)

kreuz.net



This site has been offline since 2012 (see w:de:Kreuz.net), as has the old redirect domain kreuz-net.info. So I'd like to remove both domains from our list. Any objections? -- seth (talk) 23:09, 26 January 2015 (UTC)

If they're dead and the cause of their addition is no longer active, then I wouldn't oppose their removal. Waiting for others to comment. -- M\A 23:27, 26 January 2015 (UTC)
Removed. Dirk Beetstra T C (en: U, T) 03:32, 27 January 2015 (UTC)
Blacklisted in the days before local blacklists existed (it looked like local spam to me), spammed and deemed problematic. 8 years is not necessarily long, but if the site is defunct there is no reason to keep it here. --Dirk Beetstra T C (en: U, T) 03:35, 27 January 2015 (UTC)
OK. We'll still keep a (local) eye on it, since the domain is still registered to the same person as before. -- seth (talk) 08:04, 27 January 2015 (UTC)


Troubleshooting and problems

This section is for archiving Troubleshooting and problems.

Discussion

This section is for archiving Discussions.

Expert maintenance

One (soon) archived and rejected removal suggestion was about jxlalk.com matched by a filter intended to block xlalk.com. One user suggested that this side-effect might be as it should be, another user suggested that regular expressions are unable to distinguish these cases, and nobody has a clue when and why xlalk.com was blocked. I suggest to find an expert maintainer for this list, and to remove all blocks older than 2010. The bots identifying abuse will restore still needed ancient blocks soon enough, hopefully without any oogle matching google cases. –Be..anyone (talk) 00:50, 20 January 2015 (UTC)
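Regular expressions can in fact distinguish these cases; the difference is whether the pattern is anchored. The following is a minimal sketch using Python's `re` module on bare hostnames taken from the comment above. It is an assumption-laden illustration only: it does not reproduce how the SpamBlacklist extension wraps list entries before matching them against full URLs.

```python
import re

hosts = ["google.com", "oogle.com", "xlalk.com", "jxlalk.com"]

# Unanchored: "oogle\.com" is also found inside "google.com".
unanchored = re.compile(r"oogle\.com")
# With \b, the match must start at a word boundary, so "g" + "oogle" no longer qualifies.
anchored = re.compile(r"\boogle\.com\b")


def matching(pattern, candidates):
    """Return the candidates in which the pattern finds a match anywhere."""
    return [h for h in candidates if pattern.search(h)]


print(matching(unanchored, hosts))  # ['google.com', 'oogle.com']
print(matching(anchored, hosts))    # ['oogle.com']
print(matching(re.compile(r"\bxlalk\.com\b"), hosts))  # ['xlalk.com']
```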

No, removing some of the old rules, before 2010 or even before 2007, will result in further abuse, some of the rules are intentionally wide as to stop a wide range of spamming behaviour, and as I have argued as well, I have 2 cases on my en.wikipedia list where companies have been spamming for over 7 years, have some of their domains blacklisted, and are still actively spamming related domains. Every single removal should be considered on a case-by-case basis. --Dirk Beetstra T C (en: U, T) 03:42, 20 January 2015 (UTC)
Just to give an example of this - redirect sites have been, and are, actively abused to circumvent the blacklist. Some of those were added before the arbitrary date of 2010. We are not going to remove those under the blanket of 'having been added before 2010'; they will stay blacklisted. Some other domains are of such gravity that they should never be removed. How are you, reasonably, going to filter out the rules that should never be removed? --Dirk Beetstra T C (en: U, T) 03:52, 20 January 2015 (UTC)
By the way, you say ".. intended to block xlalk.com .." .. how do you know? --Dirk Beetstra T C (en: U, T) 03:46, 20 January 2015 (UTC)
I know that nobody would block icrosoft.com if what they mean is microsoft.com, or vice versa. It's no shame to have no clue about regular expressions, a deficit we apparently share. –Be..anyone (talk) 06:14, 20 January 2015 (UTC)
I am not sure what you are referring to - I am not native in regex, but proficient enough. The rule was added to block, at least, xlale.com and xlalu.com. (If it were ONLY these two, \bxlal(u|e)\.com\b or \bxlal[ue]\.com\b would have been sufficient, but it is impossible to find out, this far back, everything that was spammed; possibly xlali.com, xlalabc.com and abcxlale.com were abused by these proxy-spammers.) --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
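The two narrow rules mentioned above are interchangeable: `(u|e)` and `[ue]` accept exactly the same single character. A quick check in Python, using the hostnames named in the comment as sample input (plus `www.xlale.com`, added here only to show a subdomain), also shows that neither form catches suspected variants such as xlali.com or abcxlale.com, which is one plausible reason a broader rule was chosen.

```python
import re

alternation = re.compile(r"\bxlal(u|e)\.com\b")
char_class = re.compile(r"\bxlal[ue]\.com\b")

# Hostnames from the discussion; "www.xlale.com" is an added illustration.
samples = ["xlale.com", "xlalu.com", "xlali.com", "xlalabc.com", "abcxlale.com", "www.xlale.com"]

results = {}
for host in samples:
    a = bool(alternation.search(host))
    b = bool(char_class.search(host))
    # Both spellings of the rule accept exactly the same strings.
    assert a == b, host
    results[host] = a

for host, hit in results.items():
    print(f"{host:15} {'blocked' if hit else 'not blocked'}")
```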
xlalk.com may have been one of the cases, but one rule that was blacklisted before this blanket was imposed was 'xlale.com' (xlale.com rule was removed in a cleanout-session, after the blanket was added). --Dirk Beetstra T C (en: U, T) 04:45, 20 January 2015 (UTC)
The dots in administrative domains and DNS mean something, notably foo.bar.example is typically related to an administrative bar.example domain (ignoring well-known exceptions like co.uk etc., Mozilla+SURBL have lists for this), while foobar.example has nothing to do with bar.example. –Be..anyone (talk) 06:23, 20 January 2015 (UTC)
I know, but I am not sure how this relates to this suggested cleanup. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
If your suggested clean-ups at some point don't match jxlalk.com the request by a Chinese user would be satisfied—as noted all I found out is a VirusTotal "clean", it could be still a spam site if it ever was a spam site.
The regexp could begin with "optionally any string ending with a dot" or similar before xlalk. There are "host name" RFCs (LDH: letter digit hyphen) up to IDNAbis (i18n domains), they might contain recipes. –Be..anyone (talk) 16:56, 20 January 2015 (UTC)
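The suggestion can be sketched by matching against the hostname rather than the raw link text, and letting the rule accept only the domain itself or any subdomain of it (the optional "string ending with a dot" prefix). The rule below is hypothetical, written in Python for illustration, and is not an actual blacklist entry. It also shows a subtlety of `\b`: a hyphen counts as a word boundary, so a word-boundary rule still catches an unrelated name like foo-xlalk.com, while the label-based rule does not. Internationalized domain names are not handled here.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule: the blocked domain itself, or any subdomain of it.
# "(?:[a-z0-9-]+\.)*" is the optional "any string ending with a dot" prefix (LDH labels only).
LABEL_RULE = re.compile(r"^(?:[a-z0-9-]+\.)*xlalk\.com$", re.IGNORECASE)
WORD_BOUNDARY_RULE = re.compile(r"\bxlalk\.com\b", re.IGNORECASE)


def host_of(url: str) -> str:
    """Extract the hostname; urlsplit lower-cases it and drops port and userinfo."""
    return urlsplit(url).hostname or ""


def blocked(url: str, rule: re.Pattern) -> bool:
    return bool(rule.search(host_of(url)))


urls = [
    "http://xlalk.com/page",
    "http://www.xlalk.com/",
    "http://www.jxlalk.com/",
    "http://foo-xlalk.com/",
]
for u in urls:
    print(u, blocked(u, LABEL_RULE), blocked(u, WORD_BOUNDARY_RULE))
```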
What suggested cleanups? I am not suggesting any cleanup or blanket removal of old rules. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
  • I have supported delisting above, having researched the history, posted at Talk:Spam_blacklist/About#Old_blacklisting_with_scanty_history. If it is desired to keep xlale.com and xlalu.com on the blacklist (though it's useless at this point), the shotgun regex could be replaced with two listings, easy peasy. --Abd (talk) 01:42, 21 January 2015 (UTC)
    As I said earlier: are you sure that it is only xlale and xlalu? Those were the two I found quickly; there may have been more. I do AGF that the admin who added the rule had reason to blanket it like this. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
Of course I'm not sure. There is no issue of bad faith. He had reason to use regex, for two sites, and possibly suspected additional minor changes would be made. But he only cited two sites. One of the pages was deleted, and has IP evidence on it, apparently, which might lead to other evidence from other pages, including cross-wiki. But the blacklistings themselves were clearly based on enwiki spam and nothing else was mentioned. This blacklist was the enwiki blacklist at that time. After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings. This is really old and likely obsolete stuff. --Abd (talk) 20:07, 21 January 2015 (UTC)
3 at least. And we do not have to present a full case for blacklisting (we often don't, per en:WP:BEANS and sometimes privacy concerns), we have to show sufficient abuse that needs to be stopped. And if that deleted page was mentioned, then certainly there was reason to believe that there were cross-wiki concerns.
Obsolete, how do you know? Did you go through the cross-wiki logs of what was attempted to be spammed? Do you know how often some of the people active here are still blacklisting spambots using open proxies? Please stop with these sweeping statements until you have fully searched for all evidence. 'After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings.' - no, that was not what happened. --Dirk Beetstra T C (en: U, T) 03:16, 22 January 2015 (UTC)
Hi!
I searched all the logs (Special:Log/spamblacklist) of several wikis using the regexp entry /xlal[0-9a-z-]*\.com/.
There were almost no hits:
w:ca: 0
w:ceb: 0
w:de: 0
w:en: 1: 20131030185954, xlalliance.com
w:es: 1: 20140917232510, xlalibre.com
w:fr: 0
w:it: 0
w:ja: 0
w:nl: 0
w:no: 0
w:pl: 0
w:pt: 0
w:ru: 0
w:sv: 0
w:uk: 0
w:vi: 0
w:war: 0
w:zh: 1: 20150107083744, www.jxlalk.com
So there was just one single hit at w:en (not even in the main namespace, but in the user namespace), one in w:es, and one in w:zh (probably a false positive). So I agree with user:Abd that removing of this entry from the sbl would be the best solution. -- seth (talk) 18:47, 21 February 2015 (UTC)
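The classification of these three hits can be reproduced directly: the search pattern used on the logs matches all three logged hostnames, while a rule limited to the two domains named earlier (xlale.com and xlalu.com) matches none of them, consistent with reading every hit as collateral. A minimal Python check, with the hostnames copied from the list above:

```python
import re

# The search pattern used on Special:Log/spamblacklist.
SEARCH = re.compile(r"xlal[0-9a-z-]*\.com")
# A narrower rule covering only xlale.com and xlalu.com.
NARROW = re.compile(r"\bxlal[ue]\.com\b")

# The three logged hits from the list above.
logged = {
    "w:en": "xlalliance.com",
    "w:es": "xlalibre.com",
    "w:zh": "www.jxlalk.com",
}

found = {wiki: bool(SEARCH.search(host)) for wiki, host in logged.items()}
intended = {wiki: bool(NARROW.search(host)) for wiki, host in logged.items()}
print(found)     # {'w:en': True, 'w:es': True, 'w:zh': True}
print(intended)  # {'w:en': False, 'w:es': False, 'w:zh': False}
```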
Finally an argument based on evidence (these logs should be public, not admin-only - can we have something like this in a search-engine, this may come in handy in some cases!). Consider removed. --Dirk Beetstra T C (en: U, T) 06:59, 22 February 2015 (UTC)
By the way, Seth, these are actually no real hits - all three you show here are collateral. Thanks for this evidence; this information would be useful on more occasions to make an informed decision (also, vide infra). --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
I am not sure that we want the Special page to be public, though I can see some value in being able to have something at ToolLabs to be available to run queries, or something available to be run through quarry.  — billinghurst sDrewth 10:57, 22 February 2015 (UTC)
Why not public? There is no reason to hide this, this is not BLP or COPYVIO sensitive information in 99.99% of the hits. The chance that this is non-public information is just as big as for certain blocks to be BLP violations (and those are visible) ... --Dirk Beetstra T C (en: U, T) 04:40, 23 February 2015 (UTC)

Now restarting the original debate

As the blacklist is long, and likely contains rules that are too wide a net and which are so old that they are utterly obsolete (or even, may be giving collateral damage on a regular basis), can we see whether we can set up some criteria (that can be 'bot tested'):

  1. Rule added > 5 years ago.
  2. All hits (determined on a significant number of wikis), over the last 2 years (for now: since the beginning of the log = ~1.5 years) are collateral damage - NO real hits.
  3. Site is not a redirect site (should not be removed, even if not abused), is not a known phishing/malware site (to protect others), and is not a true copyright-violating site. (This is hard to bot-test; we may need someone to look over the list and take out the obvious ones.)

We can make some mistakes on old rules if they are not abused (remove some that actually fail #3) - if they become a nuisance/problem again, we will see them again, and they can be speedily re-added .. thoughts? --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
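Criteria 1 and 2 are mechanical once the hit logs have been classified, so a bot could pre-filter the list and leave only criterion 3 to a human. A minimal sketch in Python; the `Rule` fields, the example patterns and the dates are hypothetical, and whether a hit is "real" or collateral still has to be decided by manual inspection beforehand.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Rule:
    pattern: str
    added: date            # date the rule was added to the blacklist
    real_hits: int         # logged hits on sites the rule was meant to stop
    collateral_hits: int   # logged hits that were false positives
    protected: bool        # redirect, phishing/malware or copyvio site (criterion 3, set by hand)


FIVE_YEARS = timedelta(days=5 * 365)  # approximation: ignores leap days


def removal_candidate(rule: Rule, today: date) -> bool:
    old_enough = today - rule.added > FIVE_YEARS  # criterion 1
    only_collateral = rule.real_hits == 0         # criterion 2 (zero hits also passes)
    return old_enough and only_collateral and not rule.protected  # criterion 3


rules = [
    Rule(r"\bexample-a\.com\b", date(2008, 3, 1), real_hits=0, collateral_hits=2, protected=False),
    Rule(r"\bshortener\.example\b", date(2007, 1, 1), real_hits=0, collateral_hits=0, protected=True),
    Rule(r"\bexample-b\.com\b", date(2013, 6, 1), real_hits=0, collateral_hits=0, protected=False),
    Rule(r"\bexample-c\.com\b", date(2006, 5, 1), real_hits=4, collateral_hits=1, protected=False),
]
candidates = [r.pattern for r in rules if removal_candidate(r, date(2015, 2, 22))]
print(candidates)
```

Only the first rule qualifies: the second is protected, the third is too young, and the fourth still has real hits.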

@Hoo man: you have worked on clean-up before; some of your thoughts would be welcomed.  — billinghurst sDrewth 10:53, 22 February 2015 (UTC)
Doing this kind of clean-up is rather hard to automate. What might work better for starters is removing rules that haven't matched anything since we started logging hits. That would presumably cut down the whole blacklist considerably. After that we could re-evaluate the rest of the blacklist, maybe following the steps outlined above. - Hoo man (talk) 13:33, 22 February 2015 (UTC)
Not hitting anything is dangerous .. there are likely some somewhat obscure redirect sites on it which may not have been attempted to be abused (though, also those could be re-added). But we could do test-runs easily - just save a cleaned up copy of the blacklist elsewhere, and diff them against the current list, and see what would get removed.
Man, I want this showing up in the RC-feeds, then LiWa3 could store them in the database (and follow redirects to show what people wanted to link to ..). --Dirk Beetstra T C (en: U, T) 03:30, 23 February 2015 (UTC)
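The dry run described above (save a cleaned-up copy and diff it against the current list) needs only a list difference. A minimal Python sketch, with hypothetical entries standing in for the real list (one regex per line): `would_be_removed` lists what the cleaned copy drops, and `difflib.unified_diff` shows the same change in the familiar diff format.

```python
import difflib

# Hypothetical blacklist contents, one regex entry per line.
current = [r"\bexample-a\.com\b", r"\bshort\.example\b", r"\bexample-b\.com\b"]
cleaned = [r"\bshort\.example\b", r"\bexample-b\.com\b"]


def would_be_removed(old, new):
    """Entries present in the current list but missing from the cleaned copy, in original order."""
    kept = set(new)
    return [line for line in old if line not in kept]


print(would_be_removed(current, cleaned))
# A unified diff gives the same information in the form reviewers are used to reading.
print("\n".join(difflib.unified_diff(current, cleaned, "current", "cleaned", lineterm="")))
```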
Hi!
I created a table of hits of blocked link additions. Maybe it's of use for the discussion: User:lustiger_seth/sbl_log_stats (1.8 MB wiki table).
I'd appreciate, if we deleted old entries. -- seth (talk) 22:12, 26 February 2015 (UTC)
Hi, thank you for this, it gives a reasonable idea. Do you know whether the rule hits were all 'correct' (for the rules that show hits) or mainly/all false positives? (If they are false positives, we could also decide on this basis to tighten the rule to avoid them.) Rules with all zeros (can you include a 'total' score?) would certainly be candidates for removal, though we should still first determine whether they are 'old' and/or are no-no sites. I am also concerned that this does not include other wikifarms - some sites may be problematic on other wikifarms, or may be hitting a large number of smaller wikis (which have less control due to low admin numbers). --Dirk Beetstra T C (en: U, T) 03:36, 8 March 2015 (UTC)
Hi!
We probably can't get information of false positives automatically. I added a 'sum' column.
Small wikis: If you give me a list of the relevant ones, I can create another list. -- seth (talk) 10:57, 8 March 2015 (UTC)
Thanks for the sum-column. Regarding the false-positives, it would be nice to be able to quickly see what actually got blocked by a certain rule, I agree that that then needs a manual inspection, but the actual number of rules with zero hits on the intended stuff to be blocked is likely way bigger than what we see.
How would you define the relevant small wikis - does that depend on the link that was spammed? Probably the best approach is to parse all ~750 wikis, make a list of rules with 0 hits and a separate list of rules with <10 hits (including the links that were blocked), and exclude everything above that. The resulting rules should then be filtered to those that were added >5 years ago. That narrows down the list for now, and after a check for obvious no-no links, those could almost be blanket-removed (excluding only the ones with real hits, the obvious redirect sites and others - which needs a manual check). --Dirk Beetstra T C (en: U, T) 06:59, 9 March 2015 (UTC)
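The bucketing step proposed here (sum the hits for each rule over all wikis, then separate rules with no hits from rules with fewer than 10) can be sketched as follows. The per-wiki counts and rule names are hypothetical; the age filter and the manual check for redirect sites and other no-no links would come afterwards.

```python
from collections import Counter


def bucket_rules(per_wiki, all_rules, low_threshold=10):
    """Sum hits per rule across wikis and split rules into zero-hit and low-hit lists."""
    totals = Counter()
    for counts in per_wiki.values():
        totals.update(counts)  # adds each wiki's counts to the running totals
    zero = [r for r in all_rules if totals[r] == 0]
    low = [r for r in all_rules if 0 < totals[r] < low_threshold]
    return zero, low, totals


# Hypothetical per-wiki hit counts: {wiki: {rule: hits}}; rules with no hits are simply absent.
per_wiki = {
    "enwiki": {r"\bexample-a\.com\b": 3},
    "dewiki": {r"\bexample-a\.com\b": 12},
    "eswiki": {r"\bexample-c\.com\b": 4},
}
all_rules = [r"\bexample-a\.com\b", r"\bexample-b\.com\b", r"\bexample-c\.com\b"]

zero, low, totals = bucket_rules(per_wiki, all_rules)
print("no hits:", zero)
print("fewer than 10 hits:", low)
```

Rules at or above the threshold (here `example-a`, with 15 hits in total) fall into neither list and are excluded from removal.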
Hi!
At User:Lustiger_seth/sbl_log_stats/all_wikis_no_hits there's a list containing ~10k entries that never triggered the sbl during 2013-sep and 2015-feb anywhere (if my algorithm is correct).
If you want to get all entries older than 5 years, then it should be sufficient to use only the entries in that list up to (and including) \bbudgetgardening\.co\.uk\b.
So we could delete ~5766 entries. What do you think? Shall we give it a try? -- seth (talk) 17:06, 18 April 2015 (UTC)
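Taking "all entries up to and including a given rule" is a prefix slice, which is only valid if the list is ordered by the date each rule was added (assumed here, as the comment implies). A minimal Python sketch; apart from the cut-off rule quoted above, the entries are hypothetical.

```python
def entries_up_to(entries, last_entry):
    """Return entries from the start of the list up to and including last_entry.

    Assumes entries are ordered oldest first; raises ValueError if last_entry is absent.
    """
    cut = entries.index(last_entry)
    return entries[: cut + 1]


CUTOFF = r"\bbudgetgardening\.co\.uk\b"
no_hit_entries = [r"\bold-example-a\.com\b", r"\bold-example-b\.net\b", CUTOFF, r"\bnewer-example\.org\b"]
old_entries = entries_up_to(no_hit_entries, CUTOFF)
print(len(old_entries), "entries older than five years")
```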
The question is, how many of those are still existing redirect sites etc. Checking 5800 is quite a job. On the other hand, with LiWa3/COIBot detecting - it is quite easy to re-add them. --Dirk Beetstra T C (en: U, T) 19:28, 21 April 2015 (UTC)
According to the last few lines, I've now removed 124 kB of non-hitting entries. I did not remove all of them, because some were URL shorteners, and I guess that they are a special case, even if not used yet. -- seth (talk) 22:25, 16 September 2015 (UTC)

Blacklisting spam URLs used in references

Looks like a site is using the "references" section as a spam farm. If a site is added to this list, can the blacklist block the spam site? Raysonho (talk) 17:45, 5 September 2015 (UTC)

Yes they can.--AldNonymousBicara? 21:56, 5 September 2015 (UTC)
Thanks, Aldnonymous! Raysonho (talk) 00:07, 6 September 2015 (UTC)