Talk:Spam blacklist/Archives/2019-04

From Meta, a Wikimedia project coordination wiki
Warning! Please do not post any new comments on this page. This is a discussion archive first created on 01 April 2019, although the comments contained were likely posted before and after this date. See current discussion or the archives index.

Proposed additions

This section is for completed requests that a website be blacklisted.

Wikibirthdays

(below copied from a request on en.wikipedia)










Wikibirthdays says it gets its content from Wikipedia, and I've just cleaned up a bunch of citation spam. This even includes adding Template:Diff2 and hitting the same article multiple times (Template:Diff2, Template:Diff2). Note that the first spam link was not reverted; the same website is spammed twice in the lead section. NinjaRobotPirate (talk) 10:43, 1 April 2019 (UTC)

@NinjaRobotPirate: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 11:22, 1 April 2019 (UTC)

xurl.*

The blacklist contains shorteners xurl.es and xurl.gq. Because of a spamming attempt on Commons linking to .work I suggest replacing them by

  • \bxurl\.(?:es|gq|work)\b

--Achim (talk) 06:39, 6 April 2019 (UTC)

@Achim55: Added as .\w{2,4}\b
Billinghurst, that's a good idea, I just noticed that .org is affected too. Cheers, --Achim (talk) 13:36, 6 April 2019 (UTC)
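
The two patterns in this thread can be compared with a quick check. This is only a minimal sketch: it uses Python's `re` module rather than the regex engine the blacklist runs on, the broadened pattern is an assumed reconstruction (the reply above shows only the `.\w{2,4}\b` tail, so the `\bxurl\` prefix is a guess), and the sample URLs are made up for illustration.

```python
import re

# Pattern as proposed in the request: only three TLDs.
PROPOSED = re.compile(r"\bxurl\.(?:es|gq|work)\b", re.IGNORECASE)

# Assumed reconstruction of the broader rule that was added:
# any suffix of 2 to 4 word characters after "xurl.".
BROADENED = re.compile(r"\bxurl\.\w{2,4}\b", re.IGNORECASE)

# Hypothetical sample links, for illustration only.
samples = [
    "http://xurl.es/abc",
    "http://xurl.work/abc",
    "http://xurl.org/abc",
    "http://example.com/xurl",
]

for url in samples:
    # Print whether each pattern hits the sample link.
    print(url, bool(PROPOSED.search(url)), bool(BROADENED.search(url)))
```

Under these assumptions the broader rule also catches `xurl.org`, which is the side effect noted in the reply above.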

Url shorteners

Hi, can you block those url shorteners :





















--Mattebng (talk) 06:07, 10 April 2019 (UTC)

Not done. As a practice we only block abused domains.  — billinghurst sDrewth 10:33, 10 April 2019 (UTC)

External_links_policy#Global_blacklist "Some links are generally blacklisted on meta, even if the abuse has only been to one project, or when the link has not been used abusively yet: 1. URL-shorteners/redirect sites (like e.g. tinyurl) as these can be used to circumvent blacklisting of other domains, and it is totally unnecessary to use these (as one can link to the original document directly)." They are all easy to find on Google. --Mattebng (talk) 12:06, 10 April 2019 (UTC)

@Mattebng: Added to Spam blacklist. Two were already abused, and one of them throws warnings on my computer. With redirect sites there is really no need to wait until someone starts abusing them: they will be abused. Except for the really big guys, the only reason to use them is to circumvent or obfuscate something. The blacklist is there to prevent abuse; I really don't see any reason to wait for abuse before we prevent it. --Dirk Beetstra T C (en: U, T) 14:04, 10 April 2019 (UTC)

Url shorteners























Hi, can you block those url shorteners ? Thanks --Mattebng (talk) 10:00, 12 April 2019 (UTC)

@Mattebng: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 09:57, 14 April 2019 (UTC)

anupmandalcom.wordpress.com



On EN in ranges 2405:204:E40D:C760:*, 2405:204:E380:A4F:*, 2405:204:E481:A5B8:* and 2405:204:E20B:C9E7:* over several days, and most are blocked. Also found so far on:

  • MR 1
  • KO 1, 2, 3, 4
  • ZH 1, 2, 3, 4, 5
  • ES 1, 2
  • IT 1, 2
  • PT 1, 2
  • JA 1, 2, 3
  • FR 1, 2, 3 and lots more the other day I don't have notes for and someone else had reverted.

Cheers KylieTastic (talk) 21:52, 15 April 2019 (UTC)

@KylieTastic: Added to Spam blacklist. -- — billinghurst sDrewth 22:39, 15 April 2019 (UTC)

regex April 2019

  • Regex requested to be blacklisted: \bteleley\.com/foro
  • Regex requested to be blacklisted: \bbecomegorgeous\.com/topics
  • Regex requested to be blacklisted: \bzyngaplayerforums\.com/poker

All are being used by spambots. While the domains themselves should be allowed, these regexes are blockable as they are not used by WPs and are part of a spambot campaign.  — billinghurst sDrewth 21:52, 17 April 2019 (UTC)

Added to Spam blacklist. -- — billinghurst sDrewth 21:52, 17 April 2019 (UTC)
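
The effect of blocking a path rather than a whole domain can be illustrated as follows. This is a rough sketch in Python: joining the entries into a single alternation is a simplifying assumption about how the list is applied, and the helper name `is_blocked` and the test URLs are hypothetical.

```python
import re

# The three entries from the request above.
ENTRIES = [
    r"\bteleley\.com/foro",
    r"\bbecomegorgeous\.com/topics",
    r"\bzyngaplayerforums\.com/poker",
]

# Simplifying assumption: treat the entries as one big alternation.
BLOCKED = re.compile("|".join(ENTRIES), re.IGNORECASE)


def is_blocked(url: str) -> bool:
    """Return True if the URL hits any of the path-limited entries."""
    return BLOCKED.search(url) is not None
```

With these entries a link to the forum path is caught, while a link to the bare domain or to another path on the same site still passes.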

io.goldfash.com



redirect service  — billinghurst sDrewth 06:27, 19 April 2019 (UTC)

@Billinghurst: Added to Spam blacklist. -- — billinghurst sDrewth 06:27, 19 April 2019 (UTC)

cryptobrowser.site



Crypto browser site using some kind of referral scheme. Already blacklisted on fa-Wiki, but also various spam on de-, it-, fr- and en-Wiki (see detailed COI report). No encyclopedic usage. GermanJoe (talk) 01:31, 21 April 2019 (UTC)

@GermanJoe: Thx. Added to Spam blacklist. -- — billinghurst sDrewth 03:19, 21 April 2019 (UTC)

gawcialis.com



URL shortener found on spamming user page on Commons. --Achim (talk) 13:29, 28 April 2019 (UTC)

@Achim55: Added to Spam blacklist. -- — billinghurst sDrewth 21:05, 28 April 2019 (UTC)

Proposed removals

This section is for archiving proposals that a website be unlisted.

best-exam.ru



The site best-exam.ru was added to the blacklist because of links that were added (although the linked material was included to support some information on wikipedia.org). I ask you to remove the site from the blacklist. Thanks. —The preceding unsigned comment was added by 95.78.142.95 (talk)

Declined. I am not seeing why this would be used on Wikipedia sites, so I am declining the request. You may wish to request its whitelisting at the wiki of interest if they believe that it is of value to them.  — billinghurst sDrewth 12:29, 7 April 2019 (UTC)

victor-mochere.com



The issue was the result of a shared computer that initiated the persistent changes; that has been resolved and won't happen again. The site has information that is very useful. It was about one link that was persistently being re-added. —The preceding unsigned comment was added by Kevinwanzira (talk)

Declined not globally blacklisted. I think that you will find that this has been actioned at enWP. Defer to w:en:Mediawiki talk:spam-blacklist  — billinghurst sDrewth 10:42, 10 April 2019 (UTC)

Troubleshooting and problems

This section is for archiving Troubleshooting and problems.

Discussion

This section is for archiving Discussions.

twitter.com



The blacklisted pattern \btwitter\.com/search\b blocks cleanup of archived pages as it is used on a lot of mass messages. Over two years I had about 500 hits. I believe the best would be to remove this entry from the blacklist. — Jeblad 14:28, 6 April 2019 (UTC)

@Jeblad: how can this be used in mass-messages if it is blacklisted? Are these very old messages? --Dirk Beetstra T C (en: U, T) 07:11, 7 April 2019 (UTC)
Can you whitelist it locally? Even temporarily for an archiving run. It was being horribly abused by the spambots when we added it.  — billinghurst sDrewth 07:15, 7 April 2019 (UTC)
These were archive runs at Meta.[1] Some of the Twitter hits were the newsletters from Wikimedia Norway. In my opinion this blacklisting has turned a minor problem into a major one. — Jeblad 10:25, 7 April 2019 (UTC)
@Jeblad: I am afraid you are underestimating the problem of the spambots. These are archives, it hardly hurts to disable the links by removing the http://. --Dirk Beetstra T C (en: U, T) 13:36, 7 April 2019 (UTC)
@Beetstra: Thank you for the information. I've been on Wikipedia since 2005, and running and writing bots for nearly as long. Blocking all kinds of entries just because you can is not a good solution. — Jeblad 14:47, 7 April 2019 (UTC)
I have (temporarily) whitelisted locally, and will review.  — billinghurst sDrewth 23:00, 7 April 2019 (UTC)
@Jeblad: having spambots go on a rampage is not a good idea either. I am sorry, but to protect wikipedia sometimes you have to block stuff. I have been pushing now for a more fine grained blacklist for probably close to 10 years .. one solution here is to temporarily whitelist, the other more permanent solution (since we are talking archives here) is to disable the links. —Dirk Beetstra T C (en: U, T) 03:44, 8 April 2019 (UTC)
Special:diff/19004956, what is this?--AldnonymousBicara? 07:01, 11 April 2019 (UTC)
Irrelevant. This is a regex that is more than the base domain, see the lead.  — billinghurst sDrewth 09:56, 11 April 2019 (UTC)
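
The suggestion in this thread to disable archived links by removing the `http://` could be automated roughly as follows. This is only a sketch under assumptions: the function name `disable_links` is hypothetical, the subdomain handling is a guess at what an archive run would need, and it uses Python's `re` module rather than anything MediaWiki provides.

```python
import re

# Blacklisted pattern quoted at the top of this thread.
PATTERN = r"\btwitter\.com/search\b"

# Match an http(s) scheme only when it is directly followed by a host
# (with optional subdomains) and path that hit the blacklisted pattern.
SCHEME_BEFORE_HIT = re.compile(
    r"https?://(?=(?:[a-z0-9-]+\.)*" + PATTERN + r")",
    re.IGNORECASE,
)


def disable_links(wikitext: str) -> str:
    """Drop the scheme so the URL stays readable but is no longer a link."""
    return SCHEME_BEFORE_HIT.sub("", wikitext)
```

Links that do not hit the pattern, such as a plain profile URL on the same domain, are left untouched.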

youtu.be



Hi,

Don't reject it outright :)

I see that removing was already proposed multiple times. It was rejected every time because it's a redirect site, and redirect sites are undesirable.

I'm going to try to propose it for removal again. Here's why: while I agree that redirect URLs are indeed not desirable, youtu dot be is not otherwise harmful. As far as I know, it can only be used to redirect to youtube dot com, which is mostly allowed. So the intention of the people who want to add a youtu dot be link is as good as the intention of people who want to add a youtube dot com link.

But when people try to add it using the mobile editor, the edit fails with the cryptic message "Error, edit not saved." Of course, the mobile editor could be improved to give a clearer message (see https://phabricator.wikimedia.org/T220922). But till then, can there be a better solution for youtu.be links?

For example, there could be a bot that automatically changes these links to youtube dot com links soon after such an edit happens. This would give users, who really just want to make a constructive edit, a much smoother experience.

Existing blacklisting of particular YouTube videos can be retained robustly by adding youtu.be to the line that blocks them.

Please give it a thought :) --Amir E. Aharoni (talk) 18:45, 14 April 2019 (UTC)

I believe the proper action would be to follow a link from a link shortener (a redirect domain) and use the final link. No link shortener should be blocked; only the final domain should be blocked.
I guess something like this must be implemented as part of the save operation. Perhaps an even better solution would be to implement a rewrite operation as part of AbuseFilter. — Jeblad 13:37, 20 April 2019 (UTC)
I believe the proper action would be to follow a link from a link shortener (a redirect domain) and use the final link - indeed. The problem is that the well-meaning editors who add youtu dot be links don't know that they need to do it manually. So it should be done automatically. If this can indeed be done in AbuseFilter so that it would not block anything and just do the conversion transparently, then it sounds like a good solution. --Amir E. Aharoni (talk) 19:33, 22 April 2019 (UTC)
The problem is that there is quite some no-no material on youtube.com that is never to be linked to (plain copyright violations, plain spam). We have youtube.com rules on many blacklists (many wikis have youtube.com parts or individual movies blacklisted, and there are a couple here; youtube.com, due to its rather strong deprecation, is on XLinkBot on en.wikipedia). When redirect sites to such pages are open, most of those rules have to be doubled (and youtu.be is not the only redirect). It makes the administration rather difficult (on en.wikipedia there are only a handful of editors active at the blacklist/whitelist, here it is also a select few, on smaller wikis there are even fewer, and they then need to know that they have to block all redirects to certain material).
I can agree that the Wikimedia software should follow the links and react on the target, but that opens a loophole that is rather obvious: create a redirect site where you can change the target at wish. On that site, create a redirect that you make point to something 'good'. Add those links throughout to Wikipedia (no-one will complain), and then a bit after, change the redirect target to your spam page.
Moreover, except for the few 'dedicated' redirect sites (like youtu.be), most redirect sites obscure what you are actually linking to ('tinyurl.com/abcde' can link everywhere, you HAVE to follow the link to see where you get). It can be a good site, or it can be a bad site that was not blacklisted yet. But the only way to know is that you actually follow the link to the malware site that hacks your computer.
Then, this code is implemented in my linkwatcher, COIBot and XLinkBot. It is generally possible to 'resolve' the real target, but not all sites allow that. Tinyurl.com is a 'real' redirect site. Much (not all) of '.co.cc' is using a frameset - the .co.cc is a real page that loads its content from another website. Other sites are real redirects, but any results a header-request will give you do not show anything that tells you it is a redirect. All that you cannot automatically detect will go through anyway (and a large spam case of last week showed quite a number of undetectable sites, you can only filter on the actual content you get from the website, the metadata does not show anything).
Then finally, there is, really, no reason to link through a redirect site. You can always link to the actual target. I know that for some sites it is inconvenient (typically, youtube.com hands you the shortened link for sharing), but hey, most of the material on youtube is unencyclopedic (we are not interested in linking to private birthday parties, teething of kids, walking the dog, family weddings, renditions of 'Let it go' on a local talent show), quite some of the more interesting material is copyvio, and the official channel of almost all artists is superfluous as it is already linked from their official website. --Dirk Beetstra T C (en: U, T) 08:36, 23 April 2019 (UTC)
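
The point about header requests can be illustrated with a small sketch. It is not the code used in linkwatcher, COIBot or XLinkBot; the function name `redirect_target` is hypothetical, and it only inspects a status code and headers that the caller has already fetched.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin


def redirect_target(url: str, status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the redirect target announced in the HTTP response, if any."""
    if 300 <= status < 400:
        # Header names are case-insensitive, so compare in lower case.
        for name, value in headers.items():
            if name.lower() == "location":
                # The Location value may be relative to the requested URL.
                return urljoin(url, value)
    # A 200 response (for example a frameset page that loads another site)
    # gives no hint in the headers that it is effectively a redirect.
    return None
```

A site that answers with a normal 200 page and pulls its content from elsewhere returns `None` here, which is the undetectable case described above.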
I'm not suggesting any redirect site. I'm only suggesting youtu dot be. As far as I know, it is not used for redirecting to any site, but only to YouTube. If it can be used for other sites, as it is with bitly and tinyurl, then it shouldn't be added.
I'm not suggesting that the youtu dot be links be preserved just like that. I suggest that they be automatically changed to youtube dot com, either by bot soon after publishing a revision or, if possible, immediately during the publishing.
Finally, while it's true that there is no reason to link through a redirect, the problem is that you understand this, but a lot of users don't. For a lot of users, a youtu dot be URL is as valid as a youtube dot com URL, and the software doesn't help editors understand that they shouldn't use youtu dot be. --Amir E. Aharoni (talk) 18:40, 23 April 2019 (UTC)
@Beetstra: youtu.be is the official YouTube way of sharing their videos links. If youtube.com is not in this blacklist, youtu.be should not be either. Whether YouTube should be blacklisted altogether is off-topic in this discussion. —Jerome Charles Potts (talk) 13:47, 4 May 2019 (UTC)
There are many youtube links blacklisted on many wikis. The administration of all redirects to the same video is beyond many editors. They are often discouraged (and sometimes should not be linked to at all). Your argument, rewritten, is that because a tinyurl shortened link can point to en.wikipedia pages, it should not be blacklisted. URL shorteners are abused, and never needed (just convenient). (And youtu.be links ARE spammed, unlike youtube.com). —Dirk Beetstra T C (en: U, T) 13:54, 4 May 2019 (UTC)
In truth both youtube.com and youtu.be links are spammed, however, I tell you that I am only wanting to be writing filters for just the one base domain name, and not all the variations that occur. Noting that somedays I would just love to blacklist youtube.com as well and make users use an interwiki mapped link!!! I do think that there needs to be a middle way, and not either or every, so please can we find the balance.  — billinghurst sDrewth 02:56, 5 May 2019 (UTC)
In the last 500 blacklist hit entries on en.wikipedia ([2]) there are 190 hits on youtu.be (some multiple per line). en:User:AnomieBOT_III/Spambot_URI_list contains numerous youtu.be (and other redirects) to show samples of redirect sites being the target for spamming. Redirect sites result in many direct and indirect problems, and there is, really, NO need for them, they are, by definition, replaceable. I am in for other solutions, but removing redirects 'for convenience' is not a solution. --Dirk Beetstra T C (en: U, T) 07:04, 5 May 2019 (UTC)
Please don't give tinyurl as an example. It's irrelevant and misleading. tinyurl is a particular generic redirect site. I'm not talking about a generic redirect site. I'm only talking about youtu.be, which is only a variant of youtube.com.
Having youtu.be has many false positives. This proposal is trying to eliminate them.
Yes, they are replaceable, but users don't know that they are replaceable. They can be replaced by a bot rather than blindly rejected.
Yes, particular videos should be banned, and already are, but this can be achieved robustly using one regex, as I have already suggested in the first post in this thread. --Amir E. Aharoni (talk) 18:56, 5 May 2019 (UTC)
Yes, it can be done in one robust regex, but that needs all of the wikis to be fully aware of how to write such rules. Moreover, we would have to blacklist all youtu.be's that do get spammed directly (funnily enough, there is more youtu.be that gets spammed than youtube.com) as well. And all that on a site where use is limited (most material does not warrant being linked, quite some material should never be linked), and then using an easy-to-avoid way of linking. I see your point, but I am not convinced that the advantages outweigh the disadvantages. --Dirk Beetstra T C (en: U, T) 09:02, 27 May 2019 (UTC)
Perhaps User:InternetArchiveBot can be used to automatically rewrite these links after they are posted? --Amir E. Aharoni (talk) 08:19, 3 May 2019 (UTC)
Then the InternetArchiveBot would run into the blacklist. --Dirk Beetstra T C (en: U, T) 09:02, 27 May 2019 (UTC)
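
Two ideas from this thread, rewriting youtu.be links to their youtube.com form and blocking a particular video on both hosts with one rule, can be sketched as below. This is an illustration only, not an existing bot or blacklist entry: the function names are hypothetical, the character set assumed for video IDs (letters, digits, `_` and `-`) is an assumption, and Python's `re` stands in for the blacklist's regex engine.

```python
import re

# Assumption: a video ID is a run of letters, digits, "_" and "-".
SHORT_LINK = re.compile(
    r"https?://youtu\.be/([A-Za-z0-9_-]+)(?:\?([^\s\]|<>]*))?"
)


def expand(match: re.Match) -> str:
    """Build the long-form URL, carrying over any query string (e.g. t=30)."""
    video_id, query = match.group(1), match.group(2)
    url = f"https://www.youtube.com/watch?v={video_id}"
    return f"{url}&{query}" if query else url


def rewrite_short_links(wikitext: str) -> str:
    """Replace every youtu.be link in the text with its youtube.com form."""
    return SHORT_LINK.sub(expand, wikitext)


def video_rule(video_id: str) -> re.Pattern:
    """One rule that matches a given video on both hosts."""
    vid = re.escape(video_id)
    return re.compile(
        rf"\b(?:youtube\.com/watch\?(?:\S*&)?v=|youtu\.be/){vid}(?![\w-])"
    )
```

The rewrite only helps if it runs where the short link is not already rejected, which is the objection raised in the last reply above.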

bitcointalk.org



This was added in 2016 (https://meta.wikimedia.org/wiki/Talk:Spam_blacklist/Archives/2016-02#bitcointalk.org). The addition focused on links being created for the formal announcements of new crypto-currencies. Few to none today announce primarily via Bitcointalk, and many of the things that were announced that way now have Wikipedia articles about them (for better or worse). Bitcointalk remains one of the premier locations for technical discussion, and the only location for many important historical discussions. The site is very actively moderated and not a source for outright spam. Although it isn't an appropriate citation for every case, this would probably be best addressed on a wiki by wiki basis based on the particular linking requirements of the wiki(s) in question. --Gmaxwell (talk) 23:36, 30 April 2019 (UTC)

@Gmaxwell: It was being spambot'd back in the day. Is it worth applying for a whitelisting at enWP and measuring the success of the use of the link as an initial response? en:Mediawiki talk:Spam-whitelist  — billinghurst sDrewth 02:02, 1 May 2019 (UTC)

Overly aggressive blocking

Checked a few of the newer entries, and a whole bunch of them were added without ever being used for spam. Some of them were used one or two times, but most of them were never used. Most of them are not on known spamlists. It seems like some (many?) of them are sharing IP addresses, but given how some webhotels operate this is a very weak (I would say invalid) indication, as HTTP/1.1 allows many sites to share one IP address (name-based virtual hosting). I guess a whole lot of the new entries are based on the same flawed assumption. — Jeblad 15:13, 7 April 2019 (UTC)

@Jeblad: Not sure to which entries you are referring, so you will need to ask specific questions of the person who added the entries. I would suggest that numbers of entries of domains will be found in the abuse filter logs due to spambot activity, and usually in proliferation, so there can be a case of early intervention of known problems and persistence.

Also at this stage COIBot is patchy since it was moved when WMFlabs lost one of its instances and the bot account was moved and limited, so beware of trusting all of its reports; it has elements of quirkiness and gaps.  — billinghurst sDrewth 10:47, 10 April 2019 (UTC)

@Jeblad: Can you show me a couple of the domains you are talking about? But note that if we have one IP that adds 10 different domains in 10 edits, then that is still a spam account. Yes, it may be 10 different people sharing the IP, but if that is all that the IP is doing, then the chance that this is a coincidence, and that the address is not one used by spammers, is very small. --Dirk Beetstra T C (en: U, T) 08:24, 14 April 2019 (UTC)

the name viagrandparis contains viagra so it is detected as a spam



Hi. My article for Jean-Claude Durousseaud is refused because of a spam detected in the name of the French TV channel viàGrandParis, as it contains the letters v.i.a.g.r.a. Can you help me with that, please? --Léa Rateau (talk) 14:38, 11 April 2019 (UTC)

@Léa Rateau: I would consider locally whitelisting this domain. Although it is possible to exclude parts by writing complex regexes, I would not do that for domains which are maybe used on 2-3 wikis (out of the hundreds), as rules then quickly become too complex. (Yes, a global whitelist would be good, as that is easier to administrate, but unfortunately that gap has never been filled by WMF.) --Dirk Beetstra T C (en: U, T) 08:30, 14 April 2019 (UTC)
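
The trade-off mentioned in the reply (excluding parts of a match by writing a more complex regex) can be made concrete with a small sketch. The rule `viagra` below is a hypothetical stand-in, since the actual blacklist entry is not quoted in this thread, and the `.example` URLs are made up; Python's `re` is used for illustration.

```python
import re

# Hypothetical broad rule; the actual blacklist entry is not quoted here.
BROAD = re.compile(r"viagra", re.IGNORECASE)

# Same rule with one carve-out, written as a negative lookahead.
WITH_EXCEPTION = re.compile(r"viagra(?!ndparis)", re.IGNORECASE)

# Hypothetical sample links, for illustration only.
for url in ("http://viagrandparis.example/", "http://cheap-viagra.example/"):
    # Print whether the broad rule and the carved-out rule hit each link.
    print(url, bool(BROAD.search(url)), bool(WITH_EXCEPTION.search(url)))
```

Every such exception lengthens the global rule for the sake of a domain used on only a few wikis, which is why a local whitelist entry is the suggested route instead.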

Wikimedia URL shortener



This is a 'redirection service' in principle, should we block it? — regards, Revi 14:21, 13 April 2019 (UTC)

I am semi in favour as we should be utilising proper and evident links as we specify for all others. Though we should get labelling on the new tool to identify that it cannot be utilised on WMF wikis. @Lea Lacroix (WMDE):  — billinghurst sDrewth 01:31, 14 April 2019 (UTC)
@Billinghurst, -revi, and Lea Lacroix (WMDE): I don't assume that this extension is so smart that it is capable to check the target versus our spam-blacklist, right? --Dirk Beetstra T C (en: U, T) 08:33, 14 April 2019 (UTC)
I will note that this issue of WMF blocking external url shorteners/redirects was brought up on the initial phabricator ticket, so it is not unexpected for this discussion to occur.  — billinghurst sDrewth 08:39, 14 April 2019 (UTC)
@Beetstra: it is internal link only, so I am not sure that it needs to check blacklists, it is more whether we are wishing to encourage use of these links in preference to use of interwikis, and the like.  — billinghurst sDrewth 08:36, 14 April 2019 (UTC)
@Billinghurst: Oh, OK. Note that we do allow for some very specific redirect services to go unblacklisted, as assignment of those is purely restricted or assigned (dx.doi.org is one of them). If this is similar (cannot be abused on outside material, scheme is the same as <lang>.<wikiflavour>.org), then I do not see any reason to block this. From what I see of its current use, it seems to be quite clear where you are going. --Dirk Beetstra T C (en: U, T) 09:56, 14 April 2019 (UTC)
Same as other shorteners, they should be unwound before the contribution is saved. Only the final domain should be blocked, not the url shortener itself. Blocking the shortener is simply the wrong solution. — Jeblad 13:46, 20 April 2019 (UTC)
@Jeblad: for other shorteners that would indeed be true, but a) not all shorteners can be followed automatically to allow for software to recognise the target, and b) MediaWiki does not have those features anyway. URL shorteners are on a significant scale being used to specifically circumvent the blacklist and in some cases are spammed to avoid blacklisting of the actual target. Basically, barring very, very few exceptions, the use of a URL shortener service is never needed as there is, by definition, always the expanded url that you can use. Moreover, by far most of the url shorteners obfuscate where you are going and some allow the target to be changed after they have been added (if you see cnn.com you know where you go; with tinyurl.com/asdfgdjkat you may expect to go to cnn.com, but it may very well lead you to pornhub.com). The potential and observed abuse with url shorteners outweighs by far any use. --Dirk Beetstra T C (en: U, T) 12:45, 4 August 2019 (UTC)
Oppose, as this is under our control already and can be very useful for posting certain long url's with lots of parameters, etc to discussions. — xaosflux Talk 15:43, 21 May 2019 (UTC)

I am going to officially decline this. This service ONLY works when the target is a Wikimedia-controlled site. This can not be used for any non-Wikimedia sites. Although I understand the concern that it can be used to obfuscate where you are going to, I think that for this shortening service the use will very much outweigh the abuse, and that any abuse of this site can hence more efficiently be controlled using other tools to stop the abuse (blocking, page protection, edit filters, or even including blacklisting of specific shortened links for short periods). --Dirk Beetstra T C (en: U, T) 12:45, 4 August 2019 (UTC)

Note that I would strongly favour an automated system (bot?) that converts these links in content namespaces only to the full link to make sure that they link where they should. --Dirk Beetstra T C (en: U, T) 12:48, 4 August 2019 (UTC)