Talk:Spam blacklist

Shortcuts: WM:SPAM · WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions that match URLs which may not be used on any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
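For readers unfamiliar with how such a list is applied, here is a minimal sketch, assuming entries are Python-style regular expressions matched against the URLs added in an edit; the entries and URLs below are illustrative only, not actual blacklist content:

    import re

    # Illustrative entries only: each line of the real blacklist is a regular
    # expression that is matched against URLs being added to a page.
    BLACKLIST_ENTRIES = [
        r"\bexample-pharmacy\.com\b",   # hypothetical spam domain
        r"\btwitter\.com/search\b",     # a pattern discussed further down this page
    ]
    BLACKLIST = [re.compile(p, re.IGNORECASE) for p in BLACKLIST_ENTRIES]

    def blocked_urls(urls):
        """Return the URLs that match at least one blacklist entry."""
        return [u for u in urls if any(rx.search(u) for rx in BLACKLIST)]

    print(blocked_urls([
        "http://example-pharmacy.com/cheap-meds",    # blocked
        "https://twitter.com/search?q=wikipedia",    # blocked
        "https://en.wikipedia.org/wiki/Main_Page",   # allowed
    ]))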

Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, as there may be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate; that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (e.g. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
Whitelists
There is no global whitelist, so if you are seeking whitelisting of a URL on a particular wiki, please raise the matter on that wiki's MediaWiki talk:Spam-whitelist page; consider using the template {{edit protected}}, or its local equivalent, to draw attention to your request.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2024/07.

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

0.freebasics.com



For me all these links added seem to just flow back to one central page of https://connectivity.fb.com/. I am not sure whether there are different connections for different places. The enWP article w:Internet.org has a little about the service, and I think that it relates to having a mobile app, and clicking the link basically acts as a proxy, or the url as a faux protocol for an app. While it may be useful for someone on the service, it seems useless for anyone outside of the service. The closest analogue is probably how libraries use some of the proxy websites that only work for those inside the service. I suggest that we stick it up on blacklist talk, though maybe there is someone more aware of the system and some solutions.  — billinghurst sDrewth 22:59, 19 May 2019 (UTC)Reply

@Billinghurst: on en.wikipedia the university library proxies are hard blocked through an edit filter, with a warning stating that they have to convert the proxy link to a regular link and that they will not be able to save the link (en:Special:AbuseFilter/892, throwing warning en:MediaWiki:abusefilter-warning-proxy-link; e.g. a link like 'search.proquest.com.ezproxy.torontopubliclibrary.ca/docview/1418755534?accountid=14369' cannot be decoded in any form, and you get redirected to a login page with no information as to the original source). One could consider converting that filter to a global one and adding other, similar proxies. --Dirk Beetstra T C (en: U, T) 12:19, 22 May 2019 (UTC)Reply
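To illustrate the kind of conversion that warning asks editors to do by hand, here is a rough sketch that strips a proxy suffix from a hostname; the suffix list is an assumption for this example, not the configuration of the actual filter:

    from urllib.parse import urlsplit, urlunsplit

    # Hypothetical proxy suffixes; the real edit filter only warns, it does not rewrite.
    PROXY_SUFFIXES = [".ezproxy.torontopubliclibrary.ca"]

    def strip_proxy(url):
        """Turn search.proquest.com.ezproxy.torontopubliclibrary.ca/... into
        search.proquest.com/... by removing the proxy part of the hostname.
        Proxy-specific query parameters (e.g. accountid=...) may still need removal."""
        parts = urlsplit(url)
        host = parts.hostname or ""
        for suffix in PROXY_SUFFIXES:
            if host.endswith(suffix):
                return urlunsplit((parts.scheme, host[: -len(suffix)], parts.path, parts.query, ""))
        return url

    print(strip_proxy("https://search.proquest.com.ezproxy.torontopubliclibrary.ca/docview/1418755534?accountid=14369"))
    # https://search.proquest.com/docview/1418755534?accountid=14369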

One thing that may be difficult in that regard is that you cannot use custom global messages.  — billinghurst sDrewth 14:00, 22 May 2019 (UTC)Reply
So then there is hardly any difference between blacklisting and abusefiltering. I do think it is better to block these links. --Dirk Beetstra T C (en: U, T) 06:31, 23 May 2019 (UTC)Reply

1bestmeds.blogspot.com



very active spambot  — billinghurst sDrewth 23:42, 13 June 2019 (UTC)Reply

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as they may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
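As a toy illustration of the first criterion above (the thresholds and data layout here are assumptions; this is not COIBot's actual code):

    from collections import defaultdict

    def flag_cross_wiki_additions(link_additions, min_wikis=3):
        """link_additions: iterable of (user, link, wiki) tuples.
        Flags (user, link) pairs where one user added the same link on more
        than 2 wikis, roughly matching the first criterion above."""
        wikis = defaultdict(set)
        for user, link, wiki in link_additions:
            wikis[(user, link)].add(wiki)
        return [pair for pair, w in wikis.items() if len(w) >= min_wikis]

    additions = [
        ("ExampleSpammer", "badsite.example", "en.wikipedia"),
        ("ExampleSpammer", "badsite.example", "fr.wikipedia"),
        ("ExampleSpammer", "badsite.example", "commons.wikimedia"),
    ]
    print(flag_cross_wiki_additions(additions))   # [('ExampleSpammer', 'badsite.example')]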
COIBot's currently open XWiki reports
vrsystems.ru · Last update: 2023-06-27 15:51:16 · By: COIBot · Site IP: 195.24.68.17 · IPs: 192.36.57.94, 193.46.56.178, 194.71.126.227, 93.99.104.93 · Last link addition: 2070-01-01 05:00:00 · 4 · 4

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests please take your discussion to the pertinent wiki, where such requests would be made at MediaWiki talk:Spam-blacklist on that wiki. When searching spam lists, remember to enter any relevant language code.

genetherapynet.com



I tried adding this and found it was blacklisted here. It seems that someone with a COI to that site and others was spamming it on articles cross wiki. As far as I can tell the editor has not been here for quite a long time (see here, this and Talk:Spam blacklist/Archives/2011-02). Was wondering if it could be removed. Not sure about the other websites that were blacklisted at the same time as I have not explored them. Aircorn (talk) 06:28, 19 July 2018 (UTC)Reply

@Aircorn: it was being spammed in 2011. A removal needs more than "I want to add it", it usually needs firm reasoning about why it is usable at the sites. You can always ask about whitelisting at w:en:mediawiki talk:spam-whitelist  — billinghurst sDrewth 09:32, 9 August 2018 (UTC)Reply
It is quite hard to find sites that explain genetic engineering and are user friendly. It would make my editing life a little bit easier if I could use it as a reference, but it is not a site I would regularly use as there are much better (although less accessible) resources out there. I figured that since it was (as far as I could tell) only blacklisted due to spamming a long time ago then it would be relatively easy to unblacklist once that was no longer an issue. I will look at whitelisting if I really need it. Thanks for the response. Aircorn (talk) 09:44, 9 August 2018 (UTC)Reply
@Aircorn and Billinghurst: This is a bit curious, they were blacklisted per Talk:Spam_blacklist/Archives/2011-02#3_Adsense_related and apparently delisted per Talk:Spam_blacklist/Archives/2011-02#genetherapynet.com_etc. That suggests that our good faith de-listing did not work, and that they got re-listed afterwards. Not sure what to suggest here (sites from such organisations would qualify for a second chance, but this already seems to have happened). --Dirk Beetstra T C (en: U, T) 10:25, 10 February 2019 (UTC)Reply
@Beetstra: It was removed? The history tool indicates that it has been there consistently since 2011, so if it was out, it was for less than 500 revisions.  — billinghurst sDrewth 10:38, 10 February 2019 (UTC)Reply

more than 3000 entries

related: Talk:Spam_blacklist/Archives/2015-01#Now_restarting_the_original_debate

hi! (ping billinghurst, Beetstra) At user:lustiger seth/sbl log stats/all wikis no hits I started again to make a list of all sbl entries that have 0 hits in all ~900 wikis since they were added to the list (but not earlier than 2013, when the sbl log came into existence). The script takes some time (another week probably). Half of the sbl entries (~4800) are checked already. More than 3000 have never been the reason for a blocked edit.
What do you think? Shall we delete those entries (except for the url shorteners) from the list? Advantage: the fewer entries, the clearer the structure. -- seth (talk) 10:28, 10 November 2018 (UTC)Reply
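For readers curious what such a zero-hit check amounts to, a rough sketch follows; the data sources and names are assumptions for illustration, not seth's actual script, which reads the replica database:

    import re

    def zero_hit_entries(blacklist_entries, logged_blocked_urls):
        """Return blacklist entries that never matched any URL recorded in the
        spam-blacklist hit logs (logged_blocked_urls: iterable of URL strings)."""
        compiled = {entry: re.compile(entry, re.IGNORECASE) for entry in blacklist_entries}
        hit_counts = {entry: 0 for entry in blacklist_entries}
        for url in logged_blocked_urls:
            for entry, rx in compiled.items():
                if rx.search(url):
                    hit_counts[entry] += 1
        return [entry for entry, count in hit_counts.items() if count == 0]

    entries = [r"\bvrsystems\.ru\b", r"\bnever-spammed\.example\b"]   # illustrative only
    logged = ["http://vrsystems.ru/some/page"]
    print(zero_hit_entries(entries, logged))   # ['\\bnever-spammed\\.example\\b']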

@Lustiger seth: Thank you for this effort. I think most of the non-shortener domains can be removed.
Re 'clearer [the] structure': Would it be possible to 'sort' the list at some point, stuff all the shorteners until now into a 'section', with at the end a mark from 'below here unsorted'. In that case, every year or so we can sort the unsorted into the above 'groups', and it would make clean-up of non-shorteners easier (you can even take them out before your parsing, no need to check whether they were abused or not, we keep 'm anyway if they are still a shortening service). --Dirk Beetstra T C (en: U, T) 07:36, 11 November 2018 (UTC)Reply
@Beetstra: I agree that merging/grouping the shorteners would be reasonable. -- seth (talk) 08:06, 11 November 2018 (UTC)Reply
  • Comment I am interpreting that as you have been running through all wikis' Special:log/spamblacklist. On checking some randoms, I see that some listings have come from a Poke, e.g. User:COIBot/LinkReports/onlinetripadvisorturkey.com, so there have been concerns that caused the addition. Have we done a cross-reference against generated XWiki reports? A number of urls come about from AbuseFilter hits, so if we have additions to the blacklist and generated XWiki reports, I am not certain that we want those removed. Also, if we have a regex in place, I am loath to remove those, as they have been specifically added from an evidence base.  — billinghurst sDrewth 22:08, 11 November 2018 (UTC)Reply
    I am also seeing numbers of essay-writing domains in the list, and while they have not been spammed, I am not certain that I want them removed. Call me paranoid, or call me nasty! If we are to remove urls, maybe we want to eyeball the proposed cull and pull out of it any that we would like to keep.  — billinghurst sDrewth 03:47, 12 November 2018 (UTC)Reply
    @Billinghurst: We could easily cut the regexes which have been added in the last # years (2?) from that list. If seth would re-run the script in a year (e.g.), then those that still have no hits would come up.
    Alternatively, we run a script on those filters and extract all the domains in those ... (heck, I could teach LiWa to read certain filters as prescribed in the settings and extract domains from that ... but that would be a feature that at the earliest I could write next summer; moreover I would love LiWa to have access to all special:log/spamblacklist, so I could record attempted additions there as well - attempts to spam would be a welcome addition to the coibot reports ...). --Dirk Beetstra T C (en: U, T) 05:17, 12 November 2018 (UTC)Reply
    (barging in) Maybe you are interested in checking against potential positives from my lists before removing? None of their entries are collected automatically; they are handpicked: spam links on Commons, spamming wikis. Best, --Achim (talk) 14:13, 12 November 2018 (UTC)Reply
    @Achim55: Your list can be loaded into COIBot; if you use IRC, we can give you permissions with COIBot to add these to be monitored per Small Wiki Monitoring Team/IRC or if not, we can give you permission so you can add them to User:COIBot/Poke. @Beetstra: are you thinking of converting to json, or something similar? If not, then I am going to need to get js/css permissions :-/  — billinghurst sDrewth 03:37, 16 November 2018 (UTC)Reply
    @Billinghurst and Achim55: I cannot just convert to json, it is currently not valid json. I will have to go to regular pages and e.g. get template-editor access for COIBot. But that is beside the point. We can also poke that list, and I will give access to Achim to poke as well. —Dirk Beetstra T C (en: U, T) 04:24, 16 November 2018 (UTC)Reply
  • Comment I would think that there would be value in at least keeping the old list of removed domains somewhere and having COIBot use that list at least for "monitor", or proactively pushing those in to be monitored.  — billinghurst sDrewth 03:39, 16 November 2018 (UTC)Reply

reading the blacklists ..

@Lustiger seth: related to your work here .. how do you manage to read ALL spam-blacklist-logs? I thought they were admin only .. ?? If they can be (bot-)read, that would be very welcome; I could then build the capability into LiWa3/COIBot, so the attempts to circumvent the blacklist can be shown in the reports, which is very welcome evidence in case of de-blacklisting requests, as well as for sock-hunting and finding spammers implementing workarounds (spammers attempt one blacklisted domain, and others that are not yet blacklisted .. that is a one-strike-and-you-are-out situation suitable for immediate blacklisting/blocks of the other domains). --Dirk Beetstra T C (en: U, T) 05:26, 14 November 2018 (UTC)Reply

@Beetstra: they used to be; apparently the developers decided they should be open to all logged-in users in phab:T64781. — xaosflux Talk 15:49, 14 November 2018 (UTC)Reply
@Xaosflux: I never understood the initial choice... thanks, I will need to code this into my bots! Thanks! —Dirk Beetstra T C (en: U, T) 17:04, 14 November 2018 (UTC)Reply
Hi!
another related ticket phab:T184483.
I use the replica db at toolserver. -- seth (talk) 21:28, 14 November 2018 (UTC)Reply

visitsubotica.rs



It was blacklisted 10 years ago, and I think it should be removed from the global blacklist, as it is The Official Tourism Website of Subotica. Subotica is a city in Serbia. --Zoranzoki21 (talk) 22:32, 17 March 2019 (UTC)Reply

Yes, and that was recognised in the XWiki report prior to it being added as it was being spammed. I would suggest that you should seek whitelisting at the wiki of interest, as an initial measure. I feel that this request would need a level of support to be removed when considering its history.  — billinghurst sDrewth 02:43, 18 March 2019 (UTC)Reply

twitter.com



The blacklisted pattern \btwitter\.com/search\b blocks cleanup of archived pages as it is used on a lot of mass messages. Over two years I had about 500 hits. I believe the best would be to remove this entry from the blacklist. — Jeblad 14:28, 6 April 2019 (UTC)Reply
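For context, a minimal check of the pattern in question against the kind of link that appears in such archives; Python re semantics are assumed to be close enough to the extension's matching, and the example URL is made up:

    import re

    pattern = re.compile(r"\btwitter\.com/search\b")   # the entry under discussion

    archived_link = "https://twitter.com/search?q=%23WikimediaNorge"
    print(bool(pattern.search(archived_link)))   # True: an edit adding this external link is blocked

    # The extension checks parsed external links, so "disabling" a link by removing
    # the http:// prefix (as suggested below) leaves the text in the archive but
    # stops it from being counted as an external link at all.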

@Jeblad: how can this be used in mass-messages if it is blacklisted? Are they very old messages? --Dirk Beetstra T C (en: U, T) 07:11, 7 April 2019 (UTC)Reply
Can you whitelist it locally? Even temporarily for an archiving run. It was being horribly abused by the spambots when we added it.  — billinghurst sDrewth 07:15, 7 April 2019 (UTC)Reply
These were archive runs at Meta.[1] Some of the twitter hits were the newsletters from Wikimedia Norway. In my opinion this blacklisting has turned a minor problem into a major one. — Jeblad 10:25, 7 April 2019 (UTC)Reply
@Jeblad: I am afraid you are underestimating the problem of the spambots. These are archives, it hardly hurts to disable the links by removing the http://. --Dirk Beetstra T C (en: U, T) 13:36, 7 April 2019 (UTC)Reply
@Beetstra: Thank you for the information. I've been on Wikipedia since 2005, and running and writing bots for nearly as long. Blocking all kinds of entries just because you can is not a good solution. — Jeblad 14:47, 7 April 2019 (UTC)Reply
I have (temporarily) whitelisted locally, and will review.  — billinghurst sDrewth 23:00, 7 April 2019 (UTC)Reply
@Jeblad: having spambots go on a rampage is not a good idea either. I am sorry, but to protect wikipedia sometimes you have to block stuff. I have been pushing now for a more fine grained blacklist for probably close to 10 years .. one solution here is to temporarily whitelist, the other more permanent solution (since we are talking archives here) is to disable the links. —Dirk Beetstra T C (en: U, T) 03:44, 8 April 2019 (UTC)Reply
Special:diff/19004956, what is this?--AldnonymousBicara? 07:01, 11 April 2019 (UTC)Reply
Irrelevant. This is a regex that is more than the base domain, see the lead.  — billinghurst sDrewth 09:56, 11 April 2019 (UTC)Reply

youtu.be



Hi,

Don't reject it outright :)

I see that removing was already proposed multiple times. It was rejected every time because it's a redirect site, and redirect sites are undesirable.

I'm going to try to propose it for removal again. Here's why: while I agree that redirect URLs are indeed not desirable, youtu dot be is not otherwise harmful. As far as I know, it can only be used to redirect to youtube dot com, which is mostly allowed. So the intention of the people who want to add a youtu dot be link is as good as the intention of people who want to add a youtube dot com link.

But when people try to add it using the mobile editor, the edit fails with the cryptic message "Error, edit not saved." Of course, the mobile editor could be improved to give a clearer message (see https://phabricator.wikimedia.org/T220922). But till then, can there be a better solution for youtu.be links?

For example, there could be a bot that automatically changes these links to youtube dot com links soon after such an edit happens. This would give users, who really just want to make a constructive edit, a much smoother experience.

Existing blacklisting of particular YouTube videos can be retained robustly by adding youtu.be to the line that blocks them.

Please give it a thought :) --Amir E. Aharoni (talk) 18:45, 14 April 2019 (UTC)Reply
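For what it's worth, a minimal sketch of the bot-rewrite idea proposed above; no such bot exists here, the mapping youtu.be/<id> -> youtube.com/watch?v=<id> is the standard one, and query-string edge cases are only partially handled:

    import re

    YOUTU_BE = re.compile(r"https?://youtu\.be/(?P<video_id>[\w-]+)(?:\?(?P<query>\S*))?")

    def expand_youtu_be(text):
        """Rewrite youtu.be short links in a page's text to full youtube.com URLs."""
        def repl(match):
            url = "https://www.youtube.com/watch?v=" + match.group("video_id")
            if match.group("query"):
                url += "&" + match.group("query")   # e.g. keep a t=42 timestamp
            return url
        return YOUTU_BE.sub(repl, text)

    print(expand_youtu_be("See [https://youtu.be/dQw4w9WgXcQ?t=42 this video]."))
    # See [https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42 this video].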

I believe the proper action would be to follow a link from a link shortener (a redirect domain) and use the final link. No link shortener should be blocked; only the final domain should be blocked.
I guess something like this must be implemented as part of the save operation. Perhaps an even better solution would be to implement a rewrite operation as part of AbuseFilter. — Jeblad 13:37, 20 April 2019 (UTC)Reply
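As a rough illustration of "unwinding" a shortener before the save, assuming the Python requests library; note that, as explained further down, not every redirect-style site can be resolved this way (frameset-based ones, for instance):

    import requests

    def resolve_redirects(url, timeout=10):
        """Follow HTTP redirects and return the final URL, so any blacklist check
        can run against the real target rather than the shortened form."""
        try:
            return requests.head(url, allow_redirects=True, timeout=timeout).url
        except requests.RequestException:
            return url   # could not resolve; fall back to the original link

    # resolve_redirects("https://youtu.be/dQw4w9WgXcQ") would normally end up on a
    # www.youtube.com/watch?v=... URL; a tinyurl.com link could end up anywhere.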
I believe the proper action would be to follow a link from a link shortener (a redirect domain) and use the final link - indeed. The problem is that the well-meaning editors who add youtu dot be links don't know that they need to do it manually. So it should be done automatically. If this can indeed be done in AbuseFilter so that it would not block anything and just do the conversion transparently, then it sounds like a good solution. --Amir E. Aharoni (talk) 19:33, 22 April 2019 (UTC)Reply
The problem is that there is quite some no-no material on youtube.com that is never to be linked to (plain copyright violations, plain spam). We have youtube.com rules on many blacklists (many wikis have youtube.com parts or individual movies blacklisted, and there are a couple here; youtube.com, due to its rather strong deprecation, is on XLinkBot on en.wikipedia). When redirect sites to such pages are open, most of those rules have to be doubled (and youtu.be is not the only redirect). It makes the administration rather difficult (on en.wikipedia there are only a handful of editors active at the blacklist/whitelist, here it is also a selected few, on smaller wikis there are even fewer, and they then need to know that they have to block all redirects to certain material).
I can agree that the Wikimedia software should follow the links and react on the target, but that opens a loophole that is rather obvious: create a redirect site where you can change the target at will. On that site, create a redirect that points to something 'good'. Add those links throughout Wikipedia (no-one will complain), and then a bit later, change the redirect target to your spam page.
Moreover, except for the few 'dedicated' redirect sites (like youtu.be), most redirect sites obscure what you are actually linking to ('tinyurl.com/abcde' can link anywhere; you HAVE to follow the link to see where you end up). It can be a good site, or it can be a bad site that was not blacklisted yet. But the only way to know is to actually follow the link, possibly to a malware site that hacks your computer.
Then, this code is implemented in my linkwatcher, COIBot and XLinkBot. It is generally possible to 'resolve' the real target, but not all sites allow that. Tinyurl.com is a 'real' redirect site. Much (not all) of '.co.cc' uses a frameset - the .co.cc is a real page that loads its content from another website. Other sites are real redirects, but the results a header request gives you do not show anything that tells you it is a redirect. Everything that you cannot automatically detect will go through anyway (and a large spam case of last week showed quite a number of undetectable sites; you can only filter on the actual content you get from the website, the metadata does not show anything).
Then finally, there is, really, no reason to link through a redirect site. You can always link to the actual target. I know that for some sites it is inconvenient (typically, youtube.com hands you the shortened link for sharing), but hey, most of the material on youtube is unencyclopedic (we are not interested in linking to private birthday parties, teething of kids, walking the dog, family weddings, renditions of 'Let it go' on a local talent show), quite some of the more interesting material is copyvio, and the official channel of almost all artists is superfluous as it is already linked from their official website. --Dirk Beetstra T C (en: U, T) 08:36, 23 April 2019 (UTC)Reply
I'm not suggesting any redirect site. I'm only suggesting youtu dot be. As far as I know, it is not used for redirecting to any site, but only to YouTube. If it can be used for other sites, as it is with bitly and tinyurl, then it shouldn't be added.
I'm not suggesting that the youtu dot be links be preserved just like that. I suggest that they be automatically changed to youtube dot com, either by bot soon after publishing a revision or, if possible, immediately during the publishing.
Finally, while it's true that there is no reason to link through a redirect, the problem is that you understand this, but a lot of users don't. For a lot of users, a youtu dot be URL is as valid as a youtube dot com URL, and the software doesn't help editors understand that they shouldn't use youtu dot be. --Amir E. Aharoni (talk) 18:40, 23 April 2019 (UTC)Reply
@Beetstra: youtu.be is the official YouTube way of sharing their video links. If youtube.com is not in this blacklist, youtu.be should not be either. Whether YouTube should be blacklisted altogether is off-topic in this discussion. —Jerome Charles Potts (talk) 13:47, 4 May 2019 (UTC)Reply
There are many youtube links blacklisted on many wikis. The administration of all redirects to the same video is beyond many editors. They are often discouraged (and sometimes should not be linked to at all). Your argument, rewritten, is that because a tinyurl shortened link can point to en.wikipedia pages, it should not be blacklisted. URL shorteners are abused, and never needed (just convenient). (And youtu.be links ARE spammed, unlike youtube.com). —Dirk Beetstra T C (en: U, T) 13:54, 4 May 2019 (UTC)Reply
In truth both youtube.com and youtu.be links are spammed; however, I am only wanting to write filters for just the one base domain name, and not all the variations that occur. Noting that some days I would just love to blacklist youtube.com as well and make users use an interwiki mapped link!!! I do think that there needs to be a middle way, and not either or every, so please can we find the balance.  — billinghurst sDrewth 02:56, 5 May 2019 (UTC)Reply
In the last 500 blacklist hit entries on en.wikipedia ([2]) there are 190 hits on youtu.be (some multiple per line). en:User:AnomieBOT_III/Spambot_URI_list contains numerous youtu.be (and other redirects) to show samples of redirect sites being the target for spamming. Redirect sites result in many direct and indirect problems, and there is, really, NO need for them; they are, by definition, replaceable. I am in for other solutions, but removing redirects 'for convenience' is not a solution. --Dirk Beetstra T C (en: U, T) 07:04, 5 May 2019 (UTC)Reply
Please don't give tinyurl as an example. It's irrelevant and misleading. tinyurl is a generic redirect site. I'm not talking about a generic redirect site. I'm only talking about youtu.be, which is only a variant of youtube.com.
Having youtu.be blacklisted causes many false positives. This proposal is trying to eliminate them.
Yes, they are replaceable, but users don't know that they are replaceable. They can be replaced by a bot rather than blindly rejected.
Yes, particular videos should be banned, and already are, but this can be achieved robustly using one regex, as I have already suggested in the first post in this thread. --Amir E. Aharoni (talk) 18:56, 5 May 2019 (UTC)Reply
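For illustration, one such combined rule per blocked video might look roughly like this (made-up video ID; the real blacklist is one regex per line, so this only sketches what that line would have to cover):

    import re

    # Hypothetical single rule covering both forms of one blocked video.
    rule = re.compile(r"\b(?:youtube\.com/watch\?v=|youtu\.be/)dQw4w9WgXcQ\b")

    for url in ("https://www.youtube.com/watch?v=dQw4w9WgXcQ",
                "https://youtu.be/dQw4w9WgXcQ",
                "https://youtu.be/someOtherVideo"):
        print(url, bool(rule.search(url)))   # the first two match, the last does not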
Yes, it can be done in one robust regex, but that needs all of the wikis to be fully aware of how to write such rules. Moreover, we would have to blacklist all youtu.be's that do get spammed directly (funnily enough, more youtu.be gets spammed than youtube.com) as well. And that all on a site where use is limited (most material does not warrant being linked, and quite some material should never be linked), and then using an easy-to-avoid way of linking. I see your point, but I am not convinced that the advantages outweigh the disadvantages. --Dirk Beetstra T C (en: U, T) 09:02, 27 May 2019 (UTC)Reply
Perhaps User:InternetArchiveBot can be used to automatically rewrite these links after they are posted? --Amir E. Aharoni (talk) 08:19, 3 May 2019 (UTC)Reply
Then the InternetArchiveBot would run into the blacklist .. --Dirk Beetstra T C (en: U, T) 09:02, 27 May 2019 (UTC)Reply

bitcointalk.org



This was added in 2016 (https://meta.wikimedia.org/wiki/Talk:Spam_blacklist/Archives/2016-02#bitcointalk.org). The addition focused on links being created for the formal announcements of new crypto-currencies. Few to none today announce primarily via Bitcointalk, and many of the things that were announced that way now have wikipedia articles about them (for better or worse). Bitcointalk remains one of the premier locations for technical discussion, and the only location for many important historical discussions. The site is very actively moderated and not a source for outright spam. Although it isn't an appropriate citation for every case, this would probably be best addressed on a wiki-by-wiki basis based on the particular linking requirements of the wiki(s) in question. --Gmaxwell (talk) 23:36, 30 April 2019 (UTC)Reply

@Gmaxwell: It was being spambot'd back in the day. Is it worth applying for a whitelisting at enWP and measuring the success of the use of the link as an initial response? en:Mediawiki talk:Spam-whitelist  — billinghurst sDrewth 02:02, 1 May 2019 (UTC)Reply

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

Discussion

This section is for discussion of Spam blacklist issues among other users.

Complaint about my article

I would like the link that is causing the problem to be removed. Since it is verifiable, you can also verify it. Thank you Bob jeci (talk) 10:53, 17 November 2018 (UTC)Reply

@Bob jeci: Hi, you are going to need to provide specific information. We cannot guess where you are editing, and the domain which you are wishing to add.  — billinghurst sDrewth 11:21, 17 November 2018 (UTC)Reply

Information

Just to clarify: the only verifiable information about me that I wanted to add, I copied from a search done on Google. Enter (N'guessan Enoh Jean cinel) and you will get the links I proposed. Thanks Bob jeci (talk) 08:15, 23 November 2018 (UTC)Reply

@Bob jeci: It is unclear what you wish to achieve. If you are saying that a search link for Google is blocked, yes, that is the purpose of the blacklisting, and won't be changed. I suggest that you discuss that matter at the wiki where you are looking to add your information.  — billinghurst sDrewth 09:45, 23 November 2018 (UTC)Reply

Smart template for deferring to local wikis?

Can we have a template, e.g. {{Deferlocal|w|en}} that results in a text 'Defer to English Wikipedia Spam blacklist' (but displaying the target in the local language etc.?) --Dirk Beetstra T C (en: U, T) 11:48, 21 November 2018 (UTC)Reply

@Beetstra: done. Defaults to enWP where parameterless. $1= sister, $2 = lang; $3 = override text link  — billinghurst sDrewth 09:56, 23 November 2018 (UTC)Reply
Comment Someone may wish to turn it into a translated template. <shrug>  — billinghurst sDrewth 09:58, 23 November 2018 (UTC)Reply

COIBot and the spam blacklist log

COIBot is currently, in the 'free time' of the report saving module, backparsing the spam blacklist log, one wiki at a time. It turns out that one wiki is a humongous chunk of data, and that the bot spends quite some time before starting to parse reports again. Please be patient while this operation runs. The data is stored with the regular link additions, and the bots will then access it in the same way as usual.

That likely results in certain parts of COIBot's reporting functions (on wiki and on IRC) showing strange results, as some code may not understand how things are stored. I will resolve that later. --Dirk Beetstra T C (en: U, T) 17:53, 1 December 2018 (UTC)Reply

@Beetstra: Are there things that we should not do as they may hinder the process; or things that we should moderate/lessen in doing?  — billinghurst sDrewth 23:48, 1 December 2018 (UTC)Reply
Just be patient with it .. —Dirk Beetstra T C (en: U, T) 00:07, 2 December 2018 (UTC)Reply
@Beetstra: FYI: note that COIBot is writing to the wiki where quickcreate is requested, however, it is not recording its standard analysis from "report xwiki ..." They pass through in time, and are not written up at this point of time.  — billinghurst sDrewth 12:55, 16 December 2018 (UTC)Reply
@Billinghurst: I will have a look this evening. COIBot is running 2 LinkSavers, one parsing blacklists, the other one not. Unfortunately, that is prone to crashes. I presume that currently both are on a blacklist parsing the whole thing. I just hope that the one parsing en.wikipedia is done soon, but there are hellish months in the history of that (spambots hitting thousands of times an hour, back in 2015, see e.g. https://en.wikipedia.org/w/index.php?title=Special:Log/spamblacklist/91.200.12.79&action=edit&redlink=1). --Dirk Beetstra T C (en: U, T) 13:25, 16 December 2018 (UTC)Reply
@Billinghurst: bot was confused .. I restarted the LinkSaver that should be saving. It borked (nothing you can solve from IRC .. unfortunately). Just to illustrate, the blacklist parser spent the last 13 1/2 hours parsing the 2nd of May 2015 ... --Dirk Beetstra T C (en: U, T) 17:31, 16 December 2018 (UTC)Reply

Overly aggressive blocking

Checked a few of the newer entries, and a whole bunch of them were added without ever being used for spam. Some of them are used one or two times, but most of them are never used at all. Most of them are not on known spam lists. It seems like some (many?) of them are sharing IP addresses, but given how some web hotels operate, this is a very weak (I would say invalid) indication, as HTTP 1.1 allows IP sharing. I guess a whole lot of the new entries are based on the same flawed assumption. — Jeblad 15:13, 7 April 2019 (UTC)Reply

@Jeblad: Not sure to which entries you are referring, so you will need to ask specific questions of the person who added the entries. I would suggest that numbers of domain entries will be found in the abuse filter logs due to spambot activity, and usually in proliferation, so there can be a case for early intervention against known and persistent problems.

Also at this stage COIBot is patchy since it was moved when WMFlabs lost one of its instances and the bot account was moved and limited, so beware trusting all of its reports; it has elements of quirkiness and gaps.  — billinghurst sDrewth 10:47, 10 April 2019 (UTC)Reply

@Jeblad: Can you show me a couple of the domains you are talking about? But note that if we have one IP that adds 10 different domains in 10 edits, then that is still a spam account. Yes, it may be 10 different people sharing the IP, but if that is all that that IP is doing, then there is only a very small chance that it is a coincidence and not an address used by spammers. --Dirk Beetstra T C (en: U, T) 08:24, 14 April 2019 (UTC)Reply

the name viagrandparis contains viagra so it is detected as spam



Hi. My article for Jean-Claude Durousseaud is refused because spam is detected in the name of the French TV channel viàGrandParis, as it contains the letters v.i.a.g.r.a. Can you help me with that, please?--Léa Rateau (talk) 14:38, 11 April 2019 (UTC)Léa RateauReply

@Léa Rateau: I would consider locally whitelisting this domain. Although it is possible to exclude parts through writing complex regexes, I would not do that for domains which are maybe used on 2-3 wikis (out of the hundreds), as rules then quickly become too complex. (Yes, a global whitelist would be good, as that is easier to administrate, but unfortunately that gap has never been filled by WMF.) --Dirk Beetstra T C (en: U, T) 08:30, 14 April 2019 (UTC)Reply
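To illustrate the kind of regex carve-out being talked about (the patterns and domain below are examples only, not the actual blacklist entry): a rule matching 'viagra' as a bare substring also hits the TV channel's domain, while a word boundary or an explicit exclusion avoids that false positive:

    import re

    broad    = re.compile(r"viagra", re.IGNORECASE)                 # bare substring
    bounded  = re.compile(r"\bviagra\b", re.IGNORECASE)             # word boundaries
    excluded = re.compile(r"viagra(?!ndparis)", re.IGNORECASE)      # explicit carve-out

    for url in ("https://www.viagrandparis.tv/", "http://buy-viagra.example/"):
        print(url, bool(broad.search(url)), bool(bounded.search(url)), bool(excluded.search(url)))
    # viagrandparis.tv:   True False False  -> only the broad rule misfires
    # buy-viagra.example: True True  True   -> all three still catch the spam form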

Wikimedia URL shortener



This is a 'redirection service' in principle; should we block it? — regards, Revi 14:21, 13 April 2019 (UTC)Reply

I am semi in favour as we should be utilising proper and evident links as we specify for all others. Though we should get labelling on the new tool to identify that it cannot be utilised on WMF wikis. @Lea Lacroix (WMDE):  — billinghurst sDrewth 01:31, 14 April 2019 (UTC)Reply
@Billinghurst, -revi, and Lea Lacroix (WMDE): I don't assume that this extension is so smart that it is capable of checking the target against our spam blacklist, right? --Dirk Beetstra T C (en: U, T) 08:33, 14 April 2019 (UTC)Reply
I will note that this issue of WMF blocking external url shorteners/redirects was brought up on the initial phabricator ticket, so it is not unexpected for this discussion to occur.  — billinghurst sDrewth 08:39, 14 April 2019 (UTC)Reply
@Beetstra: it is internal link only, so I am not sure that it needs to check blacklists, it is more whether we are wishing to encourage use of these links in preference to use of interwikis, and the like.  — billinghurst sDrewth 08:36, 14 April 2019 (UTC)Reply
@Billinghurst: Oh, OK. Note that we do allow some very specific redirect services to go unblacklisted, as assignment of those is purely restricted or assigned (dx.doi.org is one of them). If this is similar (cannot be abused on outside material, and the scheme is the same as <lang>.<wikiflavour>.org), then I do not see any reason to block this. From what I see of its current use, it seems to be quite clear where you are going. --Dirk Beetstra T C (en: U, T) 09:56, 14 April 2019 (UTC)Reply
Same as other shorteners, they should be unwound before the contribution is saved. Only the final domain should be blocked, not the url shortener itself. Blocking the shortener is simply the wrong solution. — Jeblad 13:46, 20 April 2019 (UTC)Reply
Oppose, as this is under our control already and can be very useful for posting certain long urls with lots of parameters, etc. to discussions. — xaosflux Talk 15:43, 21 May 2019 (UTC)Reply