Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs in any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis and prior blacklisting on at least one. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report; there could be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for coordination of activities related to maintenance of the blacklist.
Whitelists
There is no global whitelist, so if you are seeking a whitelisting of a url at a wiki then please address such matters via use of the respective Mediawiki talk:Spam-whitelist page at that wiki, and you should consider the use of the template {{edit protected}} or its local equivalent to get attention to your edit.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2018/12.


snippet for logging
{{sbl-log|18719829#{{subst:anchorencode:SectionNameHere}}}}

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

Referral links to amazon.com





  • etc.

Hello. As far as I understand, referral links to Amazon are currently blocked by \bamazon\.com.*(?:\?|&)tag=. The problem is that this entry doesn't block links to Amazon's national branches (amazon.de, amazon.co.uk, etc.). I suppose the national branches should be blocked globally rather than locally. Thanks. Track13 0_o 13:02, 19 November 2018 (UTC)

When I am next in IRC, I will get COIBot to run some analysis, though I think COIBot is going to say "too many links". Generally, for significant changes we notify the major wikis, or at least those where the link is being used/allowed. Previously we have done that via the administrators' noticeboards, pointing them here to express an opinion, and a quick look at the links already in place indicates that such a block will have consequences (I am not judging them good or bad, just saying they are worth flagging with communities).  — billinghurst sDrewth 21:46, 19 November 2018 (UTC)
I would be in favour of indeed expanding this to all possible tlds. If we block one, we should block all, and imho, even when added in good faith, the referral tag should be removed anywhere on Wikipedia, as I can't envisage that it will be pursuant to our goals on any wiki. We could ping the larger communities, but that does not necessarily need to stop our efforts. Links that are there should be de-'tag='d, but since blacklisting does not disable editing a page, that should not be an issue, and where it is an issue, it should just be an incentive to remove the 'tag='. --Dirk Beetstra T C (en: U, T) 12:43, 16 December 2018 (UTC)
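A quick way to see the gap Track13 describes is to run the quoted entry against a few sample URLs. The sketch below does that in Python, assuming case-insensitive matching anywhere in the URL. `BROADER` is a hypothetical wider pattern written only for comparison; it is not an entry on the list and was not proposed in this thread. The product IDs and tags in the sample URLs are placeholders.

```python
import re

# The global entry quoted in the request above.
CURRENT = re.compile(r"\bamazon\.com.*(?:\?|&)tag=", re.IGNORECASE)

# Hypothetical broader pattern, NOT on the list: any 2-3 letter label after
# "amazon." followed by a word boundary, so amazon.de, amazon.co.uk and
# amazon.com.au are all caught, as long as a tag= parameter follows.
BROADER = re.compile(r"\bamazon\.[a-z]{2,3}\b.*(?:\?|&)tag=", re.IGNORECASE)


def blocked(pattern: re.Pattern, url: str) -> bool:
    """True if the pattern matches anywhere in the URL."""
    return pattern.search(url) is not None


SAMPLES = [
    "https://www.amazon.com/dp/B000000000?tag=example-20",
    "https://www.amazon.de/dp/B000000000?tag=example-21",
    "https://www.amazon.co.uk/dp/B000000000?ref=x&tag=example-21",
    "https://www.amazon.de/dp/B000000000",  # no referral tag at all
]

for url in SAMPLES:
    print(url, blocked(CURRENT, url), blocked(BROADER, url))
```

With these samples only the amazon.com URL is caught by the current entry, while the hypothetical pattern also catches the .de and .co.uk referral links and still leaves the untagged link alone. Any real wider entry would need the same kind of check against links already in use, as raised above.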

RFD: outlets.co... regex for consideration

Link/text requested to be blacklisted: outlets\.com?\b

I am proposing the above regex as we are seeing a lot of spambots adding shopping spam on domains ending in ...outlets.com and ...outlets.co.uk.

This is being added here for consultation, as it is undetermined how many good URLs could match this. My exploration hasn't found any, though the tools for such explorations are pretty ordinary, and I am no expert on legitimate marketing URLs in use at the WPs. Community input on possible false positives / negative consequences is desired before proceeding.  — billinghurst sDrewth 23:19, 15 December 2018 (UTC)

@Billinghurst: Running such a query is going to be heavy on the db. I could envisage good links here, but we'll have to exclude them by design here / whitelist them where needed. --Dirk Beetstra T C (en: U, T) 12:45, 16 December 2018 (UTC)
We can just continue blacklisting if it is problematic; they usually butt up against filters, though there is a small amount of leakage.  — billinghurst sDrewth 12:51, 16 December 2018 (UTC)
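To get a feel for what `outlets\.com?\b` would catch, the sketch below wraps the entry in a URL prefix and tests a few made-up URLs. It assumes the Spam Blacklist extension anchors each entry after the `//` of a URL and allows any run of host characters before it, roughly `(?:https?:)?//+[a-z0-9_\-.]*`; treat that prefix as an approximation and check the extension source for the exact form. `compile_entry` and `is_blocked` are hypothetical helper names.

```python
import re

# Assumption: each blacklist line is matched after the "//" of a URL, with any
# run of host characters allowed in front of it. The real prefix lives in the
# Spam Blacklist extension source and may differ in detail.
URL_PREFIX = r"(?:https?:)?//+[a-z0-9_\-.]*"


def compile_entry(line: str) -> re.Pattern:
    """Wrap one blacklist line in the assumed URL prefix (hypothetical helper)."""
    return re.compile(URL_PREFIX + "(?:" + line + ")", re.IGNORECASE)


PROPOSED = compile_entry(r"outlets\.com?\b")


def is_blocked(url: str) -> bool:
    """True if the proposed entry would match this URL."""
    return PROPOSED.search(url) is not None


for url in [
    "http://cheapbagsoutlets.com/",      # ...outlets.com: matched
    "https://www.shoesoutlets.co.uk/x",  # ...outlets.co.uk: matched
    "https://outlets.community/",        # \b fails after "com" and after "co"
    "https://example.org/outlets.com",   # only in the path: prefix cannot cross "/"
]:
    print(url, is_blocked(url))
```

Because of the leading run of host characters, any host ending in `outlets.com`, or containing `outlets.co` followed by a dot, is matched; that is where false positives would come from.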

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as it may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).
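The archiving rule above combines three conditions, all of which must hold. Below is a minimal sketch of that predicate in Python. The `Report` fields are hypothetical stand-ins for whatever COIBot actually reads, and "not edited in the last 7 days" is taken to mean strictly more than 7 days since the last edit (the boundary is an assumption).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    """Hypothetical summary of one COIBot report page."""
    links_reported: int    # number of links listed in the report
    last_edited: datetime  # time of the page's most recent edit
    last_editor: str       # user who made that edit


def is_stale(report: Report, now: datetime) -> bool:
    """True when all three archiving conditions hold."""
    few_links = report.links_reported < 5
    untouched = now - report.last_edited > timedelta(days=7)  # assumed strict
    bot_last = report.last_editor == "COIBot"
    return few_links and untouched and bot_last


now = datetime(2018, 12, 16)
print(is_stale(Report(3, datetime(2018, 12, 1), "COIBot"), now))   # True
print(is_stale(Report(3, datetime(2018, 12, 14), "COIBot"), now))  # False
```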

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
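Two of the LinkWatcher criteria above can be written down fairly directly. The sketch below covers the first (a user mainly adding one link, to more than 2 wikis) and the third (only IPs adding a link, on more than 1 wiki). "Mainly" and "not used too much" are not quantified on this page, so `share` and `max_total_uses` are hypothetical thresholds, and the `Addition` record is an assumed shape, not LinkWatcher's real data model.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Addition:
    """Hypothetical record of one link addition seen by the LinkWatchers."""
    user: str  # account name or IP address
    wiki: str  # e.g. "en.wikipedia"
    link: str  # domain that was added


def is_ip(user: str) -> bool:
    """True if the 'user' is an IPv4 or IPv6 address (an anonymous edit)."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def only_ips_multi_wiki(additions: List[Addition], link: str) -> bool:
    """Third criterion: ALL additions of `link` are by IPs, on more than 1 wiki."""
    rows = [a for a in additions if a.link == link]
    return (
        bool(rows)
        and all(is_ip(a.user) for a in rows)
        and len({a.wiki for a in rows}) > 1
    )


def user_mainly_adds(
    additions: List[Addition],
    user: str,
    link: str,
    share: float = 0.5,        # hypothetical meaning of "mainly"
    max_total_uses: int = 20,  # hypothetical meaning of "not used too much"
) -> bool:
    """First criterion: `user` mainly adds `link`, the link is not widely
    used overall, and this user has added it to more than 2 wikis."""
    by_user = [a for a in additions if a.user == user]
    if not by_user:
        return False
    mine = [a for a in by_user if a.link == link]
    total_uses = sum(1 for a in additions if a.link == link)
    return (
        len(mine) / len(by_user) > share
        and total_uses <= max_total_uses
        and len({a.wiki for a in mine}) > 2
    )
```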
COIBot's currently open XWiki reports
List Last update By Site IP R Last user Last link addition User Link User - Link User - Link - Wikis Link - Wikis
commandcenter.blogspot.mx 2018-12-16 21:55:10 COIBot 172.217.15.97 R Cortex128
Sachin12345633
TAKAHASHI Shuuji
2018-12-16 21:41:25 25 9

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests please take your discussion to the pertinent wiki, where such requests would be made at Mediawiki talk:Spam-blacklist at that wiki. Search spamlists — remember to enter any relevant language code

genetherapynet.com



I tried adding this and found it was blacklisted here. It seems that someone with a COI to that site and others was spamming it on articles cross-wiki. As far as I can tell, the editor has not been here for quite a long time (see here, this and Talk:Spam blacklist/Archives/2011-02). I was wondering if it could be removed. I am not sure about the other websites that were blacklisted at the same time, as I have not explored them. Aircorn (talk) 06:28, 19 July 2018 (UTC)

@Aircorn: it was being spammed in 2011. A removal needs more than "I want to add it"; it usually needs firm reasoning about why it is usable at the sites. You can always ask about whitelisting at w:en:mediawiki talk:spam-whitelist.  — billinghurst sDrewth 09:32, 9 August 2018 (UTC)
It is quite hard to find sites that explain genetic engineering and are user friendly. It would make my editing life a little bit easier if I could use it as a reference, but it is not a site I would regularly use, as there are much better (although less accessible) resources out there. I figured that since it was (as far as I could tell) only blacklisted due to spamming a long time ago, it would be relatively easy to unblacklist once that was no longer an issue. I will look at whitelisting if I really need it. Thanks for the response. Aircorn (talk) 09:44, 9 August 2018 (UTC)

more than 3000 entries

related: Talk:Spam_blacklist/Archives/2015-01#Now_restarting_the_original_debate

hi! (ping billinghurst, Beetstra) At user:lustiger seth/sbl log stats/all wikis no hits, I have again started making a list of all sbl entries that have had 0 hits in all ~900 wikis since they were added to the list (but not earlier than 2013, when the sbl log came into existence). The script takes some time (probably another week). Half of the sbl entries (~4800) have been checked already; more than 3000 have never been the reason for a blocked edit.
What do you think? Shall we delete those entries (except the URL shorteners) from the list? Advantage: the fewer the entries, the clearer the structure. -- seth (talk) 10:28, 10 November 2018 (UTC)

@Lustiger seth: Thank you for this effort. I think most of the non-shortener domains can be removed.
Re 'clearer [the] structure': Would it be possible to 'sort' the list at some point, putting all the shorteners so far into a 'section', with a 'below here unsorted' mark at the end? Then every year or so we can sort the unsorted entries into the 'groups' above, and it would make clean-up of non-shorteners easier (you can even take them out before your parsing; there is no need to check whether they were abused or not, as we keep 'm anyway if they are still a shortening service). --Dirk Beetstra T C (en: U, T) 07:36, 11 November 2018 (UTC)
@Beetstra: I agree that merging/grouping the shorteners would be reasonable. -- seth (talk) 08:06, 11 November 2018 (UTC)
  • Comment I take that to mean you have been running through Special:log/spamblacklist on all wikis. On checking some at random, I see that some listings have come from a Poke, e.g. User:COIBot/LinkReports/onlinetripadvisorturkey.com, so there have been concerns that caused the addition. Have we cross-referenced against generated XWiki reports? A number of URLs come about from AbuseFilter hits, so where we have additions to the blacklist and generated XWiki reports, I am not certain that we want those removed. Also, if we have a regex in place, I am loath to remove it, as it was specifically added from an evidence base.  — billinghurst sDrewth 22:08, 11 November 2018 (UTC)
    I am also seeing a number of essay-writing domains in the list, and while they have not been spammed, I am not certain that I want them removed. Call me paranoid, or call me nasty! If we are to remove URLs, maybe we want to eyeball the proposed removals first and strike those we would like to keep.  — billinghurst sDrewth 03:47, 12 November 2018 (UTC)
    @Billinghurst: We could easily cut the regexes added in the last # years (2?) from that list. If seth re-ran the script in a year (e.g.), then those still without hits would show up.
    Alternatively, we run a script on those filters and extract all the domains in them ... (heck, I could teach LiWa to read certain filters as prescribed in the settings and extract domains from them ... but that is a feature I could write next summer at the earliest; moreover, I would love for LiWa to have access to all special:log/spamblacklist, so I could record attempted additions there as well - attempts to spam would be a welcome addition to the coibot reports ...). --Dirk Beetstra T C (en: U, T) 05:17, 12 November 2018 (UTC)
    (barging in) Maybe you are interested in checking against potential positives from my lists before removing? None of their entries are collected automatically; all are handpicked: spam links on Commons, spamming wikis. Best, --Achim (talk) 14:13, 12 November 2018 (UTC)
    @Achim55: Your list can be loaded into COIBot; if you use IRC, we can give you permissions with COIBot to add these to be monitored per Small Wiki Monitoring Team/IRC or if not, we can give you permission so you can add them to User:COIBot/Poke. @Beetstra: are you thinking of converting to json, or something similar? If not, then I am going to need to get js/css permissions :-/  — billinghurst sDrewth 03:37, 16 November 2018 (UTC)
    @Billinghurst and Achim55: I cannot just convert to json, as it is currently not valid json. I will have to go to regular pages and e.g. get template-editor access for COIBot. But that is beside the point. We can also poke that list, and I will give Achim access to poke as well. —Dirk Beetstra T C (en: U, T) 04:24, 16 November 2018 (UTC)
  • Comment I would think there is value in at least keeping the old list of removed domains somewhere and having COIBot use that list at least for "monitor", or proactively pushing those in to be monitored.  — billinghurst sDrewth 03:39, 16 November 2018 (UTC)
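The zero-hit list seth describes, combined with the exceptions raised in this thread (keep URL shorteners, skip recently added regexes), can be sketched as follows. This is not seth's script, which reads the replica db; it is a Python illustration in which the entry tuples, the `logged_urls` input and the `cutoff` date are all assumptions, and a plain `re.search` over each logged URL stands in for however the extension really matches.

```python
import re
from datetime import date
from typing import Iterable, List, Set, Tuple


def removal_candidates(
    entries: Iterable[Tuple[str, date]],  # (regex line, date it was added)
    logged_urls: List[str],               # URLs from spam blacklist logs, all wikis
    shorteners: Set[str],                 # regex lines kept regardless
    cutoff: date,                         # added on/after this: too new to judge
) -> List[str]:
    """Return regex lines that never matched a logged URL, skipping
    URL shorteners and recently added entries."""
    candidates = []
    for line, added in entries:
        if line in shorteners or added >= cutoff:
            continue
        pattern = re.compile(line, re.IGNORECASE)
        if not any(pattern.search(url) for url in logged_urls):
            candidates.append(line)
    return candidates


# Made-up example data, for illustration only.
entries = [
    (r"\bspam-a\.example\b", date(2012, 1, 1)),     # has a logged hit
    (r"\bspam-b\.example\b", date(2014, 3, 1)),     # no hits: candidate
    (r"\bbit\.ly\b", date(2010, 1, 1)),             # shortener: kept
    (r"\bnew-spam\.example\b", date(2018, 6, 1)),   # too recent to judge
]
print(removal_candidates(entries, ["http://spam-a.example/page"],
                         {r"\bbit\.ly\b"}, date(2017, 1, 1)))
```

The output would still be a list to eyeball, as suggested above, rather than something to remove automatically.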

reading the blacklists ..

@Lustiger seth: related to your work here .. how do you manage to read ALL spam-blacklist-logs? I thought they were admin only .. ?? If they can be (bot-)read, that would be very welcome. I could then build the capability into LiWa3/COIBot, so that attempts to circumvent the blacklist can be shown in the reports; that is very welcome evidence in case of de-blacklisting requests, as well as for sock-hunting and finding spammers implementing workarounds (spammers attempt one blacklisted domain, and others that are not yet blacklisted .. that is a one-strike-and-you-are-out situation suitable for immediate blacklisting/blocks of the other domains). --Dirk Beetstra T C (en: U, T) 05:26, 14 November 2018 (UTC)

@Beetstra: they used to be, apparently the developers decided they should be open to all logged in users in phab:T64781. — xaosflux Talk 15:49, 14 November 2018 (UTC)
@Xaosflux: I never understood the initial choice... thanks, I will need to code this into my bots! Thanks! —Dirk Beetstra T C (en: U, T) 17:04, 14 November 2018 (UTC)
Hi!
another related ticket phab:T184483.
I use the replica db at toolserver. -- seth (talk) 21:28, 14 November 2018 (UTC)
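For bots such as LiWa3/COIBot, the same log should also be reachable through the MediaWiki Action API as a `list=logevents` query with `letype=spamblacklist`. The sketch below only builds the request URL. Per phab:T64781 as cited above, reading the log requires being logged in, so an actual fetch would need an authenticated session. The endpoint and the continuation values in the test are placeholders, and `spamblacklist_log_url` is a hypothetical helper name.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def spamblacklist_log_url(
    api_endpoint: str,
    limit: int = 500,
    cont: Optional[Dict[str, str]] = None,
) -> str:
    """Build an Action API URL listing Special:Log/spamblacklist entries.

    `cont` is the 'continue' object from a previous response, if any;
    passing it back requests the next batch.
    """
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "spamblacklist",
        "lelimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }
    if cont:
        params.update(cont)
    return api_endpoint + "?" + urlencode(params)


print(spamblacklist_log_url("https://en.wikipedia.org/w/api.php"))
```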

josefov.com



This link was added in 2008 and looks legitimate to me; it appears to be the official website of Josefov. I spotted that it was added by JiriMatejicek@cswiki as a legitimate source. I propose removal of this link. --Martin Urbanec (talk) 21:24, 16 November 2018 (UTC)

Comment The request in 2008: https://meta.wikimedia.org/w/index.php?title=Talk:Spam_blacklist&oldid=933918#www.josefov.com.2F_and_pevnostjosefov.wz.cz.2F  — billinghurst sDrewth 23:39, 16 November 2018 (UTC)

thewayoftheninja.org



I wanted to add this to N's page on Wikipedia after finding a fake link listed there, but found that this link was blocked. This is the official site for the N series of games. -- TheV360 (talk) 22:05, 20 November 2018 (UTC)

@TheV360: Declined. The domain is not globally blacklisted; it is locally blacklisted at enWP only. You will need to address this at w:en:Mediawiki talk:spam-blacklist.  — billinghurst sDrewth 22:18, 20 November 2018 (UTC)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

Discussion

This section is for discussion of Spam blacklist issues among other users.

Complaint about my article

I would like the problematic link to be removed. Since it is verifiable, you can verify it too. Thanks Bob jeci (talk) 10:53, 17 November 2018 (UTC)

@Bob jeci: Hi, you are going to need to provide specific information. We cannot guess where you are editing, and the domain which you are wishing to add.  — billinghurst sDrewth 11:21, 17 November 2018 (UTC)

Information

Just to clarify: the only verifiable information about me that I wanted to add, I copied from a Google search. Search for (N'guessan Enoh Jean cinel) and you will get the links that I proposed. Thanks Bob jeci (talk) 08:15, 23 November 2018 (UTC)

@Bob jeci: It is unclear what you wish to achieve. If you are saying that a search link for Google is blocked, yes, that is the purpose of the blacklisting, and won't be changed. I suggest that you discuss that matter at the wiki where you are looking to add your information.  — billinghurst sDrewth 09:45, 23 November 2018 (UTC)

Smart template for deferring to local wikis?

Can we have a template, e.g. {{Deferlocal|w|en}} that results in a text 'Defer to English Wikipedia Spam blacklist' (but displaying the target in the local language etc.?) --Dirk Beetstra T C (en: U, T) 11:48, 21 November 2018 (UTC)

@Beetstra: done. Defaults to enWP where parameterless. $1 = sister, $2 = lang, $3 = override text link  — billinghurst sDrewth 09:56, 23 November 2018 (UTC)
Comment Does someone wish to turn it into a translated template? <shrug>  — billinghurst sDrewth 09:58, 23 November 2018 (UTC)

COIBot and the spam blacklist log

COIBot is currently, in the 'free time' of the report saving module, backparsing the spam blacklist log, one wiki at a time. It turns out that one wiki is a humongous chunk of data, and that the bot spends quite some time before starting to parse reports again. Please be patient while this operation runs. The data is stored with the regular link additions, and the bots will then access it in the same way as usual.

That will likely cause certain parts of COIBot's reporting functions (on wiki and on IRC) to show strange results, as some code may not understand how things are stored. I will resolve that later. --Dirk Beetstra T C (en: U, T) 17:53, 1 December 2018 (UTC)

@Beetstra: Are there things that we should not do as they may hinder the process; or things that we should moderate/lessen in doing?  — billinghurst sDrewth 23:48, 1 December 2018 (UTC)
Just be patient with it .. —Dirk Beetstra T C (en: U, T) 00:07, 2 December 2018 (UTC)
@Beetstra: FYI: note that COIBot is writing to the wiki where quickcreate is requested; however, it is not recording its standard analysis from "report xwiki ...". They pass through in time, but are not being written up at this point in time.  — billinghurst sDrewth 12:55, 16 December 2018 (UTC)
@Billinghurst: I will have a look this evening. COIBot is running 2 LinkSavers, one parsing blacklists, the other one not. Unfortunately, that is prone to crashes. I presume that currently both are on a blacklist parsing the whole thing. I just hope that the one parsing en.wikipedia is done soon, but there are hellish months in the history of that (spambots hitting thousands of times an hour, back in 2015, see e.g. https://en.wikipedia.org/w/index.php?title=Special:Log/spamblacklist/91.200.12.79). --Dirk Beetstra T C (en: U, T) 13:25, 16 December 2018 (UTC)
@Billinghurst: bot was confused .. I restarted the LinkSaver that should be saving. It borked (nothing you can solve from IRC .. unfortunately). Just to illustrate, the blacklist parser spent the last 13 1/2 hours parsing the 2nd of May 2015 ... --Dirk Beetstra T C (en: U, T) 17:31, 16 December 2018 (UTC)