Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki

Shortcuts: WM:SPAM · WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs on any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
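
For readers unfamiliar with the mechanics, a minimal sketch of the matching model follows (illustrative Python only, not the extension's actual code; the example entries are invented):

    import re

    # Each blacklist line is a regular-expression fragment; the extension
    # tests every external link added by an edit against the entries.
    BLACKLIST = [
        r"\bexample-pills\.com\b",   # invented spam domain
        r"\bxlal[0-9a-z-]*\.com\b",  # a deliberately wide rule
    ]

    def blocked(url: str) -> bool:
        """Return True if any blacklist entry matches the URL."""
        return any(re.search(entry, url) for entry in BLACKLIST)

    print(blocked("http://xlale.com/page"))           # True
    print(blocked("http://en.wikipedia.org/wiki/X"))  # False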

Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, as there may be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate; that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links (connect) - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
Whitelists
There is no global whitelist. If you are seeking whitelisting of a url at a particular wiki, please raise the matter on that wiki's MediaWiki talk:Spam-whitelist page, and consider using the template {{edit protected}} or its local equivalent to draw attention to your request.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2024/06.

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

images.google.(tld)/imgres?



I'm suggesting to add the preview pages from Google Images search, as being similar to their normal web-search redirector. Surprisingly, it seems this has not been discussed yet, though there are at least 800 such links on dewiki, 600 on enwiki and more than 1000 on Commons. --Nenntmichruhigip (talk) 12:53, 15 September 2016 (UTC)

@Nenntmichruhigip: if there are that many links on those wikis, and the sites are neither blacklisted nor the links removed, then it is not the place of meta to impose its own opinion. If you believe that we should continue the discussion, then please raise the issue at enWP, deWP and Commons and point the users here to discuss.  — billinghurst sDrewth 16:26, 16 September 2016 (UTC)
I've been directed here from dewiki, and don't know where a suitable place on enwiki and commons would be. --Nenntmichruhigip (talk) 13:36, 17 September 2016 (UTC)
The occurrences on dewiki are cleaned up by now. On commons there has been an abuse filter catching new additions for the past two months, but afaict no ongoing cleanup of existing ones, despite quite a few copyvios. --Nenntmichruhigip (talk) 07:33, 27 September 2016 (UTC)
Until these other wikis address the matter here, it may be best to pursue a local blacklist addition at deWP.  — billinghurst sDrewth 10:19, 27 September 2016 (UTC)
Hi!
Those links can be used as sbl-circumvention.
As we normally blacklist all possible sbl-circumventions globally here at meta, in my opinion the first place to discuss blacklisting of that domain is here. -- seth (talk) 22:04, 27 September 2016 (UTC)
Agree, though we are not meant to be setting the overarching link policy without direct consultation. To further assist, I have put a bot clean up request to Commons. I have also copied over the filter from Commons to enWP to look at the new additions to monitor, and maybe start a conversation there.  — billinghurst sDrewth 23:11, 10 October 2016 (UTC)

g.co

See w:en:MediaWiki_talk:Spam-blacklist#google.co.in_shortener, reported by User:Ravensfire



This is ugly: this is one of Google's link shorteners. Used across our projects a lot. However, it is prone to abuse, as usual for shorteners. It is currently used in w:en:Akasa_Singh, where it redirects to a plain search result. If that is possible.... --Dirk Beetstra T C (en: U, T) 13:52, 29 November 2016 (UTC)

Standard linksearch does not give a lot .. but is informative. --Dirk Beetstra T C (en: U, T) 13:54, 29 November 2016 (UTC)
It appears that the url will indicate which google page/service the link is for: g.co/maps/ for maps, doodle for doodle, etc. g.co/kgs/ appears to be for searches, so if there is a need to keep g.co available for some services, blocking just the search may be useful. Ravensfire (talk) 15:37, 29 November 2016 (UTC)
For such a change of a commonly used url, I would have the expectation that (some|the) wikis would block it first, and we would follow. I do not see it blocked.  — billinghurst sDrewth 10:58, 3 December 2016 (UTC)
It is a url shortener .. they get blanket blacklisted on meta, not through local communities. We declined this earlier because people were arguing that it could only be used for google maps. It turns out that that is not the case, it has been (ab)used to link to search engine results (which is discouraged on some wikis, to say the least). --Dirk Beetstra T C (en: U, T) 03:32, 4 December 2016 (UTC)

Ping. Still think that this is/can be abuse. --Dirk Beetstra T C (en: U, T) 15:39, 5 July 2017 (UTC)

wikiwand.com



This is a clear Wikipedia mirror, but it is also used as a citation in Wikipedia, e.g. [1], and on frwiki and eswiki. enwiki seems to be clean. Should the domain be blacklisted globally or locally only? -- seth (talk) 19:39, 5 December 2016 (UTC)

@Lustiger seth: Hmm .. this certainly should not be used as a reference, but this site does have its own article on many wikis (w:en:Wikiwand and interwikis). I don't think this warrants blacklisting any more than en.wikipedia.org (which should also not be used as a reference; en.wikipedia would not blacklist based on being unreliable alone, abuse is needed as an excuse for blacklisting). Is the situation 'bad'? (I don't see reference-use on enwiki.) --Dirk Beetstra T C (en: U, T) 13:09, 8 December 2016 (UTC)
hi!
I guess at enwiki somebody cleaned up already (that's what I meant by "enwiki seems to be clean"). (Maybe there's an edit filter or a bot coping with the links in main namespace?) But in other wikis there are still some links left.
I agree on the point that I don't see intended abuse. However, I see violation of our rules and want to stop that. Possibilities I see are: SBL, edit filter, or a bot that cleans up and kindly informs. The SBL would be an easy and fast option. The most comfortable solution is the bot, I guess.
I guess I'll start with an SBL entry at dewiki. And in a few hundred yea^W^W^W^Wat some point in the future I'll implement a more intelligent solution ... -- seth (talk) 23:03, 12 December 2016 (UTC)
@Lustiger seth: this again seems to boil down to a proper rewrite of the spam blacklist extension - to be more edit-filter like. --Dirk Beetstra T C (en: U, T) 03:42, 13 December 2016 (UTC)
Just noticed this conversation. I've removed quite a few links to this site on enwiki and do weekly sweeps of the http and https links. I hate to call it "abuse" as I'm almost sure 100% of the links were added by mistake rather than through deliberate spamming. They do clearly identify as a mirror; I'm far more concerned with the handful of mirrors that go out of their way to obfuscate the origins of their content. Kuru talk 22:27, 22 December 2016 (UTC)
@Kuru: those might qualify for blacklisting as well. --Dirk Beetstra T C (en: U, T) 03:44, 12 February 2017 (UTC)
  • Comment: COIBot as of today says: COIBot> 1755 records; Top 10 wikis where wikiwand.com has been added: w:en (603), w:de (86), w:fr (82), w:pt (76), w:es (73), w:hi (73), w:it (34), w:ru (31), commons (28), w:hy (20). I have set the domain to monitor, and recast for a link check. It is blacklisted at deWP and zhWP.  — billinghurst sDrewth 08:08, 25 December 2016 (UTC)
  • 1873 records; Top 10 wikis where wikiwand.com has been added: w:en (639), w:pt (89), w:fr (88), w:de (86), w:es (79), w:hi (79), w:it (38), w:ru (32), commons (30), w:kn (28).  — billinghurst sDrewth

webcache.googleusercontent.com/search



  • webcache.googleusercontent.com/search

Google Cache. An easy-to-use blacklist circumventer; links are also very short-lived. Incidence rate may grow, as Google Chrome now provides an option to view cached pages when connecting to a site fails. Train2104 (talk) 05:30, 4 March 2017 (UTC)

Let us see what COIBot can dig up.  — billinghurst sDrewth 07:20, 4 March 2017 (UTC)

e-camm.ga





e-camm.ga redirects to wlpreview.dnxlive.com; both URLs are porn/shock sites. --VR0 (talk) 17:16, 27 March 2017 (UTC)

Only one addition to a (now deleted) page on es.wikipedia for the former of the two. But the editor was also on en.wikipedia, where they are blocked for spamming after creating one article with



Maybe a pattern. --Dirk Beetstra T C (en: U, T) 03:32, 28 March 2017 (UTC)

itun.es



Shortcut for itunes.apple.com. As the latter is a commercial site, I don't see why we should obscure these links. From itun.es/us/iGunhb it is absolutely not clear even whether it points to the correct item. And though this particular one is free, probably for many you have to pay. Not sure if this is needed. --Dirk Beetstra T C (en: U, T) 17:08, 5 July 2017 (UTC)

Is there evidence of abuse or potential abuse? If it is just a shortened url with no abuse, I am not sure why we would want to. There are numerous "internal" url redirects; if it is just a little shorter, and not abused, what do we care which? —The preceding unsigned comment was added by Billinghurst (talk)
Is linking to this (or its master site) not by definition spam/promotion .. a lot of it will be single edits per editor (not a campaign). I am, on Wikipedia, a great opponent of url shorteners .. you don't know what you are linking to (and at least en.wikipedia 'forbids' using shorteners). --Dirk Beetstra T C (en: U, T) 17:49, 7 July 2017 (UTC)
My comment was more related to the use of the shortener compared with the full url; not the provision of a link in itself. If a link is discouraged or banned, then of course we should ban the shortener. If the link is neither banned nor discouraged, why would we wish to prevent a non-abused linking process that just inhibits and somewhat confuses users, or prevents them from editing. If a specific wiki itself has a rule where a link addition is contrary to that local rule, they can and should use their blacklist. I don't see a requirement for a global ban unless there is broader problem.  — billinghurst sDrewth 01:12, 8 July 2017 (UTC)

Godchecker.com



Per en.wikipedia's WikiProject Spam remarks. Cross-wiki spam, ánd useless. --Dirk Beetstra T C (en: U, T) 09:39, 2 August 2017 (UTC)

See also w:Wikipedia:Administrators'_noticeboard/Archive291#Is_godchecker.com_blacklisted.3F_If_not.2C_how_to_make_it_happen.3F. --Dirk Beetstra T C (en: U, T) 09:43, 2 August 2017 (UTC)

It is used at numerous sites without blacklisting. How do you see that it sits within the criteria for meta admins for us to add?  — billinghurst sDrewth 03:01, 5 August 2017 (UTC)
The xwiki report shows it has been added by administrators and other trusted users on other wikis. If it is problematic at enWP, it may need to be locally managed.  — billinghurst sDrewth 12:49, 5 August 2017 (UTC)

Abbywinters.co



Sustained cross-wiki spam campaign of replacing abbywinters.com with abbywinters.co. See e.g. [2] and [3], [4]. Tgeorgescu (talk) 03:40, 26 October 2017 (UTC)

@Tgeorgescu: At this point, this looks as though it can be managed at enWP.  — billinghurst sDrewth 21:28, 26 October 2017 (UTC)

jimiwriter



Here are some domains they are spamming on Swahili Wikipedia.

I've emailed off-wiki evidence to functionaries. Enwp volunteers requested it be listed here as they appear to be affecting several wikis. Bri (talk) 06:39, 11 November 2017 (UTC)

@Bri: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 07:24, 11 November 2017 (UTC)

dw-inductionheating.com



Spammed over VPNs and by socks. MER-C (talk) 07:43, 12 November 2017 (UTC)

@MER-C: Added to Spam blacklist (see [5]). --Dirk Beetstra T C (en: U, T) 08:16, 12 November 2017 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the links thoroughly, as they may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria (an illustrative sketch in code follows the list):

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
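
To make the first criterion concrete, here is a rough sketch in Python. It is an illustration only, not COIBot's actual code, and the threshold values are invented:

    def xwiki_candidates(additions, total_uses, max_total_uses=25, min_wikis=3):
        """Sketch of criterion 1: a user mainly adding a little-used link
        to more than 2 wikis. 'additions' maps (user, domain) to the set
        of wikis where that user added the domain; 'total_uses' maps a
        domain to its overall use count. Thresholds are invented."""
        for (user, domain), wikis in additions.items():
            if total_uses.get(domain, 0) <= max_total_uses and len(wikis) >= min_wikis:
                yield domain, user, sorted(wikis)

    additions = {("SpamAccount", "example-pills.com"): {"w:en", "w:de", "w:fr"}}
    total_uses = {"example-pills.com": 3}
    print(list(xwiki_candidates(additions, total_uses)))
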
COIBot's currently open XWiki reports
  • vrsystems.ru; last update 2023-06-27 15:51:16 by COIBot; IPs 195.24.68.17, 192.36.57.94, 193.46.56.178, 194.71.126.227, 93.99.104.93; last link addition 2070-01-01 05:00:00; report counts 4 / 4.

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests please take your discussion to the pertinent wiki, where such requests would be made at Mediawiki talk:Spam-blacklist at that wiki. Search spamlists — remember to enter any relevant language code

tezukainenglish.com



Hi all. I wanted to write an article about an anime, but the tezukainenglish.com website, from which I wish to use some references, is on the blacklist. Apparently it was added here. If the cause does not hold anymore, can someone please remove it? Thanks, Whitepixels (talk) 19:15, 14 June 2017 (UTC)

It is a fan site, and one that was spammed. The Wikipedias look sideways at fan sites; and we look sideways at spam. I would suggest asking at the wiki where you wish to have the article, and see if they can whitelist the url that you wish to utilise.  — billinghurst sDrewth 06:15, 15 June 2017 (UTC)
Thanks for your reply. Whitepixels (talk) 11:36, 17 June 2017 (UTC)
@Billinghurst: I see no sign of spamming in the report; link additions are either made as part of major content expansions or done by experienced users. And most Wikipedias do not look sideways at fansites when those are the best available content for users who don't speak Japanese. You can see that from the link still being in the dewiki Tezuka article, for example (having been added more than 3 years ago). --Tgr (talk) 14:42, 17 June 2017 (UTC)
@Tgr: It would have been spammed, as I quick-created the report, and that would have been in response to spambot activity, repeated spambot activity, and a review of whether it was being used at wikis. I don't remember anything particular about this specific case. Where a site has been blacklisted for spambot activity, it has been the practice to ask users to seek whitelisting as the means to graduate our way out of blacklisting: 1) it shows that communities want the link, and 2) that sites can have it and not be spammed with it. Re sideways, w:Wikipedia:External links is my guide; it is neither a ringing endorsement nor an invitation to add fan sites.  — billinghurst sDrewth 14:56, 17 June 2017 (UTC)

rns.online



This is the website for the Rambler News Service, which is a news agency. Rambler is sort of the Yahoo of Russia. Rambler creates its own news and runs articles from other agencies. It's quite useful as a source. I'm not sure why it's blocked unless the entire .online domain is blocked. Thanks for your time. Wikimandia (talk) 05:29, 9 July 2017 (UTC)

@Wikimandia: I don't find a global blacklisting for the domain or urls like it, nor a simple block of the .online top-level domain. That said, across all the wikis a grep shows me "online\b" in 971 rules, of which only two are global and both more explicit, so not the cause; similarly, 150 rules in the global blacklist include the term "online", though a quick scan indicates that they are again more explicit. There are 15 uses of "rns\.", though none are global, and all are more specific. Are you certain that it is a global rule, rather than a local rule? Otherwise we are going to need to see and know more, e.g. where are you attempting the addition?  — billinghurst sDrewth 05:53, 9 July 2017 (UTC)
Oh, it could be local - I will look at that. Local makes sense, since I imagine it's used on the Russian Wikipedia. Thanks. Wikimandia (talk) 05:59, 9 July 2017 (UTC)

@Billinghurst: Now this one was complex. Just to note, you can feed COIBot the whole link in 'wherelisted', and that will give you the answer. This rule turned out to be so complex that even a simple whitelisting of the domain did not do the trick. --Dirk Beetstra T C (en: U, T) 07:21, 9 July 2017 (UTC)

Thanks Beetstra. I had checked the domain, though didn't have a confirmed wiki, let alone a url.  — billinghurst sDrewth 07:42, 9 July 2017 (UTC)
@Billinghurst: It also took me some time; next time we should be faster in asking 'what link were you actually trying to add', so we can throw it in IRC in the general direction of COIBot. --Dirk Beetstra T C (en: U, T) 08:11, 9 July 2017 (UTC)

 Nothing to do; local issue at enWP.  — billinghurst sDrewth 07:43, 9 July 2017 (UTC)
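
Requests like this usually come down to finding which rule, global or local, actually fires. A minimal sketch of the grep-style check billinghurst describes above, assuming the relevant blacklist page has been saved locally as plain text (the function name is invented):

    import re

    def matching_rules(blacklist_text, url):
        """Return the blacklist lines whose regex matches the given URL."""
        hits = []
        for line in blacklist_text.splitlines():
            rule = line.split("#", 1)[0].strip()  # drop comments/whitespace
            if not rule:
                continue
            try:
                if re.search(rule, url):
                    hits.append(rule)
            except re.error:
                pass  # skip syntactically broken entries
        return hits

    # e.g. matching_rules(open("spam-blacklist.txt").read(),
    #                     "http://rns.online/economy/some-article/")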

best-poems.net



Hello. Why is the website Best Poems (best-poems.net) on your blacklist? We have no links from you, while we have hundreds of links from us to you. All our author biographies link directly to you, like http://www.best-poems.net/adela_florence_nicolson/index.html. Apparently it was added by someone who is a competitor, or by someone who needs to explain why, given that we have no links from you directed to our website. Can someone please remove it, or give us any details on the reason for it? Thanks. — The preceding unsigned comment was added by 41.102.191.58 (talk)

 not globally blocked. This is a local block at English Wikipedia; you should ask there.  — billinghurst sDrewth 12:32, 4 October 2017 (UTC)

History-of-China.com



Seems to have some useful information 🛈 on the Mongol Empire period in China (the Yuan Dynasty), no idea why it's blacklisted. 🤔 --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 09:37, 26 September 2017 (UTC)

 not globally blocked. Please address your concerns to English Wikipedia where there is a local block.  — billinghurst sDrewth 12:19, 4 October 2017 (UTC)
I actually need it for Dutch Wikipedia, well that's good news. --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 09:34, 5 October 2017 (UTC)

www.charlesproxy.com



When I add "charlesproxy.com/buy/eula/" to references list on API simulation tools comparison I get a message "The following link has triggered a protection filter: lesproxy.com". Charles Proxy is a genuine tool used by many software testers and developers and the list of tools I am working on would be incomplete without it. — The preceding unsigned comment was added by Wojtek-tp (talk)

@Wojtek-tp: Removed; modified the regex. It would seem that this was collateral damage from blocking another site. I have amended the regex.  — billinghurst sDrewth 12:28, 4 October 2017 (UTC)
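
This is a common failure mode: an entry without a leading \b matches inside a longer hostname. The original rule is not quoted above, but assuming it was an unanchored lesproxy\.com, the collateral damage can be reproduced like this:

    import re

    url = "http://www.charlesproxy.com/buy/eula/"

    # Unanchored: "lesproxy.com" occurs inside "charlesproxy.com".
    print(re.search(r"lesproxy\.com", url))      # matches -> collateral damage

    # With a leading word boundary the rule no longer fires here, because
    # the character before "l" is the word character "r".
    print(re.search(r"\blesproxy\.com\b", url))  # None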

www.yupptv.com



YuppTV provides streaming television and has a Wikipedia article, and its official website should be removed from Wikipedia's blacklist. — The preceding unsigned comment was added by 67.53.214.86 (talk)

 not globally blocked. This is an issue that you will need to address at English Wikipedia.  — billinghurst sDrewth 12:21, 4 October 2017 (UTC)

admitad.com



Hello. I want to write an article about a global affiliate network, but admitad.com is on the Wikipedia blacklist. This is the official site of the network and it should be listed in the article about the organization. We should have its official website removed from Wikipedia's blacklist. — The preceding unsigned comment was added by Nat.johnson (talk)

 seek whitelist at wiki of interest. Unfortunately the domain is abused on an incredible scale. If you wish to use the url in an article, then you will need to apply at the wikipedia where you wish to add the article to whitelist the whole domain, or a part of the url.  — billinghurst sDrewth 23:20, 13 October 2017 (UTC)

tradingview.com



Hello. Why is tradingview.com on your global blacklist? Charts are essential in economics; I don't see why the site is treated as spam. The content is well related to Wikipedia pages on the financial markets. — The preceding unsigned comment was added by 203.77.239.2 (talk)

No one is arguing that charts are or are not part of economics. It was added to the blacklist as it was being spammed to the Wikipedias. Have you looked at the respective wikis' linking policies? Would you consider the addition of the links, as shown in the relevant reports, spam or not? For the site itself, do you have an interest in or connection to the site?  — billinghurst sDrewth 09:28, 24 October 2017 (UTC)
I am a user of the site and wanted to add a reference when I got a screen saying it was black-listed. I would think that backing up statements in articles with charts / source data from this site allows Wikipedia users to verify that the information given is supported by a reliable source. That's why I wondered.
In any case, you asked me to look at the reports, so I did. I noticed a link was added to a Vietnamese article on October 18th (item 49 in the additions list) and after that the site was blacklisted on Vietnamese Wikipedia. After adding 1 link. I would think blacklisting would be a last resort. I have also seen edits that added information to an article with a reference to tradingview as a source. Example addition 59 which added the exchange rate with a source. It was reverted as spam. Those links are not trying to sell a product or service and I did not see existing links getting replaced.
I myself am certainly not looking to spam and I would like to contribute to Wikipedia but now I cannot use one of the sources. How about unlisting the site while monitoring any spamming and if it happens again you can always flip the switch and re-list them? — The preceding unsigned comment was added by 203.77.239.2 (talk)
@MER-C: comment?  — billinghurst sDrewth 11:36, 26 October 2017 (UTC)
Hi, just a friendly reminder, I was wondering if you have an update? — The preceding unsigned comment was added by 203.128.92.43 (talk)
Given the number of IPs adding links to this site, I would only entertain requests from established editors. MER-C (talk) 08:11, 3 November 2017 (UTC)
I would suggest that you seek local whitelisting at the wiki of interest, and probably identify a url path that is outside the problematic edits.  — billinghurst sDrewth 12:10, 3 November 2017 (UTC)
Hi guys. Not sure what is happening, but I wanted to add additional information to the English article on TradingView and I see they are blacklisted? I use their site and I know their platform. How can I expand the article now? Can you delist them? Asher999 (talk) 16:07, 10 November 2017 (UTC)
As I noted above, seek local whitelisting for a non-problematic path. If it is enwiki, then start at en:Mediawiki talk:Spam-whitelist; for other wikis it may be there or their admins' noticeboard.  — billinghurst sDrewth 22:34, 10 November 2017 (UTC)
Ok, well it's a lot of extra work to go through the whitelisting process when I am just looking to contribute to the community. Bummer! Something else: I see many broken reference links to different websites (page not found / can't find server / 404 error) under financial-market-related articles, which are not helping anybody. I use and like the tradingview site and see them as a good source. Would replacing any of those broken links mean going through whitelisting requests as well? Asher999 (talk) 12:09, 14 November 2017 (UTC)
It is not appropriate for me to give advice here about one of the Wikipedias' local policies on link replacement. There may even be a project related to financial or business sites where it would be worth mentioning such a matter and seeking guidance.  — billinghurst sDrewth 12:26, 15 November 2017 (UTC)

babelstone.co.uk



Today on Dutch Wikipedia I tried to save a NON-PROMOTIONAL article about Tangut inscriptions and I kept getting an error message on my wireless phone. As this is one of the sources I used 3 (three) times, I would assume that I got the error message because someone probably claimed that I have "an obvious COI" with BabelStone, because I asked him to upload some images from a museum on Wikimedia Commons. Every time I used this link 🔗 (which as far as I can see hasn't been removed yet from other articles) it was to source content; this doesn't make it "a spamlink", and w:en:User:BabelStone never paid me or even asked me to place the link 🔗 for him. This must be a mistake; my sockpuppetry was about insulting a person, and I've never made a single vandalistic mainspace edit in my life. --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 09:14, 3 November 2017 (UTC)

I am not going to request local whitelisting because I didn't add it to those other wikis, and don't make every Wikimedia project collateral damage because I wrote a draft on a translation of w:en:Andrew West (linguist). I can easily demonstrate that all the link additions were for content attribution. --Donald Trung (Talk 🤳🏻) (My global lock 😒🌏🔒) (My global unlock 😄🌏🔓) 09:16, 3 November 2017 (UTC)
 not blacklisted @Donald Trung: Not blacklisted anywhere that I can see. Error messages will tell you when the url that you are looking to add is blacklisted. Please don't guess and jump to a conclusion. You make more noise than a rusty gate, so I think that it is time to be reflective about your approach and your knowledge gaps; you jump at shadows and make horrid assumptions.  — billinghurst sDrewth 12:06, 3 November 2017 (UTC)
@Donald Trung: Your behaviour here is starting to become rather tendentious (see also the request above for #History-of-China.com). It may not matter at all whether what you save is 'NON-PROMOTIONAL' .. you obviously do not have the slightest understanding of why things get blacklisted, and your edits here and elsewhere in this area plainly assume bad faith. --Dirk Beetstra T C (en: U, T) 14:38, 15 November 2017 (UTC)

waremakers.com



This domain is on the global blacklist and I request it be removed. I am unsure why this domain was originally added but suspect it has to do with the domain being linked to multiple times over 2 days in 2016: [6]. I am associated with the owner of the domain and was not even aware of this blacklisting until yesterday.

After some investigation I found the link report above yesterday. Checking company records, in August 2016 a very keen student interned for the owner. It seems this person added content found on the domain as a Wikipedia source no fewer than 13 times. This makes the blacklisting understandable.

I ask that the domain be removed at this time, as this undesired behavior was solely the work of an overly keen intern, spread out over two days. The domain is home to a reputable business covered by media such as the Financial Times, The Guardian and Forbes. The business produces journalistic content that may be valuable to use as a future source. Henk Sluipert 14:42, 31 October 2017 (UTC)

I see no reason to remove this from the blacklist; it does not look evident that this would be used by encyclopaedias as a reliable or needed source. Noting that I blacklisted it at the time due to spam, so I do not wish to be seen as the person to permanently decline. Local whitelisting may be an option if you can persuade a wiki that your material is of sufficient value.  — billinghurst sDrewth 07:38, 5 November 2017 (UTC)
@Henksluipert:  Declined. As per User:Billinghurst, I see no reason to remove. Whatever the reason behind it, this looks like typical reference spamming, and I do not see that this is going to be significantly used (not much beyond what specific whitelisting can handle). Dirk Beetstra T C (en: U, T) 11:53, 5 November 2017 (UTC)
@Beetstra: @Billinghurst: I don't know if it is possible to appeal? This was indeed reference spamming, but it was not intended - or even known - by the domain owner until just a few days ago. The owner is deeply sorry about what happened and feels that if wikis have to ask for page-specific whitelisting it will mean the site will never be used as a reference. The editorial section of the site in question works with the skilled-craftsmen industry and produces unique content based on research within the industry. Disseminating knowledge to a wider audience is a key part of the activities of the domain owner. Here is just one example: domain-in-question/the-post/leather-tanning-chrome-or-vegetable. A syndicated copy of this article ranks #1 on a Google search for the keywords "chrome leather tanning". I don't want to spam this blacklist page and I won't waste your time by appealing again, but I kindly ask that the domain is delisted. After this experience, the domain owner has put in place strict staff policies to make sure reference spamming, or any other spamming, on Wikipedia will never happen again. Henk Sluipert 15:20, 6 November 2017 (UTC)
@Henksluipert: Please note that blacklisting is not a judgement on the quality or integrity of a website; it is a means to control the addition of a link. These are definitively not punitive listings. Numbers of quality sites are blacklisted globally or locally at wikis, with controlled addition then allowed through local whitelists, where the information does not meet our guidance or the addition is problematic. Personally, I don't see a compelling reason to remove the domain from the blacklist.

Of course it is possible to appeal, or maybe more accurately call that making a representation. You are doing so expressing your opinion to the community, and this site operates on community consensus. If there is a consensus of opinion here to remove it from the blacklist, then it will be removed. We can leave the discussion open for a week to a month allowing for that opinion. — billinghurst sDrewth 01:20, 7 November 2017 (UTC)Reply

@Billinghurst: Thank you. I appreciate it. My only aim here is to make it possible/easier for future wikis to potentially use the site as a reference. If nobody ever does that, then so be it. But the owner feels it is a little harsh keeping the entire site on the global blacklist due to the brief actions of an intern more than a year ago. A key motivation for the owner's business is to supply industry transparency and well-researched, unbiased material about "quality". Consequently it is disheartening to be blacklisted from being used as a reference by the world's number 1 source for unbiased information.

I hope other wikis will support the whitelisting of the domain. Thank you. Henk Sluipert 13:21, 8 November 2017 (UTC)

@Henksluipert: Umm, now I am admittedly a harsh marker, but from my looking at your pages, the site could be considered neither a reliable nor an authoritative source, and I hardly think that many would sanction urls to the site. The fact is that the site was spammed here. The fact is that you are currently writing a paid-editing article at English Wikipedia and have not declared either that you are a paid editor, or that you have a conflict of interest. We are an encyclopaedia trying to provide the best information to users free of bias and commercial conflict, and plaintive cries of people "missing out" are a strawman. I have left instruction for you at your enWP talk page about addressing those local matters.  — billinghurst sDrewth 09:33, 9 November 2017 (UTC)
@Billinghurst: Sorry, reopening this to add one more comment: as written to you yesterday on my enWP talk page, I have not been paid anything whatsoever to add a page about Waremakers to Wikipedia. And I very much did declare the potential conflict of interest on that same page. I have not said anything about the world "missing out" if the domain continues to be blacklisted - merely pointed out that they actually supply well-researched industry information. I have previously given an example of a piece about leather tanning being the highest-ranking link on the subject on Google. Another article on this site involving much research (/the-post/how-the-luxury-industry-makes-a-fortune-through-deception) was recently picked up by Huffington Post. I know exactly what Wikipedia is and have done my utmost to respect all the rules and to write a bias-free little page on this business. I don't understand the reaction I have been met with here. 15:41, 15 November 2017 (UTC)
Okay, you have added your comment. I will again set this to closed and to be archived.  — billinghurst sDrewth 00:24, 16 November 2017 (UTC)Reply
✓ This section is resolved and can be archived. If you disagree, replace this template with your comment.  — billinghurst sDrewth 00:24, 16 November 2017 (UTC)

fivebooks.com



This domain is on the global blacklist, and I don't think it should be. It seems it was blacklisted in May 2011, as a spammer. The site is made up of interviews of well-known people, followed by their recommendations for books on the topic which is their specialty. I tried to add a quote from the Shakespeare scholar Stanley Wells to the King Lear page, and found that this site was blacklisted.

2.26.232.41 13:42, 16 November 2017 (UTC)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

derefer.unbubble.eu deblock





This service is used 24,923 times in main space on dewiki! It is used to clean up Special:Linksearch from known dead links, by redirecting them through this service. It is hard to find a better solution for this task. --Boshomi (talk) 16:38, 24 July 2015 (UTC) Ping: User:Billinghurst Boshomi (talk) 16:49, 24 July 2015 (UTC)

Please note Phab:T89586; while it is not fixed, it is not possible to find the links with standard Special:LinkSearch. On dewiki we can use giftbot/Weblinksuche instead. --Boshomi (talk) 18:04, 24 July 2015 (UTC)
afaics derefer.unbubble.eu could be used to circumvent the SBL, is that correct? -- seth (talk) 21:30, 24 July 2015 (UTC)
I don't think so; the redirected URL is unchanged, so the SBL works like the archive-URLs to the Internet Archive. --Boshomi (talk) 07:44, 25 July 2015 (UTC)
It is not a stored/archived page at archive.org, it is a redirect service, as clearly stated at the URL, and in that it obfuscates links. To describe it in any other way misrepresents the case, whether deWP uses it for good or not. We prevent abuseable redirects from other services due to the potential for abuse. You can consider whitelisting the URL in w:de:MediaWiki:spam-whitelist if it is a specific issue for your wiki.  — billinghurst sDrewth 10:09, 25 July 2015 (UTC)
What I wanted to say was that the SBL mechanism works in the same way as with web.archive.org/web: a blacklisted URL will still be blocked when the unbubble prefix is prepended to it. --Boshomi (talk) 12:54, 25 July 2015 (UTC)

non-ascii are not blocked?



I saw \bказино-форум\.рф\b in the page, so it's supposed to be blocked. However, I can link it: http://казино-форум.рф It seems like all non-ascii links will be able to avoid blocking.

In Thai Wikipedia (where I am an admin), there are a lot of Thai URLs that we want to put in the local blacklist, but we couldn't because of the very same reason. --Nullzero (talk) 17:42, 18 February 2016 (UTC)

This should go to Phab: quickly - that is a real issue. --Dirk Beetstra T C (en: U, T) 05:52, 21 February 2016 (UTC)
@Beetstra: Please see Phab:T28332. It seems that you need to put \xd0\xba\xd0\xb0\xd0\xb7\xd0\xb8\xd0\xbd\xd0\xbe-\xd1\x84\xd0\xbe\xd1\x80\xd1\x83\xd0\xbc\.\xd1\x80\xd1\x84 (without \b) instead of \bказино-форум\.рф\b --Nullzero (talk) 20:00, 21 February 2016 (UTC)
*sigh* somehow the workaround doesn't work with Thai characters, so I don't know if \xd0\xba\xd0\xb0\xd0\xb7\xd0\xb8\xd0\xbd\xd0\xbe-\xd1\x84\xd0\xbe\xd1\x80\xd1\x83\xd0\xbc\.\xd1\x80\xd1\x84 will actually work or not. Please try it anyway... --Nullzero (talk) 20:24, 21 February 2016 (UTC)
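
The failure is reproducible outside MediaWiki. A minimal sketch in Python's bytes-mode regex, which behaves like PCRE without the /u flag (assuming that is how the extension matches): \b asserts a transition between word and non-word characters, and in byte mode every byte of a UTF-8 multi-byte character counts as non-word, so an entry wrapped in \b can never match.

    import re

    url = "http://казино-форум.рф/thread".encode("utf-8")

    # With \b: the "/" before the domain and the first UTF-8 byte of "к"
    # (0xd0) are both non-word bytes, so the boundary assertion fails and
    # the rule silently never triggers.
    with_b = re.compile("\\bказино-форум\\.рф\\b".encode("utf-8"))
    print(with_b.search(url))   # None

    # Without \b (the workaround from Phab:T28332), the bytes match.
    plain = re.compile("казино-форум\\.рф".encode("utf-8"))
    print(plain.search(url))    # <re.Match object ...>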

Are these global, or local

Regarding my log of blocks, I have over the last days blocked almost 50 IPs whose only edits are hitting the blacklist (there is a related filter on en that gets some hits). It makes the logs unreadable. My questions: a) is this a global problem, and b) if so, can we have a bot that globally blocks these IPs on sight (with withdrawal of talkpage access) so we de-clutter the logs? I block these for 1 month at first, as soon as they attempt to use one of the typical domains/links or when they add links to typical pages they tend to try. Is it feasible to have a bot that gets access to these records and locks them globally? --Dirk Beetstra T C (en: U, T) 06:55, 19 October 2016 (UTC)

Note: seeing this (meta admin eyes only), it is global; there are many IPs with the same MO. So the request simplifies: can we have a bot that globally checks for this and locks the IPs on sight, so we declutter the logs. For en.wikipedia, we have en:Template:spamblacklistblock as block reason/talkpage template for them. --Dirk Beetstra T C (en: U, T) 06:58, 19 October 2016 (UTC)

@Beetstra: that is stewards' territory. There was no such bot when I had that hat, and if one existed now you would see it with the steward's right. So "no" would be my guided guess.  — billinghurst sDrewth 03:43, 24 February 2017 (UTC)
@Billinghurst: On en.wiki there is now a bot that blocks them, but maybe that should be extended (I notice that the list of regexes of that bot needs updating). --Dirk Beetstra T C (en: U, T) 07:47, 24 February 2017 (UTC)
Yes, I know it. I am saying that this is a matter for stewards to globally block, rather than locally block IP addresses, which is an admin task.  — billinghurst sDrewth 09:06, 24 February 2017 (UTC)
Note (@stewards who see this): I do have the impression that it actually helps. One of the domains that was hammered hard in the past has now disappeared from the lists, and although new domains pop up (which hence do not result in spamblacklistblocks), and we still get daily hits, it is much less than before. Maybe worth trying this globally. --Dirk Beetstra T C (en: U, T) 03:36, 5 March 2017 (UTC)
It might be worth doing. It's probably a bit more difficult to implement globally, since there is no centralized location for SBL hits, and we don't have any existing bots with steward permissions. – Ajraddatz (talk) 05:23, 5 March 2017 (UTC)
It is a bit more difficult indeed, but I guess that loading the per-wiki spam-blacklist log for the last y hits every x minutes would be sufficient. IIRC, the bot on en.wikipedia pulls the log every 2 minutes; in that time the bot could do a full round of the ~800 wikis. It indeed would likely need steward permissions, but I am sure that there are people with a steward bit who are bot operators, and who could run the script that User:Anomie is running on en.wikipedia. --Dirk Beetstra T C (en: U, T) 10:23, 5 March 2017 (UTC)
(Or the logs should be made public - I don't really see a reason why these are admin-eyes-only anyway.) --Dirk Beetstra T C (en: U, T) 10:24, 5 March 2017 (UTC)
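
A rough sketch of what such a global watcher could look like, assuming the querying account is allowed to read Special:Log/spamblacklist on each wiki; the log type name "spamblacklist" is my understanding of the extension's logging and may need checking:

    import time
    import requests

    session = requests.Session()  # assumed logged in with an account
                                  # allowed to read the spamblacklist log

    def recent_hits(api_url):
        """Fetch recent spam-blacklist log events from one wiki's API."""
        r = session.get(api_url, params={
            "action": "query", "list": "logevents",
            "letype": "spamblacklist",  # assumed log type name
            "lelimit": "max", "format": "json",
        })
        return r.json().get("query", {}).get("logevents", [])

    # Poll each wiki in turn; a steward bot could then globally block IPs
    # whose only activity is hitting the blacklist.
    for api in ["https://en.wikipedia.org/w/api.php"]:  # ... ~800 wikis
        for event in recent_hits(api):
            print(event.get("user"), event.get("title"))
        time.sleep(2)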

comune.viagrande.ct.it



This is the official website of the Italian commune of Viagrande (d:Q478954), and the blacklist blocks it because of a rule aimed at "viagra" spam. On es.wiki, if the infobox takes its data from Wikidata, the page is permanently blocked. --Metrónomo-Goldwyn-Mayer 23:09, 29 April 2017 (UTC)

I have adapted the rule; it should now no longer match the link: http://comune.viagrande.ct.it. --Dirk Beetstra T C (en: U, T) 06:54, 30 April 2017 (UTC)
✓ This section is resolved and can be archived. If you disagree, replace this template with your comment.  — billinghurst sDrewth 30 April 2017 (UTC)

Problem about classic-rocks.it



I had a problem adding a link to the website www.classic-rocks.it because, in my opinion, the blacklist confuses this website with the blacklisted brocks.it. How can this be solved? -- Pugliak (talk) 17:39, 28 October 2017 (UTC)

@Pugliak: Removed.  — billinghurst sDrewth 03:48, 29 October 2017 (UTC)

Discussion

This section is for discussion of Spam blacklist issues among other users.

Expert maintenance

One (soon to be archived) and rejected removal suggestion was about jxlalk.com, matched by a filter intended to block xlalk.com. One user suggested that this side effect might be as it should be, another user suggested that regular expressions are unable to distinguish these cases, and nobody has a clue when and why xlalk.com was blocked. I suggest finding an expert maintainer for this list, and removing all blocks older than 2010. The bots identifying abuse will restore still-needed ancient blocks soon enough, hopefully without any oogle-matching-google cases. –Be..anyone (talk) 00:50, 20 January 2015 (UTC)

No. Removing some of the old rules, from before 2010 or even before 2007, will result in further abuse; some of the rules are intentionally wide so as to stop a wide range of spamming behaviour, and, as I have argued as well, I have 2 cases on my en.wikipedia list where companies have been spamming for over 7 years, have some of their domains blacklisted, and are still actively spamming related domains. Every single removal should be considered on a case-by-case basis. --Dirk Beetstra T C (en: U, T) 03:42, 20 January 2015 (UTC)
Just to give an example of this: redirect sites have been, and are, actively abused to circumvent the blacklist. Some of those were added before the arbitrary date of 2010. We are not going to remove those under the blanket of 'having been added before 2010'; they will stay blacklisted. Some other domains are of similar gravity, such that they should never be removed. How are you, reasonably, going to filter out the rules that should never be removed? --Dirk Beetstra T C (en: U, T) 03:52, 20 January 2015 (UTC)
By the way, you say ".. intended to block xlalk.com .." .. how do you know? --Dirk Beetstra T C (en: U, T) 03:46, 20 January 2015 (UTC)
I know that nobody would block icrosoft.com if what they mean is microsoft.com, or vice versa. It's no shame to have no clue about regular expressions, a deficit we apparently share. :tongue: –Be..anyone (talk) 06:14, 20 January 2015 (UTC)
I am not sure what you are referring to - I am not a native speaker of regex, but proficient enough. The rule was added to block, at least, xlale.com and xlalu.com (if it were ONLY these two, \bxlal(u|e)\.com\b or \bxlal[ue]\.com\b would have been sufficient, but it is impossible to find this far back what all was spammed; possibly xlali.com, xlalabc.com and abcxlale.com were abused by these proxy-spammers). --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
xlalk.com may have been one of the cases, but one rule that was blacklisted before this blanket was imposed was 'xlale.com' (the xlale.com rule was removed in a cleanout session, after the blanket was added). --Dirk Beetstra T C (en: U, T) 04:45, 20 January 2015 (UTC)
The dots in administrative domains and DNS mean something: notably, foo.bar.example is typically related to an administrative bar.example domain (ignoring well-known exceptions like co.uk etc.; Mozilla+SURBL have lists for this), while foobar.example has nothing to do with bar.example. –Be..anyone (talk) 06:23, 20 January 2015 (UTC)
I know, but I am not sure how this relates to this suggested cleanup. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
If your suggested clean-ups at some point don't match jxlalk.com, the request by a Chinese user would be satisfied. As noted, all I found out is a VirusTotal "clean"; it could still be a spam site if it ever was a spam site.
The regexp could begin with "optionally any string ending with a dot" or similar before xlalk. There are "host name" RFCs (LDH: letter digit hyphen) up to IDNAbis (i18n domains); they might contain recipes. –Be..anyone (talk) 16:56, 20 January 2015 (UTC)
What suggested cleanups? I am not suggesting any cleanup or blanket removal of old rules. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
Of course I'm not sure. There is no issue of bad faith. He had reason to use regex, for two sites, and possibly suspected additional minor changes would be made. But he only cited two sites. One of the pages was deleted, and has IP evidence on it, apparently, which might lead to other evidence from other pages, including cross-wiki. But the blacklistings themselves were clearly based on enwiki spam and nothing else was mentioned. This blacklist was the enwiki blacklist at that time. After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings. This is really old and likely obsolete stuff. --Abd (talk) 20:07, 21 January 2015 (UTC)
3 at least. And we do not have to present a full case for blacklisting (we often don't, per en:WP:BEANS and sometimes privacy concerns); we have to show sufficient abuse that needs to be stopped. And if that deleted page was mentioned, then certainly there was reason to believe that there were cross-wiki concerns.
Obsolete, how do you know? Did you go through the cross-wiki logs of what was attempted to be spammed? Do you know how often some of the people active here are still blacklisting spambots using open proxies? Please stop with these sweeping statements until you have fully searched for all the evidence. 'After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings' - no, that is not what happened. --Dirk Beetstra T C (en: U, T) 03:16, 22 January 2015 (UTC)
Hi!
I searched all the logs (Special:Log/spamblacklist) of several wikis using the regexp entry /xlal[0-9a-z-]*\.com/.
There were almost no hits:
w:ca: 0
w:ceb: 0
w:de: 0
w:en: 1: 20131030185954, xlalliance.com
w:es: 1: 20140917232510, xlalibre.com
w:fr: 0
w:it: 0
w:ja: 0
w:nl: 0
w:no: 0
w:pl: 0
w:pt: 0
w:ru: 0
w:sv: 0
w:uk: 0
w:vi: 0
w:war: 0
w:zh: 1: 20150107083744, www.jxlalk.com
So there was just one single hit at w:en (not even in the main namespace, but in the user namespace), one at w:es, and one at w:zh (probably a false positive). So I agree with user:Abd that removing this entry from the sbl would be the best solution. -- seth (talk) 18:47, 21 February 2015 (UTC)
Finally an argument based on evidence (these logs should be public, not admin-only - can we have something like this in a search engine? This may come in handy in some cases!). Consider it removed. --Dirk Beetstra T C (en: U, T) 06:59, 22 February 2015 (UTC)
By the way, Seth, this is actually no hits - all three you show here are collateral. Thanks for this evidence; this information would be useful on more occasions to make an informed decision (also, vide infra). --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
I am not sure that we want the Special page to be public, though I can see some value in having something at ToolLabs available to run queries, or something available to be run through quarry.  — billinghurst sDrewth 10:57, 22 February 2015 (UTC)
Why not public? There is no reason to hide this; this is not BLP- or COPYVIO-sensitive information in 99.99% of the hits. The chance that this is non-public information is just as big as for certain blocks to be BLP violations (and those are visible) ... --Dirk Beetstra T C (en: U, T) 04:40, 23 February 2015 (UTC)
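
Generalising seth's search, a small sketch of how per-rule hit counts could be tallied once such logs are queryable. This is illustrative only; 'log_urls' stands for URLs extracted from the per-wiki Special:Log/spamblacklist entries:

    import re
    from collections import Counter

    def hits_per_rule(rules, log_urls):
        """Count how many logged URLs each blacklist rule matches."""
        compiled = [(raw, re.compile(raw)) for raw in rules]
        counts = Counter()
        for url in log_urls:
            for raw, rx in compiled:
                if rx.search(url):
                    counts[raw] += 1
        return counts

    # Note: a non-zero count may still be all collateral (as with
    # jxlalk.com above); the tally only narrows the set needing human eyes.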

Now restarting the original debate

As the blacklist is long, and likely contains rules that cast too wide a net or are so old that they are utterly obsolete (or may even be causing collateral damage on a regular basis), can we see whether we can set up some criteria (that can be 'bot tested'):

  1. Rule added > 5 years ago.
  2. All hits (determined on a significant number of wikis), over the last 2 years (for now: since the beginning of the log = ~1.5 years) are collateral damage - NO real hits.
  3. Site is not a redirect site (should not be removed, even if not abused), is not a known phishing/malware site (to protect others), and is not a true copyright-violating site. (This is hard to bot-test; we may need someone to look over the list and take out the obvious ones.)

We can make some mistakes on old rules if they are not abused (remove some that actually fail #3) - if they become a nuisance/problem again, we will see them again, and they can be speedily re-added .. thoughts? --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
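
A minimal sketch of how criteria 1 and 2 could be bot-tested, assuming each rule's addition date has been reconstructed from the addition/removal logs and genuine (non-collateral) hit counts are available:

    from datetime import datetime, timedelta

    def removal_candidates(rules, real_hits, now=None, min_age_years=5):
        """'rules' is a list of (regex, date_added) pairs; 'real_hits'
        maps a regex to its count of genuine hits. Returns rules matching
        criteria 1-2; criterion 3 (redirectors, malware, copyvio) still
        needs a human pass."""
        now = now or datetime.utcnow()
        cutoff = now - timedelta(days=365 * min_age_years)
        return [rx for rx, added in rules
                if added < cutoff and real_hits.get(rx, 0) == 0]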

@Hoo man: you have worked on clean-up before; some of your thoughts would be welcomed.  — billinghurst sDrewth 10:53, 22 February 2015 (UTC)
Doing this kind of clean-up is rather hard to automate. What might work better for starters could be removing rules that didn't match anything since we started logging hits. That would presumably cut down the whole blacklist considerably. After that we could re-evaluate the rest of the blacklist, maybe following the steps outlined above. - Hoo man (talk) 13:33, 22 February 2015 (UTC)
Not hitting anything is dangerous .. there are likely some somewhat obscure redirect sites on it which may not have been attempted to be abused (though those, too, could be re-added). But we could do test runs easily - just save a cleaned-up copy of the blacklist elsewhere, diff it against the current list, and see what would get removed.
Man, I want this showing up in the RC feeds; then LiWa3 could store them in the database (and follow redirects to show what people wanted to link to ..). --Dirk Beetstra T C (en: U, T) 03:30, 23 February 2015 (UTC)
Hi!
I created a table of hits of blocked link additions. Maybe it's of use for the discussion: User:lustiger_seth/sbl_log_stats (1.8 MB wiki table).
I'd appreciate it if we deleted old entries. -- seth (talk) 22:12, 26 February 2015 (UTC)
Hi, thank you for this; it gives a reasonable idea. Do you know if the rule hits were all 'correct' (for those that do show that they were hit), or mainly/all false positives? (If they are false-positive hits, we could, based on this, also decide to tighten the rules to avoid the false positives.) Rules with all-0 (can you include a 'total' column?) would certainly be candidates for removal (though still determine first whether they are 'old' and/or are no-no sites before removal). I am also concerned that this is not including other wikifarms - some sites may be problematic on other wikifarms, or hitting a large number of smaller wikis (which have less control due to low admin numbers). --Dirk Beetstra T C (en: U, T) 03:36, 8 March 2015 (UTC)
Hi!
We probably can't get information about false positives automatically. I added a 'sum' column.
Small wikis: if you give me a list of the relevant ones, I can create another list. -- seth (talk) 10:57, 8 March 2015 (UTC)
Thanks for the sum column. Regarding the false positives, it would be nice to be able to quickly see what actually got blocked by a certain rule. I agree that that then needs manual inspection, but the actual number of rules with zero hits on the intended stuff to be blocked is likely way bigger than what we see.
How would you define the relevant small wikis - does that depend on the link that was spammed? Probably the best approach is to parse all ~750 wikis, make a list of rules with 0 hits, and a separate list of rules with <10 hits (including there the links that were blocked), and exclude everything above that. Then these resulting rules should be filtered for those which were added >5 years ago. That narrows down the list for now, and after a check for obvious no-no links, those could almost be blanket-removed (just excluding the ones with real hits, the obvious redirect sites and others - which needs a manual check). --Dirk Beetstra T C (en: U, T) 06:59, 9 March 2015 (UTC)
Hi!
At User:Lustiger_seth/sbl_log_stats/all_wikis_no_hits there's a list containing ~10k entries that never triggered the sbl between September 2013 and February 2015 anywhere (if my algorithm is correct).
If you want to get all entries older than 5 years, then it should be sufficient to use only the entries in that list up to (and including) \bbudgetgardening\.co\.uk\b.
So we could delete ~5766 entries. What do you think? Shall we give it a try? -- seth (talk) 17:06, 18 April 2015 (UTC)
The question is how many of those are still existing redirect sites etc. Checking ~5800 is quite a job. On the other hand, with LiWa3/COIBot detecting new additions, it is quite easy to re-add them. --Dirk Beetstra T C (en: U, T) 19:28, 21 April 2015 (UTC)
As per the discussion above, I've now removed 124 kB of non-hitting entries. I did not remove all of them, because some were URL shorteners, and I guess they are a special case even if not used yet. -- seth (talk) 22:25, 16 September 2015 (UTC)
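(Illustration only: a minimal sketch, in Python, of the kind of zero-hit/age filter discussed above. The inputs are assumptions - in practice the hit counts would come from seth's stats tables and the dates from the blacklist log - and this is not existing tooling.)

```python
import datetime

def propose_removals(entries, total_hits, date_added, years=5):
    """Propose blacklist entries for removal: zero hits anywhere
    and added more than `years` years ago.

    entries    - list of regex strings from the blacklist
    total_hits - dict: entry -> summed hits across all wikis
    date_added - dict: entry -> datetime the entry was added
    """
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=365 * years)
    return [
        e for e in entries
        if total_hits.get(e, 0) == 0
        # entries with an unknown date are conservatively kept
        and date_added.get(e, datetime.datetime.max) < cutoff
    ]

# The result would then be saved as a cleaned-up copy, diffed against
# the live list, and manually checked for redirect sites before removal.
```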

Blacklisting spam URLs used in references

Looks like a site is using the "references" section as a spam farm. If a site is added to this list, can the blacklist block its URLs when they are used in references? Raysonho (talk) 17:45, 5 September 2015 (UTC)

Yes, it can.--AldNonymous Bicara? 21:56, 5 September 2015 (UTC)
Thanks, Aldnonymous! Raysonho (talk) 00:07, 6 September 2015 (UTC)

url shorteners

Hi!
IMHO the URL shorteners should be grouped in one section, because they are a special group of URLs that needs special treatment. A URL shortener should not be removed from the SBL unless the domain is dead, even if it has not been used for spamming, right? -- seth (talk) 22:11, 28 September 2015 (UTC)

It would be beneficial to have them in a section. The problem is, most of them are added by script, and hence are just put at the bottom. --Dirk Beetstra T C (en: U, T) 04:51, 4 October 2015 (UTC)
Maybe it would be preferable to have "spam blacklist" be a compiled file, made up of component files, one of which would be "spam blacklist.shorteners".  — billinghurst sDrewth 12:15, 24 December 2015 (UTC)
This seems like a nice idea. It would certainly help with cleaning the list up (which we don't do nowadays). IIRC it is technically possible to have different spam blacklist pages, so this is doable; it just needs agreement among us and someone to do it. --Glaisher (talk) 12:17, 24 December 2015 (UTC)
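(As a sketch of the compilation idea only - the file names are hypothetical and this is not an existing script - the concatenation step could be as simple as:)

```python
#!/usr/bin/env python3
# Hypothetical sketch: compile one blacklist from component sublists.
# Assumes plain-text files with one regex per line and '#' comments,
# mirroring the current blacklist format.

SUBLISTS = [
    "spam-blacklist.general",     # regular spammed domains
    "spam-blacklist.shorteners",  # URL redirect/shortener services
]

def compile_blacklist(paths, out_path="spam-blacklist.compiled"):
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(f"# --- entries from {path} ---\n")
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line and not line.startswith("#"):
                        out.write(line + "\n")

if __name__ == "__main__":
    compile_blacklist(SUBLISTS)
```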

@Beetstra, Lustiger seth, Glaisher, Vituzzu, MarcoAurelio, Hoo man, and Legoktm: and others. What are your thoughts on a concatenation of files as described above? If we have a level of agreement, then we can work out the means to an outcome.  — billinghurst sDrewth 12:39, 25 January 2016 (UTC)

  • I am somewhat in favour of this - split the list into a couple of sublists: one for URL shorteners, one for 'general terms' (mainly at the top of the list currently), and the regular list. It would however need an adaptation of the blacklist script (I've done something similar for en.wikipedia - a choice of blacklisting or revertlisting for each link - and could give that hack a try here, time permitting). AFAIK the extension in the software is capable of handling this. Also, it would be beneficial for the cleanout work if the blacklist itself were 'sectioned' into years. Although a rule being 8 years old is by no means a reason to expect that the spammers are not here anymore (I have two cases on en.wikipedia that are older than that), we do tend to be more lenient with the old stuff. (On the other hand .. why bother .. the benefits are mostly on our side, so that we don't accidentally remove stuff that should be solved by other means.) --Dirk Beetstra T C (en: U, T) 13:05, 25 January 2016 (UTC)
Is it really possible to have different spam blacklist pages? What would happen to the sites that use this very list to block unwanted spam? —MarcoAurelio 14:23, 25 January 2016 (UTC)
It is technically possible. But this would mean that if we move all the URL shortener entries to a new page, all sites currently using the list would have to update the extension or explicitly add the new blacklist to their config, or these links would be allowed on their sites (and notifying all these wikis about this breaking change is next to impossible). Another issue I see is that a new blacklist file means a separate network request on cache miss, so there might be a little delay in page saves (but I'm not sure whether it would be noticeable). --Glaisher (talk) 15:38, 25 January 2016 (UTC)
Hi!
Before we activate such a feature, we should update some scripts that don't know anything about SBL subpages yet.
Apart from that, I don't think that sectioning into years would be of much use. One can use the (manual) log for this. A subject-oriented sectioning could be of more use, but it would also be more difficult for us. -- seth (talk) 20:49, 27 January 2016 (UTC)
Another list for shorteners would be a good idea. Also, some years ago Hoo wrote a script to find (and remove) expired domains. --Vituzzu (talk) 14:10, 22 February 2017 (UTC)

Before developers spend a lot of time on this, I would really prefer that they spend their time to completely overhaul the whole blacklist system and make it more edit-filter-like (though not with the heavy overhead of the interpretation mechanism - just plain regexes like the blacklist, but different rules for different regexes). --Dirk Beetstra T C (en: U, T) 10:08, 23 February 2017 (UTC)

External repositories

I recently found opensources.co's notCredible list. I think this, along with similar projects, could be a good addition to the list.

Also, it would be great if the updates could be automated.--Strainu (talk) 16:05, 10 February 2017 (UTC)

It hasn't been our approach to proactively seek out lists of problematic sites; instead it has been reactive to problematic editing - typical examples are the redirect URLs, where we wait until they are spammed. To do that I would prefer to see an RFC at Meta and something notified to the wikis. Such a change needs to have a demonstrated value. Do we have an indication that we are being abused? Maybe we are better off talking to @Beetstra: on whether we can utilise COIBot to simply add each domain to the bot's monitor list, at least as part of the proof of a problem needing to be managed.

*If* we're going to change to pro-actively generated lists, I would like to see a re-architecture of how the blacklist is generated. It would be more useful to keep all such lists as their own entities and have a means to concatenate them at list-generation time. I look at how long the spam blacklist already takes to roll out, and anything pro-active makes that longer and longer with little clear gain.  — billinghurst sDrewth 11:36, 11 February 2017 (UTC)

To note that I have picked 10 clickbait sites and am having COIBot run some checks to see what may be there.  — billinghurst sDrewth 11:40, 11 February 2017 (UTC)

Mentioning the sites here, in a LinkSummary template, makes a record in COIBot that they were reported here, and a report is generated. As anything can be spammed, I see little gain in having everything monitored, even pre-emptively. We know that some spammers will create domains as they need them, and those cannot be pre-emptively monitored. Having a stronger detection mechanism may be nice, but since the linkwatchers already struggle to follow everything, I don't think I will have the resources to make a stronger mechanism (nor do I have the time).

What sDrewth is suggesting is what we have wanted for a long, long time: a proper rewrite of the spam blacklist. Developers unfortunately are busy with what the WMF thinks is more important.</frustration> What I would like to have is a spam blacklist based on the EditFilter - take the current EditFilter, strip it completely of its interpretation function, and replace that with a simple field of regexes (formatted like in our blacklist). The only test the system needs to do is, for each 'SpamFilter', whether the regexes match against the added external links. You then have the options (already available from the EditFilter) to log only, warn, throttle, (warn and block), and block, and to add a custom message ('hej <expletive>, you are using a redirect site, please adapt your edit and use the real site'). You can then fancy it up with more options if you like (add whitelisting, wiki-family selection, namespace selection, per-page exclusion, etc.). It would give much more flexibility and control than the current spam blacklist, and pre-emptive monitoring through log-only (or even warning) would be a non-disruptive option. Working a lot here and on en.wikipedia, I am sometimes surprised how much 'good use' there is of some links that nonetheless need to be blacklisted due to uncontrollable abuse. --Dirk Beetstra T C (en: U, T) 03:48, 13 February 2017 (UTC)
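(To make that concrete, a minimal sketch; the rule shape, names, and actions are assumptions drawn from the description above, not an existing extension:)

```python
import re
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    LOG = "log"            # record the hit only (pre-emptive monitoring)
    WARN = "warn"          # warn the editor but allow the save
    THROTTLE = "throttle"
    BLOCK = "block"        # disallow the edit

@dataclass
class SpamFilter:
    """One rule: plain blacklist-style regexes plus an action."""
    name: str
    patterns: list[str]
    action: Action
    message: str = ""      # custom message shown to the editor

    def hits(self, added_links: list[str]) -> list[str]:
        """Return the added external links that match any regex."""
        compiled = [re.compile(p, re.IGNORECASE) for p in self.patterns]
        return [l for l in added_links if any(rx.search(l) for rx in compiled)]

# Example: a log-only rule for a (hypothetical) redirect site.
rule = SpamFilter(
    name="redirect-sites",
    patterns=[r"\bshort\.example\b"],
    action=Action.LOG,
    message="You are using a redirect site; please link the real site.",
)
print(rule.hits(["https://short.example/abc", "https://example.org/"]))
```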

@Hoo man: is there value in transferring some of this into a Phabricator request - a spam blacklist rewrite vs. an extrapolation of the abuse filter? I remember there was some discussion about the abuse filter needing some sort of morph, so I am not sure how we best address this to the developer community.  — billinghurst sDrewth 03:53, 24 February 2017 (UTC)
@Billinghurst: Sorry for the late reply, but yes please, bring this up on Phabricator. We have been talking about this change since at least 2013, but sadly nothing has happened since (except for some discussions here and there). Cheers, Hoo man (talk) 10:40, 30 March 2017 (UTC)

Make all groups non-capturing

Currently, there are almost 200 capturing groups used in the blacklist. Because these capture, the regex engine has to devote extra resources to them, and because nothing is done with the captured groups, this extra expenditure is pointless. These groups can be made non-capturing by adding ?: just after the opening parenthesis. Note that if a group already has a ? following the opening parenthesis, it shouldn't be touched. (The current coding is not actively problematic to my knowledge, so this is more of an ounce-of-prevention, best-practices/consistency thing; there are already ~160 non-capturing groups in the list.) Dinoguy1000 (talk) 21:28, 3 September 2017 (UTC)
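(A rough sketch of that one-off conversion - my own illustration, not an existing tool; note the caveats in the comments:)

```python
import re

def make_noncapturing(entry: str) -> str:
    """Convert capturing '(' groups to non-capturing '(?:' in one
    blacklist entry. Groups already starting with '?' (e.g. '(?:',
    '(?=') are left alone, as are escaped parentheses.
    Deliberately naive: '(' inside a character class or after an
    escaped backslash is not handled, so the output should still be
    reviewed via a diff before being copied back into the list.
    """
    return re.sub(r"(?<!\\)\((?!\?)", "(?:", entry)

print(make_noncapturing(r"\b(foo|bar)-site\.com\b"))
# -> \b(?:foo|bar)-site\.com\b
```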

@Dinoguy1000: I noticed that User:Billinghurst has adapted all those regexes. --Dirk Beetstra T C (en: U, T) 05:01, 14 September 2017 (UTC)
Started, and noted where I got to. Will complete when I have a little more time.  — billinghurst sDrewth 10:11, 14 September 2017 (UTC)
Cool, I'd actually already forgotten about this request. If you're comfortable enough with regexes, I also noticed one or two groups that could be reduced to character sets (though I'd have to look through the list again to find them). There are probably other potential optimizations lurking in the list, too (though most of them are probably so minor as to not be worth worrying about from a purely optimization perspective). Dinoguy1000 (talk) 12:14, 14 September 2017 (UTC)
@Dinoguy1000: You could just copy the whole page into a user sandbox of your own and adapt the regexes there. A diff between the current version of the spam blacklist and your sandbox then gives us the opportunity to see what changed, and to decide to copy it back into the blacklist if we don't see any problems. We do similar things when cleaning up old regexes that can go, or for maintenance-type combining of multiple regexes into one. --Dirk Beetstra T C (en: U, T) 13:17, 14 September 2017 (UTC)
I'm not broken up enough about it to do so at this time, I think, though I'll definitely keep the option in mind in the future. Dinoguy1000 (talk) 13:25, 15 September 2017 (UTC)

Global Email blacklist

Hello, FYI: Email blacklist is available now. --Steinsplitter (talk) 13:10, 13 September 2017 (UTC)

Wikidata Query and tinyurl

Tracked in Phabricator: Task T112715

Hi,
This subject has probably been discussed already, but I can't find where, so I am starting it (again?) here.
Wikidata Query SPARQL links are very long. They use the URL shortener from tinyurl, so these short URLs are blocked EVERYWHERE on WMF-hosted sites.
There is a Phabricator request to try to figure out how to change these short URLs, but I don't know how long it will take. So I am writing here to ask whether it is possible to block tinyurl only in the main namespace? Simon Villeneuve 12:34, 1 November 2017 (UTC)

@Simon Villeneuve: tinyurl is blacklisted as it is abused, along with hundreds of other URL redirect services. In the global blacklist it is a hard rule, with no finesse. That said, each wiki has the ability to work around this by use of its MediaWiki:Spam-whitelist, which can take complex regex, though with no ability that I can see to whitelist based on namespace. There are Phabricator tickets about the WMF having a redirect service, and you would do well to contribute to those; when the community again votes on its priority projects for 2018, there will be your opportunity to advocate further.

Personally I am surprised that there isn't a tools project to allow the query service to write its own URL redirects in conjunction with a closed service at toollabs:, similar to how petscan saves its queries, but maybe I don't understand the enormity of the task.  — billinghurst sDrewth 08:42, 2 November 2017 (UTC)
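(For reference, local whitelist entries are plain regexes matched against URLs, one per line. A hypothetical example - the short-link code is made up - of whitelisting a single tinyurl link in a wiki's MediaWiki:Spam-whitelist:)

```
tinyurl\.com/y8wdqlr9  # hypothetical short link for one saved query
```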

@Billinghurst:
Thank you for the complete answer. Simon Villeneuve 15:38, 2 November 2017 (UTC)

{{Section resolved|1= — billinghurst sDrewth 08:08, 3 November 2017 (UTC)}}

@Simon Villeneuve: Please note that I would, probably vehemently, oppose using the whitelist on Wikidata to whitelist a specific URL shortener for frequently used links (or any blanket whitelisting, for that matter). These URL shorteners cannot be restricted there to specific data, and hence can be used in all fields, including those which are rightfully blacklisted globally. One could then, e.g., replace the official website of a subject on Wikidata with an abusive redirect link. That link would then get displayed on (all!) local wikis (if it is transcluded in a template, which many are). That means that a possibly harmful link would be displayed globally, and that all local wiki pages that transclude a property with a globally blacklisted link would be uneditable (as the wiki software parses the 'next edit' as an edit that adds a new link, resulting in the editor (confusingly!) not being allowed to save the page). --Dirk Beetstra T C (en: U, T) 12:41, 4 November 2017 (UTC)