Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Billinghurst (talk | contribs) at 13:59, 3 January 2021 (→‎Blackhat SEO sites: Added). It may differ significantly from the current version.

Shortcut: WM:SPAM, WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs in any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.

Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, as there could be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (e.g. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
Whitelists
There is no global whitelist, so if you are seeking a whitelisting of a url at a wiki then please address such matters via use of the respective Mediawiki talk:Spam-whitelist page at that wiki, and you should consider the use of the template {{edit protected}} or its local equivalent to get attention to your edit.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2024/06.

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days and sections whose most recent comment is older than 15 days.

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

g.co



Hello, I'm trying to understand how to manage the inclusion of pages created with Wikidata list Template using d:Property:P2671 and d:Property:P646 in the ListeriaBot Spam/abuse filter: any hint? Pietro (talk) 09:58, 30 November 2020 (UTC)

@Pietro: you mean https://www.google.com/search?q=knowledge+graph+search+api&kponly&kgmid=/g/113qbrbyl as redirected from https://g.co/kg/g/113qbrbyl? My solution would be to make a google custom search engine to feed your search term to https://www.google.com/search?q=knowledge+graph+search+api&kponly&kgmid=<your search term> (not the full redirect on g.co), and whitelist that specific custom search engine on your wiki. As it looks now, you'll have to whitelist the whole g.co/kg tree (which may not be too bad, as that is a specific corner which cannot be custom-abused). --Dirk Beetstra T C (en: U, T) 10:58, 30 November 2020 (UTC)
Thanks @Beetstra: as we are talking about URLs that have been deemed worthy of listing as properties in Wikidata, I would avoid custom search engines in every Wiki, but I would rather whitelist g.co/kg as you proposed. To prevent another potential problem, how to deal with d:Property:P3749? Pietro (talk) 11:39, 30 November 2020 (UTC)
@Pietro: Would excluding kg in addition to maps in \bgoo\.gl\b(?!/maps\b).* work? Is g.co/kg the same as goo.gl/kg?? --Dirk Beetstra T C (en: U, T) 11:49, 30 November 2020 (UTC)
@Beetstra: I would use the formatter URL in Wikidata for each property, so g.co/kg for both d:Property:P2671 and d:Property:P646, and maps.google.com for d:Property:P3749. I'm sorry, but I cannot help with goo.gl/kg. Pietro (talk) 14:54, 30 November 2020 (UTC)

I'll change the first rule, and add a second:

  • Regex requested to be blacklisted: \bgoo\.gl\b(?!/(maps
  • Regex requested to be blacklisted: \bg\.co\b(?!/(maps

--Dirk Beetstra T C (en: U, T) 05:41, 1 December 2020 (UTC)

@Pietro: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 05:44, 1 December 2020 (UTC)
@Beetstra and Billinghurst: editing wikidata:Property:P2671#P1630 still triggers the g.co blacklist error, I suppose because the regexp was refined but the original rule was not removed[1]. --Lockal (talk) 09:40, 2 December 2020 (UTC)
@Lockal and Billinghurst: I tried some things, but don't seem to get it fixed. @Billinghurst: can you try? --Dirk Beetstra T C (en: U, T) 12:52, 2 December 2020 (UTC)
I am just confused. What are we trying to add/stop? Seems like the blacklist is blocking whatever is your issue.  — billinghurst sDrewth 00:39, 3 December 2020 (UTC)
@Billinghurst: we are trying to stop everything on g.co and goo.gl (both (abused) redirect sites) except for the /maps and /kg paths on those two. --Dirk Beetstra T C (en: U, T) 05:39, 3 December 2020 (UTC)
Maybe if I had signed the previous comment, User:Billinghurst would actually have been pinged. --Dirk Beetstra T C (en: U, T) 05:39, 3 December 2020 (UTC)
From what I know, according to the text on https://g.co/, "you can trust that it will always take you to a Google product or service". goo.gl is a google maps + arbitrary redirector, so there is no need to unblock "/kg/" there (https://g.co/kg/m/010qx6n8 - works and safe, https://goo.gl/kg/m/010qx6n8 - 404 - whatever, https://goo.gl/maps/2VgGWLJSeR58JA156 - works and safe, https://g.co/maps/jxc9u - works and safe, https://goo.gl/kgbybz - arbitrary redirect - should be blocked). All links from https?://g\.co/kg\b redirect only to Machine-Readable IDs (MREIDs) and should not be unshortened, because http://g.co/kg is Google's official RDF prefix (e.g. for json-ld notation). Other g.co links fall under "although it is specific for google services, it is still a redirect" (as discussed at Talk:Spam_blacklist/Archives/2014-09#g.co) and may remain blocked (up to you). --Lockal (talk) 13:57, 3 December 2020 (UTC)
@Beetstra and Billinghurst: still a problem. --Lockal (talk) 19:10, 22 December 2020 (UTC)
@Beetstra and Billinghurst: I see you are both active. If you have issues with building the regexp, just remove (?<!-)\bg\.co\b completely, as it was added with an incomplete rationale (no examples of spamming activity), effectively working like a ban on google.com. --Lockal (talk) 11:02, 28 December 2020 (UTC)
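
The intent stated in this thread (block everything on g.co and goo.gl except the /maps and /kg paths) can be sketched with a negative lookahead. This is a minimal Python illustration, not the rule that was added to the blacklist: the combined pattern and the helper name `is_blocked` are assumptions, and Python's `re` is standing in for the regex engine the extension uses. The sample URLs are the ones quoted by Lockal above.

```python
import re

# Hypothetical combined rule: match g.co or goo.gl unless the path
# starts with /maps or /kg as a whole path segment.
REDIRECTOR = re.compile(r"\b(?:g\.co|goo\.gl)\b(?!/(?:maps|kg)\b)")


def is_blocked(url: str) -> bool:
    """Return True when the sketch pattern would reject the URL."""
    return REDIRECTOR.search(url) is not None


if __name__ == "__main__":
    samples = [
        "https://g.co/kg/m/010qx6n8",
        "https://g.co/maps/jxc9u",
        "https://goo.gl/maps/2VgGWLJSeR58JA156",
        "https://goo.gl/kgbybz",
    ]
    for url in samples:
        print(url, "blocked" if is_blocked(url) else "allowed")
```

The trailing `\b` inside the lookahead is what keeps goo.gl/kgbybz blocked: "/kg" is followed by another word character there, so the exemption does not apply. Note that this sketch also exempts goo.gl/kg, which Lockal argues is unnecessary.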

astroturfwars.com / macaupoker99a.com





Coming from Commons (see thread), there has been a spamming issue for several months of gambling site links being added to file descriptions. The two I'm bringing up here have been spammed on both Commons and EN Wikipedia by throwaway accounts, so I'm requesting global blacklisting.

macaupoker99a:

astroturfwars:

Thanks, ~SuperHamster Talk Contribs 02:48, 20 December 2020 (UTC)

@SuperHamster: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 04:18, 20 December 2020 (UTC)

otherwhatsapp.com



Cross-wiki spam. See also User:COIBot/LinkReports/otherwhatsapp.com --SCP-2000 07:16, 24 December 2020 (UTC)

@SCP-2000: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 10:30, 24 December 2020 (UTC)

Spammed search link



Regex requested to be blacklisted: \bestateguideblog\.com/\?s=  — billinghurst sDrewth 14:47, 26 December 2020 (UTC)

@Billinghurst: Added to Spam blacklist. -- — billinghurst sDrewth 14:48, 26 December 2020 (UTC)
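
The entry above targets only the site's search URLs rather than the whole domain. A minimal Python sketch of that behaviour, assuming Python's `re` behaves like the blacklist's regex engine for this pattern; the helper name and the sample URLs are made up for illustration:

```python
import re

# The regex from the request above; "\?" matches a literal question mark.
SEARCH_LINK = re.compile(r"\bestateguideblog\.com/\?s=")


def is_search_link(url: str) -> bool:
    """Return True when the URL is a site-search link on this domain."""
    return SEARCH_LINK.search(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs.
    print(is_search_link("https://estateguideblog.com/?s=casino"))  # True
    print(is_search_link("https://estateguideblog.com/about"))      # False
```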

Hijacked domains

Hijacked domains used by the cross-wiki spammer Viclau (talk • contribs • deleted contribs • logs • filter log • block user • block log • GUC • CA). Most are on enwp and zhwp; the rest are scattered across a number of other wikis. -Mys_721tx (talk) 08:17, 28 December 2020 (UTC)




This one seems to have been returned to the owner. Included here only for documentation purposes. -Mys_721tx (talk) 08:17, 28 December 2020 (UTC)
@Mys 721tx: Added to Spam blacklist. -- — billinghurst sDrewth 12:03, 28 December 2020 (UTC)

Cross-wiki spamming





  • was.edu.hk
Sites directly associated with the cross-wiki spammer Viclau (talk • contribs • deleted contribs • logs • filter log • block user • block log • GUC • CA). Spammed on zh_wp, en_wp, simple_wp, zh_yue_wp. -Mys_721tx (talk) 08:17, 28 December 2020 (UTC)
@Mys 721tx: Added to Spam blacklist. -- — billinghurst sDrewth 06:17, 29 December 2020 (UTC)

nico.ms



This is a URL shortener of Niconico per Nico Nico Pedia, which contains sample links. (Evidence of spamming is not required for URL shorteners.) ネイ (talk) 11:50, 28 December 2020 (UTC)
@ネイ: it is a closed/dedicated redirect for a domain. If the underlying domain is being used to spam, and this shortcut is to get around those controls then we can blacklist. If it is an unabused shortener, then we have tended to leave these dedicated shortcuts be as they are.  — billinghurst sDrewth 12:02, 28 December 2020 (UTC)
For the record, it contains redirects to *.nicovideo.jp and niconicommons.jp, and appears to be dedicated (does not contain redirects outside these sites). If the current practice is to leave these as they are, I am fine with not blacklisting it; "Exceptions include malicious domains and URL redirector/shortener services" appears to be different from the current practice then. ネイ (talk) 12:10, 28 December 2020 (UTC)
@ネイ: Declined, as we don't define a shortened version of a URL as a service; the exception is focused on those domains that offer shortening as a service. Utilisation is not a service. We would be killing all of the blogspot TLD domains if that were the case.  — billinghurst sDrewth 12:14, 28 December 2020 (UTC)

Hijacked domains





Hijacked domains used by the cross-wiki spammer Viclau (talk • contribs • deleted contribs • logs • filter log • block user • block log • GUC • CA). -Mys_721tx (talk) 17:57, 28 December 2020 (UTC)
@Mys 721tx: Added to Spam blacklist. -- — billinghurst sDrewth 06:11, 29 December 2020 (UTC)

Spambot domain cluster

and will manually amend to some smaller regex when adding  — billinghurst sDrewth 02:21, 2 January 2021 (UTC)

@Billinghurst: Added to Spam blacklist. -- — billinghurst sDrewth 02:23, 2 January 2021 (UTC)

Blackhat SEO sites

Blackhat sites that are being used to create non-notable entries on small wikis and then corresponding Wikidata items. For example, d:Q92052575. More to come. ‐‐1997kB (talk) 05:12, 3 January 2021 (UTC)
These have a long history of being used to provide references for UPE articles on en.wp. I consider using these as references to be a block on sight offence. Support blacklisting. There's a longer list at w:Wikipedia_talk:WikiProject_Spam#Publicity_farm_including_londondailypost.com_and_others and w:User:Praxidicae/fntest. MER-C (talk) 12:22, 3 January 2021 (UTC)
Comment: I will generally also put a monitor command onto the dodgy domains that are unblocked so far. So they are searchable "monitor domain blahblah.com blackhat SEO" or something similar.  — billinghurst sDrewth 13:58, 3 January 2021 (UTC)
@1997kB: Added to Spam blacklist. -- — billinghurst sDrewth 13:59, 3 January 2021 (UTC)

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as the bot may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they get stale (less than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot).

Sysops
  • If the report contains links to less than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
COIBot's currently open XWiki reports
List Last update By Site IP R Last user Last link addition User Link User - Link User - Link - Wikis Link - Wikis
vrsystems.ru 2023-06-27 15:51:16 COIBot 195.24.68.17 192.36.57.94
193.46.56.178
194.71.126.227
93.99.104.93
2070-01-01 05:00:00 4 4

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests please take your discussion to the pertinent wiki, where such requests would be made at Mediawiki talk:Spam-blacklist at that wiki.

casino.ru



As mentioned in Talk:Spam blacklist, this domain was blacklisted because: ‘Spammed by numerous IPs on Russian and Ukrainian Wikipedias. --Mercy 15:33, 23 December 2009 (UTC)’. I found this site useful for gambling articles. There are original interviews, articles and news. For example, Gambling in Ukraine: the English article is poor and needs a lot of work, and references from native speakers' sites would help. https://en.wikipedia.org/wiki/Craps https://en.wikipedia.org/wiki/Casino_token https://en.wikipedia.org/wiki/Gambling_in_Ukraine https://en.wikipedia.org/wiki/Gambling_in_Macau https://en.wikipedia.org/wiki/Gambling_age https://en.wikipedia.org/wiki/Gambling https://en.wikipedia.org/wiki/Gaming_law

Sort of an EP per User talk:SmurFF2020. Camouflaged Mirage (talk) 12:04, 28 December 2020 (UTC)
Comment: Keen to defer to en for local whitelisting though, as globally there is still some undesired impact. @Ohnoitsjamie: is it possible to whitelist locally? Yes, it cannot be removed from the en blacklist as it's on meta, but there is a possibility of using the local whitelist. I am uncomfortable removing it globally just because some pages in one wiki need it, as it's clearly spammy (and true on ru/uk wp). Camouflaged Mirage (talk) 12:07, 28 December 2020 (UTC)
@SmurFF2020: Best that you ask at w:mediawiki talk:spam-whitelist and ask for local whitelisting. Asking there will create a local record and enable a local conversation.  — billinghurst sDrewth 12:10, 28 December 2020 (UTC)
I see no reason to remove it globally. The user who requested removing it from the en blacklist has no edits there; we rarely whitelist for new users, as it usually suggests the strong possibility of a WP:COI. Ohnoitsjamie (talk) 19:14, 28 December 2020 (UTC)

1000fragen.de



imho the blacklisting was done in error, because there is no indication for spamming.
1000fragen.de is a redirect now. but it wasn't a redirect some time ago. as the domain is blacklisted, this now leads to problems, if somebody wants to add an archived url.

so i'm going to unblacklist that domain, now. -- seth (talk) 16:21, 29 December 2020 (UTC)

@Lustiger seth: I disagree with the action, as it was spammed, and will now only be spam anywhere else. That will not have been the only instance at that time. I would suggest that you whitelist at your wiki instead.  — billinghurst sDrewth 21:27, 29 December 2020 (UTC)
hi!
where is/was the spam? at User:COIBot/XWiki/1000fragen.de i only see one link addition.
if there is (at least) a moderate risk of further spamming, then we can go that way of whitelisting the domain at dewiki. but what are the indications for spamming in this case? -- seth (talk) 22:58, 29 December 2020 (UTC)
It was +++ months ago, so remembering the specifics of one from many won't be happening. It is just not my practice to blacklist based on one edit, unless a site is truly problematic. COIBot never shows them all, unless it meets the formula, and global-search becomes the check. I have set the domain to monitor and will come back if we are having problems.  — billinghurst sDrewth 01:26, 30 December 2020 (UTC)
ok, thanks. let's wait and see. (next time it would be great if you could be a bit more precise when writing the reason for blacklisting at the corresponding XWiki page.) -- seth (talk) 09:39, 30 December 2020 (UTC)

goodtherapy.org



This blacklisting is quite unfortunate for Wikipedia. I have started creating articles in psychology for the French Wikipédia, and this site would be immensely useful. I would, for example, need to use references from goodtherapy for articles about Murray Bowen, Virginia Axline and Family Systems Therapy. Their articles appear clear, concise and thorough, which are invaluable advantages for our work on Wikipedia. Thank you for considering my humble request. --Liberlogos (talk) 20:16, 2 January 2021 (UTC)

@Liberlogos: It was blacklisted years ago at the request of the Wikipedias. I suggest that you talk to the admins at both sites to see if they will whitelist it. frwp: Defer to w:fr:Mediawiki talk:spam-whitelist and enwp: Defer to w:en:Mediawiki talk:spam-whitelist. We are unlikely to remove without other positive comment.  — billinghurst sDrewth 13:56, 3 January 2021 (UTC)

Discussion

This section is for discussion of Spam blacklist issues among other users.

New script

Since I recently came across a situation where I had to track down which line in a spam blacklist caused a hit, and since it was fairly tedious even with mediawiki.org's short local blacklist, I wrote a script to automate checking all of the lines of the blacklist to find a match: User:DannyS712/FindBlacklistEntry.js. I'll probably move it to meta and write documentation at some point, but I hope this is helpful to anyone else that deals with unexpected hits - it checks both the local and global blacklists. Let me know if there are any questions. Thanks, --DannyS712 (talk) 19:31, 16 December 2020 (UTC)

Thanks for letting us know. I shall take a look at it :-) —MarcoAurelio (talk) 16:33, 17 December 2020 (UTC)
Documentation now at User:DannyS712/FindBlacklistEntry DannyS712 (talk) 03:00, 18 December 2020 (UTC)
hi!
i guess, you could have also just used https://searchsbl.toolforge.org/ to find the corresponding entry.
however, it might be nice to have a more internal tool (such as yours). -- seth (talk) 16:30, 29 December 2020 (UTC) 22:59, 29 December 2020 (UTC)
👍👍👍 It is my go-to tool Lustiger seth. Coupled with COI's IRC "findrules" and "wherelisted".  — billinghurst sDrewth 21:21, 29 December 2020 (UTC)
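
The line-by-line check described at the top of this thread can be sketched roughly as follows. This is a Python illustration only, not DannyS712's script (which is JavaScript): the function name and the sample entries are hypothetical, and it assumes that text after `#` on a blacklist line is a comment and that Python's `re` is a close enough stand-in for the extension's regex engine.

```python
import re


def find_matching_entries(url, blacklist_lines):
    """Return (line_number, entry) pairs for every entry that matches url."""
    hits = []
    for number, raw in enumerate(blacklist_lines, start=1):
        # Assumption: everything after "#" is a comment.
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue  # blank or comment-only line
        try:
            if re.search(entry, url, re.IGNORECASE):
                hits.append((number, entry))
        except re.error:
            # Skip entries this engine cannot compile.
            continue
    return hits


if __name__ == "__main__":
    # Hypothetical blacklist content, for illustration only.
    sample_lines = [
        "# search links",
        r"\bexample\.com\b",
        "",
        r"\bestateguideblog\.com/\?s=  # spammed search link",
        "broken(regex",
    ]
    print(find_matching_entries("https://estateguideblog.com/?s=foo", sample_lines))
```

Testing each entry separately, rather than one combined pattern, is what makes it possible to report the line number of the entry that caused the hit.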