Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
Revision as of 21:20, 7 February 2019

Shortcut:
WM:SPAM
WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension and lists regular expressions which cannot be used in URLs on any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
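A rough illustration of how such entries take effect. This is a sketch in Python's re module: the real extension uses PCRE and batches entries into larger combined patterns, so treat the anchoring here as an approximation, and note the entry strings are illustrative examples, not the live list.

```python
import re

# Illustrative blacklist entries (one regex fragment per line, as on the list page).
entries = [r"\bvietvoters\.org\b", r"\bcheapestpricesale\.info\b"]

# The extension combines entries into one alternation matched against each URL;
# this prefix approximates its behaviour, it is not the exact PCRE it builds.
blacklist = re.compile(
    r"https?://[0-9a-z_\-.]*(?:" + "|".join(entries) + ")", re.IGNORECASE
)

def is_blocked(url: str) -> bool:
    """True if the URL would be caught by one of the entries above."""
    return blacklist.search(url) is not None
```

With this sketch, `is_blocked("http://www.vietvoters.org/page")` is true while an unlisted domain passes through, which mirrors how a listed regex blocks a save containing the URL.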

Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, as there may be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (e.g. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
Whitelists
There is no global whitelist. If you are seeking whitelisting of a URL on a particular wiki, please raise the matter on that wiki's MediaWiki talk:Spam-whitelist page, and consider using the template {{edit protected}} or its local equivalent to draw attention to your request.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2024/07.

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

RFD: outlets.co... regex for consideration

Regex requested to be blacklisted: outlets\.com?\b

I am proposing the above regex as we are seeing a lot of spambots adding shopping spam ending in ...outlets.com and ...outlets.co.uk

This is being added here for consultation as it is undetermined how many good urls could be like this. My exploration hasn't found any, though the tools for such explorations are pretty ordinary, and I am no expert on legitimate marketing urls in use at the WPs. Community input is desired prior to proceeding, to avoid false positives / negative consequences.  — billinghurst sDrewth 23:19, 15 December 2018 (UTC)Reply

@Billinghurst: That is going to be heavy on the db to run such a query. I could envisage good links here, but we'll have to exclude them by design here / whitelisting where needed. --Dirk Beetstra T C (en: U, T) 12:45, 16 December 2018 (UTC)Reply
We can just continue to blacklist if it is problematic, they usually butt up against filters, though there is a small amount of leakage.  — billinghurst sDrewth 12:51, 16 December 2018 (UTC)Reply
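For anyone weighing in, the proposed pattern's reach can be checked offline. A sketch using Python's re module (the extension uses PCRE, so treat this as an approximation; the example hostnames are made up):

```python
import re

# The entry under discussion; the trailing \b lets it catch both
# "...outlets.com" and "...outlets.co.uk" (boundary before the ".uk").
pattern = re.compile(r"outlets\.com?\b")

hits = [u for u in (
    "shoesoutlets.com",        # matches: outlets.com
    "designeroutlets.co.uk",   # matches: outlets.co, then a boundary at "."
    "outlets.community",       # no match: neither "co" nor "com" ends at a boundary
) if pattern.search(u)]
```

Note the third case: the word boundary means a domain merely starting with "outlets.com..." as part of a longer label is not caught, which narrows the false-positive surface somewhat.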

kubadownload.com



Cross-wiki spam campaign. Tgeorgescu (talk) 23:04, 6 February 2019 (UTC)Reply

@Tgeorgescu: Added to Spam blacklist. Definite xwiki abuse; adding now, will have to run COIBot reports later. @Beetstra: pretty widespread, and COIBot has not noticed it as the edits are spread across a /64; in this case they look to be in 2a02:a311:8264:c300:0:0:0:0/64 -- — billinghurst sDrewth 23:22, 6 February 2019 (UTC)Reply
As a note to anyone who arrives from xwiki to comment: the edits to the software version numbers appear to be correct; it is the linking to their website that is problematic. [All those templates would be so much better managed by extracting from Wikidata.]  — billinghurst sDrewth 23:25, 6 February 2019 (UTC)Reply
@Billinghurst: those bloody IPv6 IPs .. COIBot (well, LiWa3 actually) does not understand ranges on those. I should see if in some near future I have time for that. --Dirk Beetstra T C (en: U, T) 08:21, 7 February 2019 (UTC)Reply
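Until LiWa3 understands IPv6 ranges, the grouping can be sketched with Python's ipaddress module, using the /64 from the report above:

```python
import ipaddress

# The /64 mentioned in the report above; spambot edits from any address
# inside it should be grouped together as one "user".
spam_range = ipaddress.ip_network("2a02:a311:8264:c300::/64")

def same_range(addr: str) -> bool:
    """True if the given IPv6 address falls inside the tracked /64."""
    return ipaddress.ip_address(addr) in spam_range
```

The membership test is a straight prefix comparison, so collapsing a set of logged IPv6 editors down to their /64s is cheap to bolt onto existing per-IP tracking.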

simplewiki drug spambots 20190207

vietvoters.org
showartcenter.com
tomoreinformation.com
11rxonline.com

previously blacklisted:
gotomoreinfo.com
haicuneo.com
alicemchard.com
cheapestpricesale.info
tillerrakes.com <- using for primary tracking

More incoming. —MarcoAurelio (talk) 14:02, 7 February 2019 (UTC)Reply

Being handled via individual reports by Billinghurst. —MarcoAurelio (talk) 18:21, 7 February 2019 (UTC)Reply
Handled for a number identified by Bsadowski. Note from those that I did a quick check, they were hosted on two different IP addresses, the link back function should be used to identify others, and I will be manually pushing further trackers into the last report as a means to assist. The urls had some elements of repetition, which could be investigated for a regex, either through the blacklist or as a spam filter.  — billinghurst sDrewth 21:20, 7 February 2019 (UTC)Reply
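The "elements of repetition" mentioned above can be surfaced mechanically before committing to a regex. A sketch that counts substrings shared by several spammed domains (the domain list here is a stand-in for the real report, and the length/count thresholds are arbitrary choices):

```python
from collections import Counter

# Hypothetical sample of spammed domains (stand-ins, not the actual list).
domains = ["gotomoreinfo.com", "tomoreinformation.com", "11rxonline.com"]

def shared_fragments(domains, min_len=6, min_count=2):
    """Substrings of at least min_len chars occurring in min_count+ domains."""
    counts = Counter()
    for d in domains:
        # Count each distinct fragment once per domain, not once per position.
        seen = {d[i:i + min_len] for i in range(len(d) - min_len + 1)}
        counts.update(seen)
    return [frag for frag, n in counts.items() if n >= min_count]
```

Fragments that recur across the set (here, for instance, "tomore") are candidates for a blacklist regex or an AbuseFilter pattern, subject to the usual false-positive check against legitimate domains.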

Proposed additions (Bot reported)

This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as they may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they become stale (fewer than 5 links reported, none of which have been edited in the last 7 days, and where the last editor is COIBot).
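The staleness rule described above can be sketched as a simple predicate (the report dict here is a hypothetical stand-in, not COIBot's real data model):

```python
from datetime import datetime, timedelta

def is_stale(report: dict, now: datetime) -> bool:
    """Archival rule from the text: fewer than 5 reported links, no edits in
    the last 7 days, and COIBot was the last editor of the report page."""
    return (
        report["link_count"] < 5
        and now - report["last_edited"] > timedelta(days=7)
        and report["last_editor"] == "COIBot"
    )
```

All three conditions must hold; a report with 5+ links, a recent edit, or a human last editor stays open for review.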

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link-additions, and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
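The four criteria above can be sketched as a single predicate. The stats dict and the concrete thresholds for "mainly adds" and "not used too much" (0.5 and 100 below) are illustrative assumptions, not COIBot's actual tuning:

```python
def should_report(stats: dict) -> bool:
    """Sketch of the four LinkWatcher criteria listed above.
    `stats` is a hypothetical summary for one (user, link) pair."""
    mainly_adds = stats["user_additions"] / stats["total_additions"] > 0.5
    lightly_used = stats["total_additions"] < 100
    # Criteria 1 and 2 (per-link and per-server variants, collapsed here):
    if mainly_adds and lightly_used and stats["user_wikis"] > 2:
        return True
    # Criterion 3: all additions by IPs, on more than one wiki.
    if stats["all_by_ips"] and stats["link_wikis"] > 1:
        return True
    # Criterion 4: a small IP range favours this link, on more than one wiki.
    if stats["small_ip_range"] and stats["link_wikis"] > 1:
        return True
    return False
```

The common thread is cross-wiki spread: each rule only fires when the link appears on more than one or two wikis, which is why single-project spam is deferred to local blacklists.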
COIBot's currently open XWiki reports
List: vrsystems.ru · Last update: 2023-06-27 15:51:16 · By: COIBot
IPs: 195.24.68.17, 192.36.57.94, 193.46.56.178, 194.71.126.227, 93.99.104.93
Last link addition: 2070-01-01 05:00:00 · User - Link - Wikis: 4 · Link - Wikis: 4

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not from the blacklists of individual wikis. For those requests, please take your discussion to the pertinent wiki, where such requests would be made at MediaWiki talk:Spam-blacklist on that wiki. Search spamlists (remember to enter any relevant language code).

genetherapynet.com



I tried adding this and found it was blacklisted here. It seems that someone with a COI to that site and others was spamming it on articles cross wiki. As far as I can tell the editor has not been here for quite a long time (see here, this and Talk:Spam blacklist/Archives/2011-02). Was wondering if it could be removed. Not sure about the other websites that were blacklisted at the same time as I have not explored them. Aircorn (talk) 06:28, 19 July 2018 (UTC)Reply

@Aircorn: it was being spammed in 2011. A removal needs more than "I want to add it", it usually needs firm reasoning about why it is usable at the sites. You can always ask about whitelisting at w:en:mediawiki talk:spam-whitelist  — billinghurst sDrewth 09:32, 9 August 2018 (UTC)Reply
It is quite hard to find sites that explain genetic engineering and are user friendly. It would make my editing life a little bit easier if I could use it as a reference, but it is not a site I would regularly use as there are much better (although less accessible) resources out there. I figured that since it was (as far as I could tell) only blacklisted due to spamming a long time ago, it would be relatively easy to unblacklist once that was no longer an issue. I will look at whitelisting if I really need it. Thanks for the response. Aircorn (talk) 09:44, 9 August 2018 (UTC)Reply

more than 3000 entries

related: Talk:Spam_blacklist/Archives/2015-01#Now_restarting_the_original_debate

hi! (ping billinghurst, Beetstra) At user:lustiger seth/sbl log stats/all wikis no hits I started again to make a list of all sbl entries that have 0 hits in all ~900 wikis since they were added to the list (but not earlier than 2013, when the sbl log came into existence). The script takes some time (another week probably). Half of the sbl entries (~4800) have been checked already; more than 3000 have never been the reason for a blocked edit.
What do you think? Shall we delete those entries (except the url shorteners) from the list? Advantage: the fewer the entries, the clearer the structure. -- seth (talk) 10:28, 10 November 2018 (UTC)Reply

@Lustiger seth: Thank you for this effort. I think most of the non-shortener domains can be removed.
Re 'clearer [the] structure': Would it be possible to 'sort' the list at some point, stuff all the shorteners until now into a 'section', with at the end a mark from 'below here unsorted'. In that case, every year or so we can sort the unsorted into the above 'groups', and it would make clean-up of non-shorteners easier (you can even take them out before your parsing, no need to check whether they were abused or not, we keep 'm anyway if they are still a shortening service). --Dirk Beetstra T C (en: U, T) 07:36, 11 November 2018 (UTC)Reply
@Beetstra: I agree that merging/grouping the shorteners would be reasonable. -- seth (talk) 08:06, 11 November 2018 (UTC)Reply
  • Comment I am interpreting that as you have been running through all wikis' Special:Log/spamblacklist. On checking some randoms, I see that some listings have come from a Poke, e.g. User:COIBot/LinkReports/onlinetripadvisorturkey.com, so there have been concerns that caused addition. Have we done a cross-reference against generated XWiki reports? As numbers of urls come about from AbuseFilter hits, if we have additions to the blacklist and generated XWiki reports, I am not certain that we want those removed. Also, if we have a regex in place, I am loath to remove those, as they have been specifically added from an evidence base.  — billinghurst sDrewth 22:08, 11 November 2018 (UTC)Reply
    I am also seeing numbers of essay writing domains in the list, and while they have not been spammed, I am not certain that I want them removed. Call me paranoid, or call me nasty! If we are to remove urls, maybe we want to eyeball cull-proposed removals and remove those we would like to keep.  — billinghurst sDrewth 03:47, 12 November 2018 (UTC)Reply
    @Billinghurst: We could easily cut the regexes which have been added in the last # years (2?) from that list. If seth were to re-run the script in a year (e.g.), then those with still no hits would come up.
    Alternatively, we run a script on those filters, extract all the domains in those ... (heck, I could teach LiWa to read certain filters as prescribed in the settings and extract domains from that ... but that would be a feature that at the earliest I could write next summer; moreover I would love to have LiWa to have access to all special:log/spamblacklist, so I could record attempted additions there as well - attempts to spam would be a welcome addition to the coibot reports ...). --Dirk Beetstra T C (en: U, T) 05:17, 12 November 2018 (UTC)Reply
    (barging in) Maybe you are interested in checking against potential positives of my lists before removing? All of their entries are not collected automatically but are handpicked: spam links on Commons, spamming wikis. Best, --Achim (talk) 14:13, 12 November 2018 (UTC)Reply
    @Achim55: Your list can be loaded into COIBot; if you use IRC, we can give you permissions with COIBot to add these to be monitored per Small Wiki Monitoring Team/IRC or if not, we can give you permission so you can add them to User:COIBot/Poke. @Beetstra: are you thinking of converting to json, or something similar? If not, then I am going to need to get js/css permissions :-/  — billinghurst sDrewth 03:37, 16 November 2018 (UTC)Reply
    @Billinghurst and Achim55: I cannot just convert to json; it is currently not valid json. I will have to go to regular pages and e.g. get template-editor access for COIBot. But that is beside the point. We can also poke that list, and I will give access to Achim to poke as well. —Dirk Beetstra T C (en: U, T) 04:24, 16 November 2018 (UTC)Reply
  • Comment I would think that there would be value in at least keeping the old list of removed domains somewhere and having COIBot use that list at least for "monitor", or proactively pushing those in to be monitored.  — billinghurst sDrewth 03:39, 16 November 2018 (UTC)Reply

reading the blacklists ..

@Lustiger seth: related to your work here .. how do you manage to read ALL spam-blacklist-logs? I thought they were admin only .. ?? If they can be (bot-)read that would be very welcome, I could then build in the capability into LiWa3/COIBot, so the attempts to circumvent the blacklist can be shown in the reports which is very welcome evidence in case of de-blacklisting-requests, as well as for sock-hunting and finding spammers implementing workarounds (spammers attempt one blacklisted domain, and other that are not yet blacklisted .. that is a one-strike-and-you-are-out situation suitable for immediate blacklisting/blocks of the other domains). --Dirk Beetstra T C (en: U, T) 05:26, 14 November 2018 (UTC)Reply

@Beetstra: they used to be, apparently the developers decided they should be open to all logged in users in phab:T64781. — xaosflux Talk 15:49, 14 November 2018 (UTC)Reply
@Xaosflux: I never understood the initial choice... thanks, I will need to code this into my bots! Thanks! —Dirk Beetstra T C (en: U, T) 17:04, 14 November 2018 (UTC)Reply
Hi!
another related ticket phab:T184483.
I use the replica db at toolserver. -- seth (talk) 21:28, 14 November 2018 (UTC)Reply

thewayoftheninja.org



Wanted to add this to N's page on Wikipedia after finding a fake link listed, but found this link was blocked. This is the official site for the N series of games. -- TheV360 (talk) 22:05, 20 November 2018 (UTC)Reply

@TheV360: The domain is not globally blacklisted; it is locally blacklisted at enWP only. You will need to address this at w:en:Mediawiki talk:spam-blacklist.  — billinghurst sDrewth 22:18, 20 November 2018 (UTC)Reply

babepedia.com



I wanted to update the Playboy Playmates of the Year page (https://en.wikipedia.org/wiki/List_of_Playboy_Playmates_of_the_Year) and noticed the external link was linking to a site that doesn't exist anymore and is redirecting to the official site which hasn't been updated in a year or so. Same issue for the Playmate Of The Month and the Penthouse Pets. When trying to update to my source to babepedia.com (the only site that is regularly updating its listing it seems), I noticed babepedia.com is on the blacklist. Hoping it can be removed, as it's a reputable source that is not overloaded with annoying ads like many adult sites out there.

 Declined. Please refer to the listing request, as the domain was being xwiki spammed. You would be best to enquire at English Wikipedia about setting up a temporary exclusion for the required edits via w:en:Mediawiki talk:Spam-whitelist.  — billinghurst sDrewth 21:03, 6 February 2019 (UTC)Reply

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

Discussion

This section is for discussion of Spam blacklist issues among other users.

Complaint about my article

[Translated from French:] I would like the problematic link to be removed. Since it is verifiable, you can verify it as well. Thanks Bob jeci (talk) 10:53, 17 November 2018 (UTC)Reply

@Bob jeci: Hi, you are going to need to provide specific information. We cannot guess where you are editing, and the domain which you are wishing to add.  — billinghurst sDrewth 11:21, 17 November 2018 (UTC)Reply

Information

[Translated from French:] Just to clarify: the only verifiable information about me that I wanted to add, I copied from a search done on Google. If you enter (N'guessan Enoh Jean cinel) you will get the links I proposed. Thanks Bob jeci (talk) 08:15, 23 November 2018 (UTC)Reply

@Bob jeci: It is unclear what you wish to achieve. If you are saying that a search link for Google is blocked, yes, that is the purpose of the blacklisting, and won't be changed. I suggest that you discuss that matter at the wiki where you are looking to add your information.  — billinghurst sDrewth 09:45, 23 November 2018 (UTC)Reply

Smart template for deferring to local wikis?

Can we have a template, e.g. {{Deferlocal|w|en}} that results in a text 'Defer to English Wikipedia Spam blacklist' (but displaying the target in the local language etc.?) --Dirk Beetstra T C (en: U, T) 11:48, 21 November 2018 (UTC)Reply

@Beetstra: done. Defaults to enWP where parameterless. $1= sister, $2 = lang; $3 = override text link  — billinghurst sDrewth 09:56, 23 November 2018 (UTC)Reply
Comment: someone may wish to turn it into a translated template. <shrug>  — billinghurst sDrewth 09:58, 23 November 2018 (UTC)

COIBot and the spam blacklist log

COIBot is currently, in the 'free time' of the report saving module, backparsing the spam blacklist log, one wiki at a time. It turns out that one wiki is a humongous chunk of data, and the bot spends quite some time before starting to parse reports again. Please be patient while this operation runs. The data is stored with the regular link additions, and the bots will then access it in the same way as usual.

This will likely cause certain parts of COIBot's reporting functions (on wiki and on IRC) to show strange results, as some code may not understand how things are stored. I will resolve that later. --Dirk Beetstra T C (en: U, T) 17:53, 1 December 2018 (UTC)Reply

@Beetstra: Are there things that we should not do as they may hinder the process; or things that we should moderate/lessen in doing?  — billinghurst sDrewth 23:48, 1 December 2018 (UTC)Reply
Just be patient with it .. —Dirk Beetstra T C (en: U, T) 00:07, 2 December 2018 (UTC)Reply
@Beetstra: FYI: note that COIBot is writing to the wiki where quickcreate is requested; however, it is not recording its standard analysis from "report xwiki ...". They pass through in time, but are not written up at this point in time.  — billinghurst sDrewth 12:55, 16 December 2018 (UTC)Reply
@Billinghurst: I will have a look this evening. COIBot is running 2 LinkSavers, one parsing blacklists, the other one not. Unfortunately, that is prone to crashes. I presume that currently both are on a blacklist parsing the whole thing. I just hope that the one parsing en.wikipedia is done soon, but there are hellish months in the history of that (spambots hitting thousands of times an hour, back in 2015, see e.g. https://en.wikipedia.org/w/index.php?title=Special:Log/spamblacklist/91.200.12.79&action=edit&redlink=1). --Dirk Beetstra T C (en: U, T) 13:25, 16 December 2018 (UTC)Reply
@Billinghurst: bot was confused .. I restarted the LinkSaver that should be saving. It borked (nothing you can solve from IRC .. unfortunately). Just to illustrate, the blacklist parser spent the last 13 1/2 hours parsing the 2nd of May 2015 ... --Dirk Beetstra T C (en: U, T) 17:31, 16 December 2018 (UTC)Reply