Talk:Spam blacklist
- Proposed additions
- Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Reports can also be submitted through SBRequest.js. Please check back after submitting your report, as there may be questions regarding your request.
- Proposed removals
- Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
- Other discussion
- Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
- Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
- #wikimedia-external-linksconnect - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
- Whitelists
There is no global whitelist, so if you are seeking whitelisting of a URL at a particular wiki, please address the matter via the respective MediaWiki talk:Spam-whitelist page at that wiki, and consider using the template {{edit protected}} or its local equivalent to draw attention to your request.
Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.
Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2025/11.
- Information
- List of all projects
- Overviews
- Reports
- Wikimedia Embassy
- Project portals
- Country portals
- Tools
- Spam blacklist
- Title blacklist
- Email blacklist
- Rename blacklist
- Closure of wikis
- Interwiki map
- Requests
- Permissions
- Bot flags
- New languages
- New projects
- Username changes
- Translations
- Speedy deletions
- snippet for logging
- {{sbl-log|29663139#{{subst:anchorencode:SectionNameHere}}}}
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 3 days and sections whose most recent comment is older than 7 days.
Proposed additions
fastflashc
fastflashc.com
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
Spammed on a few wikis throughout this year, using these accounts (ducks of each other, spam-only):
User:Johnsmitw
User:Johnkaff45
User:Johnsmi1th2
Altogether, those three are indeffed on enwiki, Commons, enwikt, and simplewiki, and there is some spam beyond those wikis as well. Seercat3160 (talk) 00:02, 10 November 2025 (UTC)
I also see:
smoothhitcarts.com
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
Waiting for report. --Dirk Beetstra T C (en: U, T) 08:00, 10 November 2025 (UTC)
- @Seercat3160:
Added to Spam blacklist. --Ternera (talk) 14:27, 12 November 2025 (UTC)
usersporn.com
usersporn.com
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
Spammed on commons and the whole farm: c:Category:Sockpuppets of Pebypezire. Yann (talk) 10:34, 11 November 2025 (UTC)
- @Yann, I only see it spammed on commons. Do you know of any instance where it was also added to other projects? Ternera (talk) 14:26, 12 November 2025 (UTC)
IP addresses
- Regex requested to be blacklisted:
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
I noticed today the creation of User:COIBot/Local/108.181.3.225. Someone was adding this IP all over a page, and you have no clue where it leads (the link doesn't seem to work from my side in this case).
This is already broadly blacklisted on many wikis:
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on azb.wikipedia.org
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on bs.wikipedia.org
- Link is blacklisted by \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} on en.wikiversity.org
- Link is blacklisted by \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} on es.wikibooks.org
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on fa.wikivoyage.org
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on id.wikipedia.org
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on it.wikivoyage.org
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on su.wikipedia.org
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on ta.wikipedia.org
- Link is blacklisted by \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b on ur.wikipedia.org
- Link is blacklisted by \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} on zh.wikivoyage.org
This could be used in blacklist evasion, and in any case it is a large surprise where you end up. I would consider blacklisting this site-wide and letting whatever needs to be used be handled by (local) whitelists. Leaving it open for discussion; I guess this needs a bit more consensus than just a request/blacklist, and I'm not sure whether there are things I miss that would make this a bad idea. --Dirk Beetstra T C (en: U, T) 08:29, 13 November 2025 (UTC)
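For illustration, the requested pattern can be sanity-checked in Python. This is only a plausibility sketch: the actual blacklist is matched by MediaWiki's PHP PCRE engine, so edge-case behaviour may differ, and the sample URLs are taken from or modelled on the report above.

```python
import re

# The pattern as requested above. With no anchors it matches any four
# dotted runs of 1-3 digits anywhere in a link.
pattern = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# A bare-IP link like the one reported above would be caught:
assert pattern.search("http://108.181.3.225/some/page")

# ...but so would any URL that merely contains four dotted number groups,
# e.g. a version string in a path -- worth keeping in mind when deciding
# what local whitelists would need to cover:
assert pattern.search("https://example.org/releases/1.2.3.4/notes")

# Links without such a group are unaffected:
assert not pattern.search("https://example.org/about")
```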
Spam batch
thearticle9ja.com.ng
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
mannyjamz.com.ng
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
pimpokay.com.ng
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
pitasongs.com.ng
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
hotnewceleb9ja.com.ng
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
jamijamz.com.ng
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
A spam ring on multiple GS wikis. See spamcheck. --A09|(pogovor) 09:16, 15 November 2025 (UTC)
- @A09:
Added to Spam blacklist. --A09|(pogovor) 09:16, 15 November 2025 (UTC)
theswissbay.ch
theswissbay.ch
- (LinkSearch: meta | en | es | de | fr | ru | zh | simple | c | d | Wikipedias: top 25 · 50 · major wikis · sc · gs)(Search: Google | en (G) | fr (G) | de (G) | meta (G) | backlinks | → links ←)
- (Reports: Report ← track | XWiki | Local | en | find entry | blacklist hits)(DomainTools: whois | AboutUs | Malware?)
Reported on en by user:Stockhausenfan in en:MediaWiki_talk:Spam-blacklist#theswissbay.ch with remark 'Copyright infringement'.
The front page of this resembles a pirate ship, and the site contains many PDFs of relatively recent materials. --Dirk Beetstra T C (en: U, T) 05:14, 17 November 2025 (UTC)
Proposed removals
Troubleshooting and problems
Discussion
Tooling / cleaning
False signature to avoid archiving: Dirk Beetstra T C (en: U, T) 00:00, 1 January 2026 (UTC)
In #hometown.aol.co.uk we mentioned several ideas.
- It would be nice to have a tool that shows when an entry (or at least a specific domain or page) was last triggering a blacklist entry.
- As in 2015 [1] (see also Archives/2015-01) we should delete old entries that have not triggered the SBL for x years. (x = 5?)
- It might be reasonable to move simple SBL entries, i.e. plain domains that are not locally whitelisted, to the global BED (list of blocked external domains). However, Special:BlockedExternalDomains is disabled. So is this an option now anyway?
Concerning 1.: I'm using a script for this. But for every domain it needs ~1000 db requests (one for each wiki), so I'm not sure whether I should put that in a public web interface. -- seth (talk) 14:58, 5 October 2025 (UTC)
- Re 1. The URL for SBL hits is encoded in logging.log_params in a non-indexable way (see e.g. quarry:query/97741). To make that feasible we would need to collect hits in a user db. I have been thinking about doing this for spamcheck for quite a while.
- Re 2. We could, but I don't see why that should be a priority IMHO.
- Re 3. There is no global BlockedExternalDomains, see phab:T401524. Once this is implemented with a way to allow local whitelisting we can move stuff over. Count Count (talk) 15:28, 5 October 2025 (UTC)
- 1. Yes, using the `logging` table in the db is what I do in my script and what I also did in 2015. I'm using the replica db at Toolforge. Using the replica directly, i.e. without a separate db containing only the needed information, searching all ~1000 WMF wikis takes about 1 or 2 minutes for a given regexp.
- 2. I mentioned reasons in the thread above. In short: performance. However, you don't need to do anything. I'd do that.
- 3. "Once it is implemented [...]": I see. So let's skip that for now.
- -- seth (talk) 17:56, 5 October 2025 (UTC)
- 1. I've back-parsed the db once (and some of that data is in the offline linkwatcher database), however that takes a lot of time, and since these are one-off runs the data does not stay up to date. A search engine would be nice per-wiki (just looking back for the last n additions, with n defaulting to 2 or 3, over a chosen timeframe), and one for cross-wiki (with an additional limitation to e.g. 'the big 5 wikis' or 'the big 18 wikis + Commons and Wikidata'). For the application I suggested it does not have to find all additions, just the last couple.
2. I agree with the sentiment that it does not have priority; the performance loss is minimal, and I don't feel particularly worried that blacklisting 100 domains in one go would bring the wiki down. Cleanup is good, though; it has a couple of advantages in administration as well (the occasional 'this website was spammed 15 years ago, it has now been usurped by another company', editing speed on the lists, and complex rules being easier to find).
3. BED really needs the whitelist to work on it, otherwise a global BED in particular is going to be a pain for local wikis. Dirk Beetstra T C (en: U, T) 06:02, 6 October 2025 (UTC)
- Unfortunately it seems that BlockedExternalDomains hit log entries are not being replicated to the Toolforge replicas. The log entries are just missing there. @Ladsgroup Is that on purpose? Count Count (talk) 07:56, 6 October 2025 (UTC)
- Compare e.g. de:Special:Redirect/logid/140172685 and quarry:query/97766. Count Count (talk) 07:59, 6 October 2025 (UTC)
- @Count Count Hi. I don't think that's on purpose and fixing it is rather easy. Would you mind creating a phabricator ticket assigning it to me with link to this comment? Thanks Amir (talk) 15:32, 6 October 2025 (UTC)
- @Ladsgroup: Done, see phab:T406562. Thanks for having a look! Count Count (talk) 10:19, 7 October 2025 (UTC)
- Thanks! Amir (talk) 10:27, 7 October 2025 (UTC)
- 1. I wrote a script to fetch all SBL data from all WMF wikis since 2020 and write it into an sqlite db (the script needs ~7 minutes). This is not so big (3.4M rows in a 0.7GB db file) and could be a) updated automatically (e.g. every day or every hour) and b) used in a little web interface to search the data. If I automatically delete all data older than 5 years, this might even scale. Once the bug Count Count mentioned is fixed, I could add the BED logs.
- -- seth (talk) 22:00, 7 October 2025 (UTC)
- @Lustiger seth: Very cool. With this relatively minuscule amount of data I don't think there is even a need to delete older data at all. It would be great if you could make the data available in a public ToolsDB database. Count Count (talk) 04:28, 8 October 2025 (UTC)
- With that size I would even suggest going back to the beginning of time. We do have some really long-term spamming cases (10+ or 15+ years); I ran into spamming of a website related to one of those cases just last month. Having access to that (preferably through a query link in our {{LinkSummary}}, and maybe also the user templates) would be great. Dirk Beetstra T C (en: U, T) 06:35, 8 October 2025 (UTC)
- Ok, the script created a database `s51449__sbllog_p` with one single table `sbl_log` containing all SBL log entries of all WMF projects now, and it is continuously updated every 5 minutes. Its size is around 1.7GB and it has 8.1M entries. The columns are: id, project (e.g. 'dewiki'), log_id (local log_id), log_timestamp, log_namespace, log_title, comment_text, log_params (just the URL), actor_name.
- Next step is the creation of a web interface for queries. I'll try to do that on the weekend.
- -- seth (talk) 22:19, 9 October 2025 (UTC)
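For reference, the table layout described above can be sketched with Python's sqlite3. The column names follow seth's description; the sample row and query are made up purely for illustration:

```python
import sqlite3

# Schema mirroring the columns listed above (hypothetical types).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sbl_log (
        id             INTEGER PRIMARY KEY,
        project        TEXT,    -- e.g. 'dewiki'
        log_id         INTEGER, -- local log_id on that wiki
        log_timestamp  TEXT,
        log_namespace  INTEGER,
        log_title      TEXT,
        comment_text   TEXT,
        log_params     TEXT,    -- just the URL
        actor_name     TEXT
    )
""")

# Invented sample entry, to show the per-user lookup the templates need:
conn.execute(
    "INSERT INTO sbl_log (project, log_id, log_timestamp, log_params, actor_name) "
    "VALUES (?, ?, ?, ?, ?)",
    ("dewiki", 123456, "20251009221900", "http://example.com/", "ExampleUser"),
)
rows = conn.execute(
    "SELECT project, log_params FROM sbl_log WHERE actor_name = ?",
    ("ExampleUser",),
).fetchall()
assert rows == [("dewiki", "http://example.com/")]
```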
- Please make it queryable both by user and by domain, so it can be linked from our {{LinkSummary}} and {{UserSummary}} templates. Dirk Beetstra T C (en: U, T) 12:36, 10 October 2025 (UTC)
- @Lustiger seth: For faster domain querying MediaWiki (and spamcheck) store and index hostnames in reversed split order (e.g. `www.google.com` becomes `com.google.www.`). Maybe you could add such an indexed column and either keep the full URL or break it up into protocol + hostname + rest? Count Count (talk) 13:30, 10 October 2025 (UTC)
- Because of RL things I haven't continued the work. I'll try to do something this weekend.
- -- seth (talk) 09:32, 18 October 2025 (UTC)
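The reversed split order discussed above can be sketched in one small Python function (the name `domain_rev_index` is taken from the column discussed below; the implementation is an illustrative guess, not spamcheck's actual code):

```python
def domain_rev_index(hostname: str) -> str:
    """Reverse the dot-separated labels and append a trailing dot,
    so prefix (LIKE) queries match on whole-label boundaries only."""
    return ".".join(reversed(hostname.lower().split("."))) + "."

# The example from the thread above:
assert domain_rev_index("www.google.com") == "com.google.www."

# Subdomains of one registrable domain now share a common prefix:
assert domain_rev_index("chasedream.com") == "com.chasedream."
assert domain_rev_index("www.chasedream.com").startswith("com.chasedream.")
```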
- You can test it via:
- It's very slow if date_from < 2025 or if the URL field does not contain anything with an extractable domain.
- -- seth (talk) 14:12, 20 October 2025 (UTC)
- Oh, I'll remove the debugging output later, of course. I thought it might be helpful at the moment.
- -- seth (talk) 14:14, 20 October 2025 (UTC)
- I have built it into the relevant templates here on meta and on en.wikipedia. Thanks! Dirk Beetstra T C (en: U, T) 10:05, 21 October 2025 (UTC)
- @Lustiger seth Thanks!! Looks like `domain_rev_index` is missing an index to make it fast though: quarry:query/98322 Count Count (talk) 10:35, 21 October 2025 (UTC)
- I did a `CREATE INDEX idx_rev_index ON domain(domain_rev_index);` now. Is it significantly faster now?
- -- seth (talk) 13:28, 21 October 2025 (UTC)
- Oh yes. Just tried it on idealrentacar.ro and get results in less than a second, same for all other domains I tried it on. Count Count (talk) 13:33, 21 October 2025 (UTC)
- Ah, ok, maybe my test was too early, because I still had to wait several dozen seconds.
- But now the results come fast for me, too.
- -- seth (talk) 14:19, 21 October 2025 (UTC)
- Should I add an index to sbl_log.actor_name, too?
- -- seth (talk) 14:54, 21 October 2025 (UTC)
- I think that is a good idea. For spamcheck I am getting the global user id and storing that instead, which survives renames and takes up a little less space, but that is not really necessary IMHO. Count Count (talk) 06:44, 22 October 2025 (UTC)
- I added an index for actor_name, now.
- Global user id: I see, yes, makes sense. Let's hope that it's really not necessary. :-)
- -- seth (talk) 20:49, 22 October 2025 (UTC)
- The query should probably be `WHERE d.domain_rev_index LIKE ?` with the param being e.g. `'com.chasedream.%'`, so we get hits for `www.chasedream.com` as well. And each reversed hostname/domain should end with a '.' so that we match both 'www.chasedream.com' and 'chasedream.com' but not 'chasedreamxyz.com'. Count Count (talk) 13:42, 21 October 2025 (UTC)
- You are totally right. Should be done now.
- -- seth (talk) 14:53, 21 October 2025 (UTC)
- Works great, thank you! Count Count (talk) 06:40, 22 October 2025 (UTC)
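The LIKE matching described in this thread can be demonstrated end to end with sqlite. Table and index names follow the ones mentioned above; the helper function and sample domains are illustrative:

```python
import sqlite3

def rev(host: str) -> str:
    # Reversed labels with a trailing dot, as agreed above.
    return ".".join(reversed(host.split("."))) + "."

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain (domain_rev_index TEXT)")
conn.executemany(
    "INSERT INTO domain VALUES (?)",
    [(rev(h),) for h in
     ("chasedream.com", "www.chasedream.com", "chasedreamxyz.com")],
)
conn.execute("CREATE INDEX idx_rev_index ON domain(domain_rev_index)")

# 'com.chasedream.%' matches the bare domain (trailing dot, empty '%')
# and all subdomains, while 'chasedreamxyz.com' -> 'com.chasedreamxyz.'
# stays out because its label boundary differs:
hits = [r[0] for r in conn.execute(
    "SELECT domain_rev_index FROM domain "
    "WHERE domain_rev_index LIKE ? ORDER BY domain_rev_index",
    ("com.chasedream.%",),
)]
assert hits == ["com.chasedream.", "com.chasedream.www."]
```

A prefix LIKE pattern like this can use the index (it is a range scan over the sorted reversed hostnames), which is why the query above became fast once `idx_rev_index` existed.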
- Ok, first step done. Maybe this or next weekend I'll have a look at the second step.
- -- seth (talk) 08:49, 30 October 2025 (UTC)
- Hmm, haven't started yet. This needs more time.
- Nevertheless, I found a bug in my scripts. The new database of the logs has several wrong entries, because some of the original entries in the wmf tables are really strange. I'll fix that bug first and then rebuild the tables.
- -- seth (talk) 13:13, 9 November 2025 (UTC)
