Talk:Spam blacklist

Shortcuts: WM:SPAM · WM:SBL
The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs on any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.

Proposed additions
Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Reports can also be submitted through SBRequest.js. Please check back after submitting your report, as there may be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (e.g. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
Whitelists
There is no global whitelist, so if you are seeking whitelisting of a URL on a particular wiki, please raise the matter on that wiki's MediaWiki talk:Spam-whitelist page. Consider using the template {{edit protected}} or its local equivalent to draw attention to your request.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2025/12.

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 3 days and sections whose most recent comment is older than 7 days.

Proposed additions


magikflame.com



Cross-wiki spam. Harry Mitchell (talk) 21:27, 10 December 2025 (UTC)

@HJ Mitchell: Added to Spam blacklist. --NguoiDungKhongDinhDanh 23:38, 10 December 2025 (UTC)

edwriting.com



Cross-wiki spam. Not many additions, so I'd understand if this were declined, but it's been spammed from the same range as magikflame.com. Harry Mitchell (talk) 21:43, 10 December 2025 (UTC)

@HJ Mitchell: Added to Spam blacklist. --NguoiDungKhongDinhDanh 23:38, 10 December 2025 (UTC)

trips2deal.com



See Leadsbrain and Williamjeremy. NguoiDungKhongDinhDanh 23:04, 10 December 2025 (UTC)

@NguoiDungKhongDinhDanh: Added to Spam blacklist. --NguoiDungKhongDinhDanh 23:05, 10 December 2025 (UTC)

Bestie Quizzes







Among other accounts and IPs, here are a few of the accounts:









Cross-wiki spam. These links have been spammed on en.WP and Wikidata. Apparition11 (talk) 20:04, 12 December 2025 (UTC)

@Apparition11: Added to Spam blacklist. --NguoiDungKhongDinhDanh 20:12, 12 December 2025 (UTC)

THE METRO



Malvertising. Eihel (talk) 08:46, 13 December 2025 (UTC)

Categorized as malvertising; I was able to verify that this site does indeed redirect to a dubious cryptocurrency site. The information on the first site is of poor quality. I saw these links on yoWP (diff) and then enWP (link, diff). This link should absolutely be avoided in primary namespaces, and indeed across entire wikis! Cordially. —Eihel (talk) 08:52, 13 December 2025 (UTC)

netsuite.com



This is a fun one. The company is notable and has articles on several WPs, so some whitelisting is going to be necessary, but it has been extensively spammed, mainly on enwiki, usually by IPs (often proxies) and temp accounts that make no other edits. I've just spent an hour cleaning it up and it goes back at least to 2023. Harry Mitchell (talk) 18:10, 13 December 2025 (UTC)

@HJ Mitchell: I see a few more. You might want to handle those as well. NguoiDungKhongDinhDanh 18:16, 13 December 2025 (UTC)
@NguoiDungKhongDinhDanh That was a bigger job than I expected and I don't have time to remove all of them. Some are legit and would need whitelisting; some I'm not sure about. Harry Mitchell (talk) 18:53, 13 December 2025 (UTC)

timentask.com



Cross-wiki spam. Harry Mitchell (talk) 20:23, 13 December 2025 (UTC)

@HJ Mitchell: Added to Spam blacklist. --NguoiDungKhongDinhDanh 20:52, 13 December 2025 (UTC)

controlhub.com



Mostly enwiki, but also spammed on eswiki and frwiki (possibly accidentally through translation). Harry Mitchell (talk) 20:49, 13 December 2025 (UTC)

@HJ Mitchell: Added to Spam blacklist. --NguoiDungKhongDinhDanh 20:53, 13 December 2025 (UTC)

worldradiohistory.com



Was deferred here. The site hosts digital scans of magazines, most of which are copyright violations that shouldn't be linked per en:WP:COPYLINK (they got burned by this at least once). Public domain scans can be transferred to Commons. Mach61 (talk) 23:16, 13 December 2025 (UTC)

For the record, there are at least 10,000 uses of this domain across more than 90 wikis. NguoiDungKhongDinhDanh 23:22, 13 December 2025 (UTC)

nolcardbalanceae.com



Deferred here from en.wp – this is not my home wiki and I'm horribly out of my depth here, so please be kind.

My original report: One of a couple of unofficial (and possibly phishing) sites related to Dubai's passenger travel card – branded "nol" – that get added to various Dubai transport articles. Bringing this one here as it has been repeatedly added and re-added, each time overwriting more useful references. • a frantic turtle 🐢 13:13, 15 December 2025 (UTC)

Added to Spam blacklist. -- DreamRimmer 15:09, 15 December 2025 (UTC)

beyondintranet.com



Cross-wiki spam going back to July across three wikis. Harry Mitchell (talk) 13:27, 15 December 2025 (UTC)

@HJ Mitchell: Added to Spam blacklist. -- DreamRimmer 15:11, 15 December 2025 (UTC)

litcommerce.com



Cross-wiki spam. Harry Mitchell (talk) 13:26, 16 December 2025 (UTC)

@HJ Mitchell: Added to Spam blacklist. --NguoiDungKhongDinhDanh 13:36, 16 December 2025 (UTC)

techcombank.com

(COI, not spam)

Spammed by proxies and other interesting people. Already blocked on viwiki for spam. Children Will Listen (🐄 talk, 🫘 contribs) 03:22, 17 December 2025 (UTC)

@ChildrenWillListen: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 04:03, 17 December 2025 (UTC)

Proposed removals


dmtn1.com



I would like to request the whitelisting of the following domain:

  • **Domain:** dmtn1.com
  • **Name:** Digital Media Time News
  • **Reason for request:**

The website publishes news summaries and reports. A specific URL is needed as a reference for improving Wikipedia content. The site does not engage in spam behaviour on Wikipedia, and I will only use it to verify factual information. If needed, only a **single page** may be whitelisted instead of the entire domain.

Thank you.

~2025-39001-05 (talk) 16:57, 7 December 2025 (UTC)

Not done Wrong section (see Proposed removals), but also declined as this was just blacklisted for cross-wiki spam a few months ago and I don't see any indication that it's suddenly become a useful source instead of a source for spam. EggRoll97 (talk) 18:05, 7 December 2025 (UTC)
<- moved to the correct section + fixed heading --Johannnes89 (talk) 15:50, 14 December 2025 (UTC) ->

Troubleshooting and problems


Blocking sub-pages but not the domains


Fake signature to keep this open for some time --Dirk Beetstra T C (en: U, T) 00:00, 1 February 2026 (UTC)

In some (rare) cases we don't want sub-pages of a specific website to be linked, but we do want to allow links to the website's homepage, e.g. in the Wikipedia article about the website.
linktr.ee is such a case:
Deep links to linktr.ee/... should be blocked (as they are currently), but the homepage linktr.ee (without a following path) should be allowed in articles such as w:de:Linktree, w:fr:Linktree, w:en:Linktree, ...
On enwiki they currently use a work-around by linking to linktree.com, which is a redirect to linktr.ee, but it would make more sense to link directly to linktr.ee.
On frwiki they use the Wikidata entry.
On ptwiki they use a similar work-around to enwiki, but with an incorrect link description: [1]
In such cases I suggest using an SBL entry such as

linktr\.ee(?=/.)

This would block every sub-page of the website while keeping linktr.ee and linktr.ee/ linkable, so the Wikipedias would not have to build strange work-arounds. The conditions for this would be:

  • deep links are the reason for the domain being blacklisted,
  • links to the main page of the domain are/were not a reason for the blacklisting, and
  • there is a good reason for linking to the main page.

If nobody objects, I would do this for linktr.ee (and I would also test it, just to be sure). And in similar cases we could do the same (if my idea works). -- seth (talk) 20:14, 3 December 2025 (UTC)
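To make the proposed behaviour concrete, a minimal sketch in Python's re (sample URLs are hypothetical; the extension's exact matching context may differ):

  import re

  # Proposed SBL entry: block sub-pages only, keep the bare homepage linkable.
  pattern = re.compile(r'linktr\.ee(?=/.)')

  tests = [
      'https://linktr.ee',           # homepage: allowed
      'https://linktr.ee/',          # trailing slash only: allowed
      'https://linktr.ee/someuser',  # deep link: blocked
  ]
  for url in tests:
      print(url, '->', 'blocked' if pattern.search(url) else 'allowed')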

This would also match linktr.ee/?search=params and linktr.ee/#hash, but those are extreme edge cases. Support NguoiDungKhongDinhDanh 20:25, 3 December 2025 (UTC)
Yes, and blacklisting those links would be intended. If specific pages need to be excluded from the block, we can still use negative look-ahead assertions, as before.
-- seth (talk) 20:34, 3 December 2025 (UTC)
Test[2]
seth (talk) 20:26, 3 December 2025 (UTC)
Hmm, I had a vague recollection that I had tried this before, but there were problems. Maybe these memories weren't hallucinations after all.
-- seth (talk) 20:31, 3 December 2025 (UTC)
Apparently, the regular expressions apply only to the domain part. From mw:Extension:SpamBlacklist:

The SpamBlacklist extension prevents edits that contain URLs whose domains match regular expression patterns [...]

Rather unfortunate. NguoiDungKhongDinhDanh 20:36, 3 December 2025 (UTC)
No, that's not true.
However, the solution is even easier:
linktr\.ee/.
We don't actually need assertions here.
-- seth (talk) 20:41, 3 December 2025 (UTC)
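For comparison, a sketch under the same assumptions: because SBL patterns are unanchored substring matches, the plain rule without the look-ahead accepts and rejects the same URLs:

  import re

  simple = re.compile(r'linktr\.ee/.')  # a slash plus at least one character

  print(bool(simple.search('https://linktr.ee')))           # False: allowed
  print(bool(simple.search('https://linktr.ee/')))          # False: allowed
  print(bool(simple.search('https://linktr.ee/someuser')))  # True: blocked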
I too just realized how incorrect that is. Updated the extension page. NguoiDungKhongDinhDanh 20:49, 3 December 2025 (UTC)
Those cases are not very rare, but often, if not almost always, the domain IS the issue. People are abusing pornhub.com (not some sub-page on it), people are spamming their company website (not some sub-page). We have local whitelists; just whitelist a neutral landing page (or the top domain page, linktr.ee$) on your local wiki. But note that if you are ill-meaning, you could use this as a workaround on the blacklist, and spamming is never in good faith (I will not explain, per en:WP:BEANS).
(Note that if linktree.com is a redirect to linktr.ee, then linktree.com should be blacklisted, as you can then likely just link linktree.com/artistname anyway; there is a reason we liberally blacklist redirects.) Dirk Beetstra T C (en: U, T) 09:06, 4 December 2025 (UTC)
Yes, sometimes the main page of a website is an issue itself; that's why I explicitly included the second of the three conditions above.
A local whitelist is not a good solution in cases such as linktr.ee:
1. In practice the local whitelist is rarely used in such cases, as I showed above.
2. It's complicated and it annoys authors (which is actually the main reason for 1, too).
3. It creates much more work than a centralized solution.
4. It increases entropy and makes it harder and less transparent to see the differences between wikis (even when there are none).
The whitelist solution in such cases just has disadvantages. Please mail me the possible circumvention; I don't see it right now. (And I guess that if I don't see it, at least 99.9% of potential spammers won't see it either. However, I suspect most links to linktr.ee are not meant as spam, but that's another topic.)
Concerning linktree.com: yes, I had the same idea yesterday, but I did not check. At least on enwiki it's used in 12 articles: w:en:special:linksearch/*.linktree.com. So yes, both domains should actually be blacklisted.
-- seth (talk) 10:15, 4 December 2025 (UTC)
On en.wikipedia we always use the whitelist; we do not do exclusion rules on the blacklist. It is practical: the discussion tells you what a link is whitelisted for, and when referring back to that discussion you can see where it should not be linked (hence it is more transparent about what can be done). Moreover, the difference between pornhub.com and linktr.ee means that you have two different solutions: you cannot exclude the domain for the former and need a workaround, while you would do it for the latter, which also makes it less transparent and clear. It is much less work than having to clean up after the people who do link it (one workaround, '* see "account" on http:// linktr.ee', will still appear in articles as people cannot link to 'https:// linktr.ee/account'), and as I said, it can still be abused by the (always insistent) spammers for other links. I know it, and I recall it being used (I think it was even the reason we insist on about pages on en.wikipedia), and it has been used to get URL shorteners to work in an official setting. The centralized discussion (which is more request-and-approval than discussion) is fully automated on en.wikipedia, like the blacklisting script here, so it is hardly more work. Regarding 'linktr.ee is not spam': people said that about youtu.be as well, and it was hammered for some time; people use redirects to link to their company website, and I don't see why they could not put their company on linktr.ee as a 'soft redirect' to their company website. I am afraid you are underestimating the insistence and creativity of spammers (see also #Please_add_recorder.easeus.com_to_the_spam_whitelist .. 17 years ..).
In short, I don't see a real advantage for the few websites where we could implement this. Dirk Beetstra T C (en: U, T) 11:12, 4 December 2025 (UTC)
  • The 'workaround' you mentioned can be used even now -- just without a link to the main page. But probably nobody has tried it so far. The logs suggest people try to link to the sub-page several times and then give up; they don't try to link to the main page.
  • "On en.wikipedia we always use the whitelist": Obviously not in this case.
  • Regular authors are annoyed by the current solution.
  • A central blacklist rule like linktr\.ee/. would be clearer and easier to maintain than scattered local whitelists, which often lead to inconsistent workarounds across projects -- as I have shown for linktr.ee. If spamming of the linktr.ee main page emerges, we can always tighten the rule later -- that's no big deal. And unlike now, we would have an easier way to track whether your 'workaround' has been used (just by using linksearch); right now it's not easy to track whether people used any workaround to somehow point to linktr.ee.
For now, the suggested approach seems the most transparent, most practical, and least annoying -- at least concerning linktr.ee.
For other pages it might be reasonable, but that has to be decided case by case.
Here, I primarily wanted to propose a solution for such (rare) cases. You are sceptical, and that's ok, but you did not have solid counter-arguments concerning the proposal (or I did not understand them).
Secondly, I wanted to know whether there are any strong objections to the modification \blinktr\.ee -> \blinktr\.ee/. AFAICS you don't like that solution because it could theoretically be 'circumvented', but we don't have empirical data for this, so we could just try it and evaluate it.
-- seth (talk) 23:37, 4 December 2025 (UTC)
Ok, let's try it and see whether it works.[3] -- seth (talk) 15:08, 6 December 2025 (UTC)
No, it can't be used now, you have no data, and you don't want to see how persistent spammers are. And clearly you don't like the whitelist, while the whitelist is the only solution for many of them. It annoys regulars on EXACTLY one page per wiki per domain. Wow .. what a problem. And we simply can't exclude pornhub, because it is the top level that is abused. It only works for a small selection of sites, like linktr.ee (with editors getting 'annoyed' on all the others); the rest will have scattered whitelists, and this just makes it more confusing. The whitelist on en.wikipedia has always worked smoothly; it clearly works there, editors are not annoyed, and they know what to do. Dirk Beetstra T C (en: U, T) 04:34, 7 December 2025 (UTC)
By the way, the workaround now works for linktr.ee: if linktr.ee identifiers are available on Wikidata, I could have infoboxes display working links to the linktr.ee of each person who has one on Wikidata in one edit. And with a little sock-puppetry, a similar rule on proper spam would let it all back in without being noticed by spamcheck or COIBot. Dirk Beetstra T C (en: U, T) 05:29, 7 December 2025 (UTC)
Sorry, I don't understand your scenario, and I'm not sure whether we are talking about the same thing.
By the way, Wikidata: on Wikidata the link to linktr.ee was added a few weeks before the domain got blacklisted globally. If it had been blacklisted before, adding the link to the Wikidata item would not have been possible, and then the above-mentioned solution of frwiki etc. would not have been possible either.
Of course, those wikis could have added entries to their whitelists, but 1. it's not so easy to whitelist just the main page (you need look-ahead assertions for that), and 2. it's just a collective waste of time to add something to all the local whitelists when it can easily be solved centrally.
Yes, the proposed solution works for specific cases only. I've never said anything else.
-- seth (talk) 14:50, 7 December 2025 (UTC)
I know you don't understand my scenario; we are not talking about links on Wikidata, I am talking about handles on Wikidata and links on local wikis. Again, spammers will go to great lengths to get their material linked; you underestimate the insistence of spammers. You can abuse this, and the only way we are going to see that is to wait for examples.
The point is, I don't want to whitelist 'just the main page', and you don't want to whitelist the main page. That is exactly the link you need to circumvent the blacklist. You want to whitelist a neutral landing page: use the /about, or an index.htm. You need that for ALL proper spam sites, and for all properly abused websites (pornhub and such) you can't use the main page, because that is the page that is/was being spammed/abused in the first place. (I could argue that for pornhub we could consider allowing everything but the main page, but then the workaround is so easy that even a primary-school kid would find their way around it!)
This proposed solution works for specific cases only, indeed. And for those (in my opinion, very few) specific cases we now create a loophole -- and we do that on just the kind of site that is intended to promote, a 'soft redirect' site, just for convenience. Dirk Beetstra T C (en: U, T) 05:36, 8 December 2025 (UTC)


Let's leave this open for a couple of weeks, then re-assess. --Dirk Beetstra T C (en: U, T) 10:40, 14 December 2025 (UTC)

Discussion


Tooling / cleaning


False signature to avoid archiving: Dirk Beetstra T C (en: U, T) 00:00, 1 January 2026 (UTC)

In #hometown.aol.co.uk we mentioned several ideas.

  1. It would be nice to have a tool that shows when an entry (or at least a specific domain or page) last triggered a blacklist entry.
  2. As in 2015 [4] (see also Archives/2015-01), we should delete old entries that have not triggered the SBL for x years (x = 5?).
  3. It might be reasonable to move simple SBL entries, i.e. plain domains (that are not locally whitelisted), to the global BED (list of blocked external domains). However, Special:BlockedExternalDomains is disabled. So is this even an option now?

Concerning 1.: I'm using a script for this, but for every domain it needs ~1000 db requests (one for each wiki), so I'm not sure whether I should put it in a public web interface. -- seth (talk) 14:58, 5 October 2025 (UTC)

Re 1. The URL for SBL hits is encoded in logging.log_params in a non-indexable way (see e.g. quarry:query/97741). To make that feasible we would need to collect hits in a user db. I have been thinking about doing this for spamcheck for quite a while.
Re 2. We could, but I don't see why that should be a priority IMHO.
Re 3. There is no global BlockedExternalDomains, see phab:T401524. Once this is implemented with a way to allow local whitelisting we can move stuff over. Count Count (talk) 15:28, 5 October 2025 (UTC)
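To illustrate the indexing problem: the URL sits inside the serialized log_params blob, so a per-domain lookup needs a leading-wildcard LIKE, which no index can serve. A sketch only; the log_type value and the exact layout of log_params are assumptions:

  # Hypothetical per-domain lookup against the replica's logging table;
  # the '%...%' pattern forces a scan of every SBL hit row.
  HITS_FOR_DOMAIN = """
      SELECT log_timestamp, log_title, log_params
      FROM logging
      WHERE log_type = 'spamblacklist'   -- assumed log type for SBL hits
        AND log_params LIKE %s           -- e.g. '%example.com%'
  """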
1. Yes, using the `logging` table in the db is what I do in my script, and what I also did in 2015. I'm using the replica db at Toolforge. Using the replica directly, i.e. without a separate db containing only the needed information, searching all ~1000 WMF wikis takes about 1 or 2 minutes for a given regexp.
2. I mentioned reasons in the thread above, in short: performance. However, you don't need to do anything; I'd do that.
3. "Once it is implemented [...]": I see. So let's skip that for now.
-- seth (talk) 17:56, 5 October 2025 (UTC)
1. I've back-parsed the db once (and some of that data is in the offline linkwatcher database), however that takes a lot of time, and since these are one-off runs the data does not stay up to date. A search engine would be nice per wiki (just looking back for the last n additions, with n defaulting to 2 or 3, looking backward over a chosen timeframe), and one for cross-wiki (with additional limitations such as 'the big 5 wikis' or 'the big 18 wikis + Commons and Wikidata'). For the application I suggested it does not have to find all additions, just the last couple.
2. I agree with the sentiment that it does not have priority; the performance loss is minimal, and I don't feel particularly worried that blacklisting 100 domains in one go will bring the wiki down. Cleanup is good, though; it has a couple of advantages in administration as well (the occasional 'this website was spammed 15 years ago and has now been usurped by another company', editing speed on the lists, and making complex rules easier to find).
3. BED really needs the whitelist to work on it; otherwise a global BED in particular is going to be a pain for local wikis. Dirk Beetstra T C (en: U, T) 06:02, 6 October 2025 (UTC)
Unfortunately it seems that BlockedExternalDomains hit log entries are not being replicated to the Toolforge replicas. The log entries are just missing there. @Ladsgroup Is that on purpose? Count Count (talk) 07:56, 6 October 2025 (UTC)
Compare e.g. de:Special:Redirect/logid/140172685 and quarry:query/97766 Count Count (talk) 07:59, 6 October 2025 (UTC)
@Count Count Hi. I don't think that's on purpose, and fixing it is rather easy. Would you mind creating a Phabricator ticket assigned to me with a link to this comment? Thanks Amir (talk) 15:32, 6 October 2025 (UTC)
@Ladsgroup: Done, see phab:T406562. Thanks for having a look! Count Count (talk) 10:19, 7 October 2025 (UTC)
Thanks! Amir (talk) 10:27, 7 October 2025 (UTC)
1. I wrote a script to fetch all SBL data from all WMF wikis since 2020 and write it into an sqlite db (the script needs ~7 minutes). This is not so big (3.4M rows in a 0.7GB db file) and could be a) updated automatically (e.g. every day or every hour) and b) used in a little web interface to search the data. If I automatically delete all data older than 5 years, this might even scale. Once the bug Count Count mentioned is fixed, I could add the BED logs.
-- seth (talk) 22:00, 7 October 2025 (UTC)
@Lustiger seth: Very cool. With this relatively minuscule amount of data I don't think there is even a need to delete older data at all. It would be great if you could make the data available in a public ToolsDB database. Count Count (talk) 04:28, 8 October 2025 (UTC)
With that size I would even suggest going back to the beginning of time. We do have some really long-term spamming cases (10+ or 15+ years); I ran into spamming of a website related to one of those cases just last month. Having access to that (preferably through a query link in our {{LinkSummary}}, and maybe also the user templates) would be great. Dirk Beetstra T C (en: U, T) 06:35, 8 October 2025 (UTC)
Ok, the script has created a database s51449__sbllog_p with a single table sbl_log containing all SBL log entries of all WMF projects, and it is continuously updated every 5 minutes. Its size is around 1.7GB and it has 8.1M entries. The columns are: id, project (e.g. 'dewiki'), log_id (the local log_id), log_timestamp, log_namespace, log_title, comment_text, log_params (just the URL), actor_name.
The next step is the creation of a web interface for queries. I'll try to do that on the weekend.
-- seth (talk) 22:19, 9 October 2025 (UTC)
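A minimal sqlite sketch of that layout (column names as listed above; the types are assumptions):

  import sqlite3

  con = sqlite3.connect('sbllog.db')
  con.execute("""
      CREATE TABLE IF NOT EXISTS sbl_log (
          id            INTEGER PRIMARY KEY,
          project       TEXT,     -- e.g. 'dewiki'
          log_id        INTEGER,  -- the wiki-local log id
          log_timestamp TEXT,
          log_namespace INTEGER,
          log_title     TEXT,
          comment_text  TEXT,
          log_params    TEXT,     -- just the URL
          actor_name    TEXT
      )
  """)
  con.commit()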
Please index both by user and by domain, so they can be linked from our {{LinkSummary}} and {{UserSummary}} templates. Dirk Beetstra T C (en: U, T) 12:36, 10 October 2025 (UTC)
@Lustiger seth: For faster domain querying, MediaWiki (and spamcheck) store and index hostnames in reverse split order (e.g. www.google.com becomes com.google.www.). Maybe you could add such an indexed column and either keep the full URL or break it up into protocol + hostname + rest? Count Count (talk) 13:30, 10 October 2025 (UTC)
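A sketch of that transformation; the trailing dot matters for prefix matching, as discussed further down:

  def reverse_hostname(hostname: str) -> str:
      # 'www.google.com' -> 'com.google.www.'
      return '.'.join(reversed(hostname.lower().split('.'))) + '.'

  assert reverse_hostname('www.google.com') == 'com.google.www.'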
Because of RL things I haven't continued the work. I'll try to do something this weekend.
-- seth (talk) 09:32, 18 October 2025 (UTC)
You can test it via:
It's very slow if date_from < 2025 or if the URL field does not contain anything with an extractable domain.
-- seth (talk) 14:12, 20 October 2025 (UTC)
Oh, I'll remove the debugging output later, of course. I thought it might be helpful at the moment.
-- seth (talk) 14:14, 20 October 2025 (UTC)
I have built it into the relevant templates here on Meta and on en.wikipedia. Thanks! Dirk Beetstra T C (en: U, T) 10:05, 21 October 2025 (UTC)
@Lustiger seth Thanks!! Looks like domain_rev_index is missing an index to make it fast though: quarry:query/98322 Count Count (talk) 10:35, 21 October 2025 (UTC)
I have now done a CREATE INDEX idx_rev_index ON domain(domain_rev_index);. Is it significantly faster?
-- seth (talk) 13:28, 21 October 2025 (UTC)
Oh yes. Just tried it on idealrentacar.ro and got results in less than a second, same for all other domains I tried it on. Count Count (talk) 13:33, 21 October 2025 (UTC)
Ah, ok, maybe my test was too early, because I still had to wait several dozen seconds.
But now the results come fast for me, too.
-- seth (talk) 14:19, 21 October 2025 (UTC)
Should I add an index to sbl_log.actor_name, too?
-- seth (talk) 14:54, 21 October 2025 (UTC)
I think that is a good idea. For spamcheck I am getting the global user id and storing that instead, which survives renames and takes up a little less space, but that is not really necessary IMHO. Count Count (talk) 06:44, 22 October 2025 (UTC)
I have added an index for actor_name now.
Global user id: I see, yes, that makes sense. Let's hope that it's really not necessary. :-)
-- seth (talk) 20:49, 22 October 2025 (UTC)
The query should probably be WHERE d.domain_rev_index LIKE ? with the param being e.g. 'com.chasedream.%', so we get hits for www.chasedream.com as well. And each reversed hostname/domain should end with a '.' so that we can match both 'www.chasedream.com' and 'chasedream.com' but not 'chasedreamxyz.com'. Count Count (talk) 13:42, 21 October 2025 (UTC)
You are totally right. Should be done now.
-- seth (talk) 14:53, 21 October 2025 (UTC)
Works great, thank you! Count Count (talk) 06:40, 22 October 2025 (UTC)
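Putting the pieces together, a sketch of the resulting lookup (table and column names as used in this thread):

  import sqlite3

  def hits_for_domain(con: sqlite3.Connection, domain: str):
      # 'chasedream.com' -> prefix 'com.chasedream.', which matches the
      # reversed forms of both 'chasedream.com' and 'www.chasedream.com',
      # but not 'chasedreamxyz.com', thanks to the trailing dot.
      prefix = '.'.join(reversed(domain.lower().split('.'))) + '.'
      return con.execute(
          "SELECT * FROM domain WHERE domain_rev_index LIKE ?",
          (prefix + '%',),
      ).fetchall()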
Ok, first step done. Maybe this or next weekend I'll have a look at the second step.
-- seth (talk) 08:49, 30 October 2025 (UTC)
Hmm, I haven't started yet. This needs more time.
Nevertheless, I found a bug in my scripts. The new database of the logs has several wrong entries, because some of the original entries in the WMF tables are really strange. I'll fix that bug first and then rebuild the tables.
-- seth (talk) 13:13, 9 November 2025 (UTC)
At least that bug should be fixed now. There was a problem with punycode and UTF-8, and the script no longer adds those strange entries.
The tables have already been rebuilt.
-- seth (talk) 22:35, 30 November 2025 (UTC)