Talk:Spam blacklist/Archives/2008-09

From Meta, a Wikimedia project coordination wiki
Warning! Please do not post any new comments on this page. This is a discussion archive first created in September 2008, although the comments contained were likely posted before and after this date. See current discussion or the archives index.

Proposed additions

This section is for completed requests that a website be blacklisted

Just 2 links until now, but I believe he is just beginning to add them, I put him on our bl. Best regards, --birdy geimfyglið (:> )=| 23:06, 13 August 2008 (UTC)

This can stay, I think. Logged  — Mike.lifeguard | @en.wb 19:45, 15 August 2008 (UTC)

People may want to look into other things on the same server:

  • Top 10 domains on server ( (153), (52), (13), (7), (6), (5), (4), (4), (3), (2)

  • Top 10 editors who have added (108), (10), (8), (7), LovelessGent (5), (4), (3), (2), (2), (2).

The top one has 108 link additions across a range of wikis.

Seems to lead even further, has a different set of IP users (in a 41.210 range and some others), but that seems en only (but where did I see recently).


I need help to prune this out completely. --Dirk Beetstra T C (en: U, T) 17:16, 18 August 2008 (UTC)

Just a quick look, but I think the following could be added:
 — Mike.lifeguard | @en.wb 01:55, 29 August 2008 (UTC)

Added  — Mike.lifeguard | @en.wb 13:40, 31 August 2008 (UTC)


See WikiProject Spam item (permanent link). MER-C 12:54, 25 August 2008 (UTC)

Added. --Erwin(85) 13:36, 26 August 2008 (UTC)

A Polish redirect site

Appears to have been used by Web-anatomy today: [1], from

  • The current uses might be legitimate.

--AVRS 15:24, 25 August 2008 (UTC)

Again: [2]

--AVRS 16:52, 25 August 2008 (UTC)

Added and cleaned the main namespaces. --Erwin(85) 09:23, 26 August 2008 (UTC)

Suspicious site

A user reported that it might contain malware such as trojan horses. Reported on the es-wiki.

--Dferg (talk) 19:53, 27 August 2008 (UTC)

No details here, and I don't see anything at the site itself. Not done unless we know it is really malware.  — Mike.lifeguard | @en.wb 02:00, 29 August 2008 (UTC)


Seems to be a new spambot activity: seen here IP was:

I have not yet blocked the IP because I fear they will use multiple ones. I have already added the site to the bl. Maybe it should be removed again later. Thanks, --birdy geimfyglið (:> )=| 13:53, 28 August 2008 (UTC)

i shortened and fixed the entry. "\b" matches a word boundary. -- seth 15:05, 28 August 2008 (UTC)
Logged - this can stay blacklisted (& seems to contain malware?!)  — Mike.lifeguard | @en.wb 01:45, 29 August 2008 (UTC)

Affiliate spam

Contributions User:Tor sk.wikipedia

Only in 3 Wikipedias that I know of, but I cannot see how these links can ever be of any benefit to a Wikimedia wiki.

--Jorunn 14:44, 31 August 2008 (UTC)

Added whole domains as above.  — Mike.lifeguard | @en.wb 14:50, 31 August 2008 (UTC)

sexyunderwear-weddingdress spam

Spam domains

Spam accounts

--A. B. (talk) 21:44, 2 September 2008 (UTC)

Added & thanks again, A. B.  — Mike.lifeguard | @en.wb 00:30, 3 September 2008 (UTC)

Luna Musik Management, Guzman Construction

Spam domains

Spam account

Alemannic is certainly an odd choice of languages to spam; I suspect he chose it because it was at the top of the list of cross-wiki links for the en:Hip Hop article and then got interrupted.


--A. B. (talk) 16:22, 3 September 2008 (UTC)

Added. Best to catch things early & these are not useful to our projects.  — Mike.lifeguard | @en.wb 21:22, 3 September 2008 (UTC)

spam

Spam domains

Spam accounts

--A. B. (talk) 02:48, 5 September 2008 (UTC)

Added. Thanks. --Erwin(85) 08:31, 5 September 2008 (UTC)

Additional spam

I looked at the spam domains above a bit more closely using the WhosOnMyServer tool. Several domains were hosted by commercial web-hosting services and their servers contain hundreds of unrelated domains. I did, however, identify one German server with a cluster of the spam domains above plus a few other Turkish domains. About half of those domains turned out to be related to the domains I reported above and several had also been spammed:

Also spammed

Other related domains

Additional spam account

--A. B. (talk) 18:25, 5 September 2008 (UTC)

Added the spammed ones.  — Mike.lifeguard | talk 18:03, 6 September 2008 (UTC)

See [3] and [4], a sockfarm of users all adding links to work by Dan Schneider, mostly from his website. 120 links cleaned from enWP and a number from other wikis (I'm on it but it's slow as I have to cross-check who added them). Where the links are added by anons, it is a stable subnet. Definitely a candidate for blacklisting on enWP and probably a candidate for meta blacklisting due to cross-wiki issues, albeit fairly limited by comparison with the extensive enWP abuse. JzG 21:54, 1 September 2008 (UTC)

Is this really something we want to be linking to regardless of who added it? Furthermore, it seems from the links you've provided that several domains are involved. Has blacklisting been discussed on enwiki yet?  — Mike.lifeguard | @en.wb 18:22, 2 September 2008 (UTC)
Yes, it's now blacklisted on enWP (and one of Schneider's socks promptly requested whitelisting). The list of socks is up to about 40 now [5], but primarily an enWP issue. Still, there is some cross-wiki activity and the guy is very determined to use Wikipedia to make his the "The Most Widely Read Interview Series In Internet History!" - to which I cynically respond: {{fact}}. JzG 18:05, 4 September 2008 (UTC)
Added  — Mike.lifeguard | talk 16:37, 7 September 2008 (UTC)

Franchising spam

{{linksummary|}} {{linksummary|}} Per luxo: and Cross-wiki vandalism report. Perhaps a tad stale. The IP has been reverted since then.  — Mike.lifeguard | talk 14:17, 4 September 2008 (UTC)

In his contributions I see spamming of

and not the domains you mentioned. Blacklist those? --Erwin(85) 08:40, 5 September 2008 (UTC)
Yes... Now to find out who is spamming those domains.  — Mike.lifeguard | talk 11:21, 5 September 2008 (UTC)
Sorry, no useful data in my database .. --Dirk Beetstra T C (en: U, T) 11:24, 5 September 2008 (UTC)

Added. --Erwin(85) 11:39, 6 September 2008 (UTC)

Please see [6]. (Blocked at Commons) Cirt (talk) 09:55, 5 September 2008 (UTC)

Added. Thanks. --Erwin(85) 10:00, 5 September 2008 (UTC)
Oh I think the domain is, not bara... Cirt (talk) 10:01, 5 September 2008 (UTC)
\b is a special sequence in regex that matches a word boundary. It's used to make sure that e.g. \bbar\.com\b matches the spam domain, but not the good domain. --Erwin(85) 10:12, 5 September 2008 (UTC)
Ah okay thank you. Cirt (talk) 10:27, 5 September 2008 (UTC)
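To illustrate the point about \b, here is a minimal sketch. It uses Python's re module as a stand-in for the regex engine behind the blacklist (an assumption; the two agree on \b for plain ASCII URLs). The URLs are hypothetical stand-ins for the domains stripped from this archive.

```python
import re

# Entry quoted in the discussion; IGNORECASE mirrors the case-insensitive blacklist.
entry = re.compile(r"\bbar\.com\b", re.IGNORECASE)

# Hypothetical URLs, not taken from the original thread.
urls = [
    "http://bar.com/",        # the listed domain
    "http://www.bar.com/x",   # subdomain: the "." before "bar" is a word boundary
    "http://foobar.com/",     # different domain: no boundary between "o" and "b"
    "http://bar.community/",  # no boundary between "com" and the following "m"
]

results = {url: bool(entry.search(url)) for url in urls}
for url, hit in results.items():
    print(f"{url:26} {'blocked' if hit else 'allowed'}")
```

Note that a hyphen is not a word character, so a host such as foo-bar.com would still be caught by this entry.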

Redirect domain

We should add \bnig\.gr\b  — Mike.lifeguard | talk 01:27, 15 September 2008 (UTC)

Is/was it used on any wikipedia? -- seth 06:46, 15 September 2008 (UTC)
Added. I don't think there's a need for it to be used before blacklisting. It should never be used. I've added it to the bottom of the list. Do we still use # URL shorteners? Or any other section? --Erwin(85) 18:49, 15 September 2008 (UTC)
Well, if there's no need for links to be used before blacklisting, we could directly blacklist about 1 million websites (ad stuff, porn stuff, ...), which would probably become a performance problem. -- seth 20:09, 15 September 2008 (UTC)
All url shorteners / redirectors are blacklisted by default, otherwise they can be used to bypass the blacklist. JzG 16:21, 22 September 2008 (UTC)

crosswiki spam

--Shizhao 02:15, 18 September 2008 (UTC)

in deleted contribs on Commons too. Added all 3.  — Mike.lifeguard | talk 17:49, 18 September 2008 (UTC)

Already locally blocked on wiki-de (the German one). MoiraMoira 07:35, 19 September 2008 (UTC)

Added. --Erwin(85) 08:58, 20 September 2008 (UTC)

URL shorteners

  • Added as uncontroversial. useurl was being used to bypass the blacklist on en, memurl was used to bypass the blacklist on pt. Not sure about the others. JzG 12:57, 22 September 2008 (UTC)
  • (Likely) misused is (see COIBot report), was used only once, the other three are not used as far as I can see (bot-downtime and before start I can't see .. yet). --Dirk Beetstra T C (en: U, T) 10:56, 23 September 2008 (UTC)
Links which are not used (for spamming) should not be blacklisted imho, because otherwise we could easily increase the number of blacklisted domains by thousands. This would lead to nothing but a slowdown. -- seth 15:36, 23 September 2008 (UTC)
Seth, redirects and url shorteners are always added, uncontroversially, because otherwise they can be used to bypass the blacklist. If the blacklist gets too big it won't be because of a few tens of redirect sites. Many of the ones on the list I linked are already linked anyway. And actually I think all of these had at least one link somewhere which was bypassing a blacklisted domain or linking a domain which should have been linked direct. JzG 22:42, 23 September 2008 (UTC)
We (c/sh)ould set up a list of these, as it would be nice to know when they get abused (as with the blacklist, also my bots would run into problems keeping an eye on all of them .. I am slowly running into that limit with them). I do believe that these should NEVER be used (as opposed to porn/other commercial stuff, where, if notable (now or in the future), there may be proper use of it). --Dirk Beetstra T C (en: U, T) 15:55, 23 September 2008 (UTC)

commercial news site spammed wikiwide. This morning crosswiki on Silvio Berlusconi. MoiraMoira 07:38, 19 September 2008 (UTC)

It is a dynamic address from Italy, and it has replaced all the linkspam I removed - see here MoiraMoira 08:08, 20 September 2008 (UTC)
I see many IP ranges (Matucana (49), (40), (21), (21), (20), (12), (11), (9), (8), (6)) and one account. This does not look good. All to the page of Silvio Berlusconi, and given that there are many reports per language, nowhere wanted.
Consider it added. --Dirk Beetstra T C (en: U, T) 08:36, 20 September 2008 (UTC)
There may be useful content here as well, but as the majority of the links is 'pushed' to Silvio Berlusconi, I'd keep it here until it gets questioned .. --Dirk Beetstra T C (en: U, T) 08:50, 20 September 2008 (UTC)

The following discussion is closed.

This is a commentary and opinion website, not based on any sort of research, but is basically a resource for people looking up urban legends. Although the site owners have a long history of supposedly investigating the claims of hearsay, jokes, and legends, it is largely driven by what amounts to yet another unverified source. The reason for the blacklist is that at first glance the site appears like an authority on any number of late-breaking legends, where in reality it is just a veiled opinion of the author(s) on whatever the topic might be. In short, the site appears to be an encyclopedia of urban legends, but it is in fact a mixture of comedy, opinion, hearsay, and legend itself. This puts it in the same category as a number of self-published blogs. Uruiamme 21:17, 20 September 2008 (UTC)

It's hardly an unreliable source. The site owners use other sources for their work, that's normally listed at the end of an article. Has this been spammed anywhere? I don't believe the owners make a profit from the site, and it only runs adverts to keep it going. Stuff should only be put on this list if it's actually spammed. Majorly talk 21:21, 20 September 2008 (UTC)
  • Declined, no evidence of spamming, stated reason is out of scope for this blacklist absent evidence of abuse. That and the inconvenient fact that Snopes is probably the most widely trusted urban legend reference. JzG 11:37, 21 September 2008 (UTC)
I hardly thought that this would be given such a cursory look. I know that the people are reputable, but my point was that they are neither peer reviewed nor unbiased. The people who run many blogs are reputable, so that seems hardly much of a positive. But that is not my main contention. The main issue is that there is at least one area of self-published content available on the forums there, which surely you aren't implying has the same reputation as the portion full of site-owner content? In other words, it is rife with the typical forum/blogging things, and it does have its own sub domain. I assumed someone might independently discover that. Uruiamme 05:15, 22 September 2008 (UTC)
Please read the header above. This list is for controlling abusive linking of websites, not to enforce one side's view in a dispute over the reliability of a certain source. Feel free to bring this up on the talk pages for the articles where you believe the link is being incorrectly used. JzG 12:44, 22 September 2008 (UTC)
Not done per JzG.  — Mike.lifeguard | talk 19:29, 22 September 2008 (UTC)

Merlin Wikia

On the English Wikipedia site an IP user, has been adding spam links to I propose that this link gets blacklisted as the IP user had posted it to several user talkpages, which goes against the Wikipedia Policies. Dark Mage 18:25, 21 September 2008 (UTC)

If this is a problem only on en.wikipedia it should be dealt with on the local blacklist en:MediaWiki talk:Spam-blacklist. This blacklist is for spam across many wikis.
Any Wikia wiki can be linked to with interwikis, so a blacklisting of will be very easy to outflank. --Jorunn 09:11, 22 September 2008 (UTC)
I actually don't see any linkadditions to (so maybe it is already used as an interwiki?). --Dirk Beetstra T C (en: U, T) 10:36, 23 September 2008 (UTC)
Declined per Jorunn - this would be pointless.  — Mike.lifeguard | talk 20:21, 28 September 2008 (UTC)

This is a malicious link added to en.Wikipedia.[7] It doesn't seem to harbor a virus but it's semi-pornographic images(?- hard to tell I didn't look for long!) and the code resizes your browser window and makes it bounce around the screen. Only added once on en.wikipedia as far as I can tell but seems to have no legitimate use and ought to be cross-wiki blacklisted as other malicious sites are. -- SiobhanHansa 21:53, 23 September 2008 (UTC)

Unquestionably malicious. Added & thanks.  — Mike.lifeguard | talk 22:08, 23 September 2008 (UTC)
There have been other additions, see the coibot report, needing research:

(All on:

). Waiting for the reports (are queued). --Dirk Beetstra T C (en: U, T) 09:51, 24 September 2008 (UTC)
Added more - good catch.  — Mike.lifeguard | talk 16:15, 24 September 2008 (UTC)

Proposed removals

This section is for archiving proposals that a website be unlisted.

is a fine site, referring to a Geocities page. No spam, no porn. There are many pages about Lluis Llach, and the link was accepted by the Polish one. Blocking really does not seem necessary. The preceding unsigned comment was added by (talk • contribs) 12:17, 16 Aug 2008 (UTC)

The site (as you have spelt it) does not appear to be blacklisted here. Thanks --Herby talk thyme 12:23, 16 August 2008 (UTC)
the sbl is case-insensitive, the entry is
for a given url you can use [8] (beta state) to find the corresponding entries. -- seth 13:53, 16 August 2008 (UTC)
Thanks seth - that way it is here because of this report. It was reverted, links placed again so listed. Looks valid to me. For anyone who doesn't look at it the appeal is by the Ip that was responsible for the link placement. Cheers --Herby talk thyme 13:55, 16 August 2008 (UTC)
  • I think we should decline - geocities pages are of no use to the encyclopaedia. Rather the reverse, in fact. 23:07, 16 August 2008 (UTC)
Declined per Herby and original report.  — Mike.lifeguard | @en.wb 23:16, 16 August 2008 (UTC)
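The lookup seth describes (finding which entries are responsible for blocking a given URL, ignoring case) can be sketched as follows. This is only an illustration in Python, not the tool linked above; the function name, the sample entries and the URL are all hypothetical.

```python
import re

def matching_entries(url, entries):
    """Return the blacklist entries (regex fragments) that match url, ignoring case."""
    hits = []
    for entry in entries:
        try:
            if re.search(entry, url, re.IGNORECASE):
                hits.append(entry)
        except re.error:
            # Skip fragments that Python's re cannot compile.
            continue
    return hits

# Hypothetical entries and URL, for illustration only.
sample_entries = [r"\bexample-spam\.com\b", r"\bgeocities\.com/SomeUser\b"]
found = matching_entries("http://www.GeoCities.com/someuser/page.html", sample_entries)
print(found)
```

Because the match is case-insensitive, spelling the URL with different capitalisation than the entry does not hide it from the list.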

i was about to use as reference for an article, but it's blacklisted - is there any special reason? -- 21:22, 28 August 2008 (UTC)

The reason is here, though I couldn't find the conclusion of that discussion quickly (and the log entry doesn't specify an oldid :\ not sure how that happened).  — Mike.lifeguard | @en.wb 02:24, 29 August 2008 (UTC)
OK, the full discussion is archived. Given the self-published nature of that domain, and the issues with POV-pushing over a long period of time, I am happy to have this remain on the global blacklist rather than enwiki's local list. You may choose to request whitelisting for a specific use at w:MediaWiki talk:Spam-whitelist. Declined based on the original report.  — Mike.lifeguard | talk 23:15, 6 September 2008 (UTC)
For the record, this was cross-wiki spammed. For example (this is just a small sample):
Here are some prior discussions:
--A. B. (talk) 04:11, 7 September 2008 (UTC)

Concerning regexp [0-9]+\.[-\w\d]+\.info/?[-\w\d]+[0-9]+[-\w\d]*\].
A few days ago I removed this entry, but was told afterwards that every removal needs a de-list discussion. So here I go again (see #double/wrong entries).
Short: This entry never worked and does not seem to be needed, so imho it's the best to remove the entry.
Long: In the beginning of 2006 there had been this request, which was added immediately. It was modified some time later. But all versions of the entry never matched anything, because the spamblock extension does not work on link descriptions, but only on the link itself. So there will never be a match on whitespace or square brackets.
Now there are 2 possibilities: 1. fix the regexp or 2. remove it permanently.
The original request said that the urls were something like (integer number).(letter).(name).info (which could perhaps be translated into \d+\.[a-z]\.\w+\.info). But if one looks at the present sbl, one can't see even one entry like this. So probably there's no need to block those domains any longer. The only possibility is that entries like "cinn\.info" and "ephraim\.info" are of this format but were inserted without third-level domains. However, a short look into the history of the sbl discussion does not verify that.
Altogether I suggest to leave the entry removed. -- seth 09:59, 7 September 2008 (UTC)
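A minimal sketch of why the entry cannot match, using Python's re module as a stand-in for the extension's regex engine (an assumption). The URLs are hypothetical; the only point is that the entry ends in a literal "]", which appears in wiki markup but not in the bare URL that gets checked.

```python
import re

# Entry under discussion: it ends in "\]", a literal closing square bracket.
old_entry = re.compile(r"[0-9]+\.[-\w\d]+\.info/?[-\w\d]+[0-9]+[-\w\d]*\]")
# Pattern seth floats for the originally reported URL shape.
candidate = re.compile(r"\d+\.[a-z]\.\w+\.info")

bare_url = "http://123.example.info/page42"    # hypothetical URL as the extension would see it
wikitext = "[http://123.example.info/page42]"  # the same link inside wiki markup
reported_shape = "http://12.a.example.info/"   # hypothetical (number).(letter).(name).info host

print(bool(old_entry.search(bare_url)))        # False: no "]" in a bare URL
print(bool(old_entry.search(wikitext)))        # True: only the markup supplies the "]"
print(bool(candidate.search(reported_shape)))  # True
```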

This entry (if done properly) is probably too broad; recommend removal.  — Mike.lifeguard | talk 19:35, 8 September 2008 (UTC)
Not done then.  — Mike.lifeguard | talk 12:12, 13 September 2008 (UTC)

Caracal pistol, italian Wiki article


Hello Erwin, I would like to know the reason of Caracal info european site being listed on spamlist/blacklist and the removal of the link of the italian article. I am the author of all Wikipedia articles in 16 languages related to the first pistol made in United Arab Emirates known as Caracal pistol and I regularly post the latest news on website to keep readers informed of the latest developments since day one. Sincerely Edmond HUET Quickload 09:55, 6 September 2008 (UTC)

Hi, as far as I can see there's only a small amount of information available on the web site. Most links point to your Domains for sale section. That and adding it to multiple wiki's caused me to blacklist it. Feel free to request removal from the blacklist at Talk:Spam blacklist. --Erwin(85) 11:38, 6 September 2008 (UTC)

Hi, Small amount of information? Maybe, you should click on the 10 buttons on the left when you are on any one of the pages. There is no other site on the web and all the available information related to Caracal pistol can be found on this site.


Hello, I request removal from blacklist, above quote explains why. Ask for more info if needed. Quickload 09:12, 7 September 2008 (UTC)

Clearly that domain is not blacklisted at meta.  — Mike.lifeguard | talk 16:23, 7 September 2008 (UTC)
however is. Given your conflict of interest in this case, the cross-wiki additions and our norm of declining de-listing requests from site owners, this request is Declined.  — Mike.lifeguard | talk 16:30, 7 September 2008 (UTC)

OK, given the related domains, and additions by Quickload, I think this may have turned into a request for listing. I'm normally not a fan of listing related domains, but Quickload seems to have a COI here, and is adding sites cross-wiki. Looking for input here. Related domains listed below.  — Mike.lifeguard | talk 16:35, 7 September 2008 (UTC)

Related domains

 — Mike.lifeguard | talk 16:35, 7 September 2008 (UTC)

I recommend blacklisting all. --A. B. (talk) 14:38, 10 September 2008 (UTC)
I see Quickload is a high-volume contributor; I suggest just sticking with the domain that's already blacklisted as long as Quickload does not add any more links to his own websites (it's a conflict of interest). Quickload, this applies as well to using anonymous IPs and alternate accounts. --A. B. (talk) 14:59, 10 September 2008 (UTC)
Not done then.  — Mike.lifeguard | talk 12:13, 13 September 2008 (UTC)


my site is, has been listed on

Now it is blacklisted. Please remove it from the blacklist because it is a very useful and non-profit technical site.

This domain was blacklisted per the XWiki report. Linking excessively across many wikis is inappropriate regardless of whether you are profiting from doing so or not.
Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. If such an editor asks to use your links, I'm sure the request will be carefully considered and your domain may well be removed.
Until such time, this request is Declined. — Mike.lifeguard | talk 16:50, 10 September 2008 (UTC)

Troubleshooting and problems

This section is for archiving Troubleshooting and problems.

double/wrong entries

when i deleted some entries from the german sbl, which are already listed in the meta sbl, i saw that there are many double entries in the meta sbl, e.g., search for

top-seo, buy-viagra, powerleveling, cthb, timeyiqi, cnvacation, mendean

and you'll find some of them. if you find it useful, i can try to write a small script (in august), which indicates more entries of this kind.
furthermore i'm wondering about some entries:

  1. "\zoofilia", for "\z" matches the end of a string.
  2. "\.us\.ma([\/\]\b\s]|$)", for ([\/\]\b\s]|$) is the same as simply \b, isn't it? (back-refs are not of interest here)
  3. "1001nights\.net\free-porn", for \f matches a formfeed, i.e., never
  4. "\bweb\.archive\.org\[^ \]\{0,50\}", for that seems to be BRE, but php uses ERE, so i guess, this will never match
  5. "\btranslatedarticles\].com", for \] matches a ']', so will probably never match.

before i go on, i want to know, if you are interested in this information or not. :-) -- seth 22:23, 12 July 2008 (UTC)
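Items 3 to 5 of the list above can be checked with a short sketch. It uses Python's re module as a stand-in for the PHP regex engine (an assumption; for these particular escapes the two behave alike), and each entry is paired with a hypothetical URL it was presumably meant to catch.

```python
import re

# Entries quoted above, each with a hypothetical URL it was presumably meant to block.
cases = {
    r"1001nights\.net\free-porn": "http://1001nights.net/free-porn",
    r"\btranslatedarticles\].com": "http://translatedarticles.com/",
    r"\bweb\.archive\.org\[^ \]\{0,50\}": "http://web.archive.org/web/2008/http://example.org/",
}

matches = {entry: bool(re.search(entry, url)) for entry, url in cases.items()}
for entry, hit in matches.items():
    print(f"{entry!r:45} matches intended URL: {hit}")

# "\f" is a form feed, so the first entry only matches text containing that control character.
formfeed_hit = bool(re.search(r"1001nights\.net\free-porn", "1001nights.net\x0cree-porn"))
print(formfeed_hit)
```

None of the three entries matches its intended URL, which is consistent with seth's reading that they never fire in practice.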

You know, we could use someone like you to clean up the blacklist... :D Kylu 01:53, 13 July 2008 (UTC)
We are indeed interested in such issues - I will hopefully fix these ones now; keep 'em coming!  — Mike.lifeguard | @en.wb 01:59, 13 July 2008 (UTC)
Some of the dupes will be left for clarity's sake. When regexes are part of the same request they can be safely consolidated (I do this whenever I find them), but when they are not, it would be confusing to do so, in many cases. Perhaps merging regexes in a way that is sure to be clear in the future is something worth discussing, but I can think of no good way of doing so.  — Mike.lifeguard | @en.wb 02:06, 13 July 2008 (UTC)
in de-SBL we try to cope with that only in our log-file [9]. there one can find all necessary information about every white-, de-white-, black- and de-blacklisting. the sbl itself is just a regexp-speed-optimized list for the extension without any claim of being chronologically arranged.
i guess, that the size of the blacklist will keep increasing in future, so a speed-optimization perhaps will be necessary in future. btw. has anyone ever made any benchmarks of this extension? i only know that buffering was implemented at some point.
oh, and if one wants to correct further regexps: just search by regexps (e.g. by vim) for /\\[^.b\/+?]/ manually and delete needless backslashes, e.g. \- \~ \= \:. apart from that the brackets in single-char-classes like [\w] are needless too. "\s" will never match. -- seth 11:36, 13 July 2008 (UTC)
fine-tuning: [1234] is much faster in processing than (1|2|3|4); and (?:foo|bar|baz) is faster than (foo|bar|baz). -- seth 18:21, 13 July 2008 (UTC)
I benchmarked it, (a|b|c) and [abc] had different performance. Same with the latter case — VasilievV 2 21:02, 14 July 2008 (UTC)
So should we be making those changes? (ie was it of net benefit to performance?)  — Mike.lifeguard | @en.wb 21:56, 15 July 2008 (UTC)
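Whatever the speed difference turns out to be on a given engine, the two rewrites seth proposes do not change what is matched. A minimal sketch in Python (a stand-in for the extension's engine; the sample strings are made up) that checks only the equivalence, not the timing:

```python
import re

samples = ["1", "4", "5", "foo", "bar", "qux"]

# A character class and a single-character alternation accept the same inputs.
char_class = re.compile(r"^[1234]$")
alternation = re.compile(r"^(1|2|3|4)$")
same_digits = all(bool(char_class.match(s)) == bool(alternation.match(s)) for s in samples)

# Capturing and non-capturing groups match the same text; only the stored groups differ.
capturing = re.compile(r"^(foo|bar|baz)$")
non_capturing = re.compile(r"^(?:foo|bar|baz)$")
same_words = all(bool(capturing.match(s)) == bool(non_capturing.match(s)) for s in samples)

print(same_digits, same_words)
print(capturing.groups, non_capturing.groups)
```

The last line prints the number of capture groups each pattern keeps track of, which is the bookkeeping the non-capturing form avoids.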
these differences result from the regexp-implementation. but what i meant with benchmarking is the following: how much does the length of the blacklist cost (measured in time)? i don't know, how fast the wp-servers are. however, i benchmarked it now on my present but old computer (about 300-500MHz):
if i have one simple url like and let the ~6400 entries of the present meta-blacklist match against this url, it takes about 0.15 seconds till all regexps are done. and i measured really only the pure matching:
// reduced part of SpamBlacklist_body.php
foreach ($blacklists as $regex) {
  $check = preg_match($regex, $links, $matches);
  if ($check) {
    $retVal = 1;
  }
}
so i suppose, that it would not be a bad idea to care about speed, i.e. replace unnecessary patterns by faster patterns and remove double entries. ;-)
if you want me to, i can help with that, but not before august.
well, the replacement is done quickly, if one of you uses vim
the replacement of (.|...) by [...] can be done manually, because there are just 6 occurrences. the replacement of (...) by (?:...) can be done afterwards by
-- seth 23:26, 15 July 2008 (UTC)
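For anyone who wants to repeat this kind of measurement, here is a rough sketch of the same idea in Python rather than PHP. It is not seth's benchmark: the entry list is synthetic (about 6400 made-up patterns), check_url is a hypothetical helper, and absolute timings will differ from those of the extension.

```python
import re
import time

def check_url(url, entries):
    """Try every entry against url, as in the loop above; return (matched, seconds)."""
    compiled = [re.compile(e, re.IGNORECASE) for e in entries]
    start = time.perf_counter()
    matched = False
    for regex in compiled:
        if regex.search(url):
            matched = True  # keep going so that every entry is timed
    return matched, time.perf_counter() - start

# Synthetic list of about 6400 made-up entries; not the real blacklist.
entries = [rf"\bspam{i}\.example\b" for i in range(6400)]
matched, seconds = check_url("http://www.spam6399.example/page", entries)
print(matched, f"{seconds:.4f}s")
```

Compilation is kept outside the timed section so that, as in the original measurement, only the pure matching is counted.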
some explicit further bugs:
\mysergeybrin\.com -> \m does not exist
\hd-dvd-key\.com -> \h does not exist
however, because nobody answered (or read?) my last comment... would it be useful to give me temporarily the rights to do the modifications by myself? -- seth 01:44, 7 August 2008 (UTC)
I fixed these. You can always request (temporary) sysop status. Any help is appreciated. --Erwin(85) 12:45, 7 August 2008 (UTC)
requested and got it. :-) -- seth 09:18, 13 August 2008 (UTC)

before i start modifying the list, i want to know, whether i should log my changes somewhere. oh, and btw. i suppose that the entry [0-9]+\.[-\w\d]+\.info\/?[-\w\d]+[0-9]+[-\w\d]*\] is somehow senseless, for it will probably never match. i found the original discussion [10] (the regexp was changed afterwards), but the regexp will not grep the links mentioned there. shall i just delete such an entry or shall i make a new request and try to correct it? -- seth 09:18, 13 August 2008 (UTC)

It would be nice if you could update the log as well, so we can still find the corresponding log message. Though maybe we should wait and see if anything new comes out of #The Logs. I guess it's best to correct wrong entries or in any case log all those removals. It probably wouldn't hurt if some were removed, but I have no idea how many entries we're talking about. --Erwin(85) 09:31, 13 August 2008 (UTC)
ok, so i'll wait until the other thread is finished. but i don't think, that a manipulating of the logs is a good idea, because this will make tracing of entry changes difficult.
i guess, there are less than 10, perhaps even less than 5 useless entries. -- seth 10:29, 13 August 2008 (UTC)
i cleaned up the sbl two days ago. until now i did not delete any entries (except for grouping purposes). and i could not correct the entry "\bnstpi\.com\.my/ client" (with a senseless space) because its diff wasn't very meaningful. perhaps somebody knows something about this entry and could tell it.
however, one question is: shall i really modify the wrong entries in the logs, too? it is like changing history, so it could cause confusion. -- seth 08:48, 26 August 2008 (UTC)
which added the question marks, also blocked legitimate sites. For example chabad(east|usa|world)\.(am|com|org) and chabad\.am became chabad(?:east|usa|world)?\.(?:am|com|org) which blocked legitimate sites such as and A solution may be to remove the question marks for this entry and restore it to 2 entries like it was before. --PinchasC 14:20, 1 September 2008 (UTC)
Done - Regex is now chabad(?:east|usa|world)\.(?:am|com|org), and should block what it's supposed to now.  — Mike.lifeguard | @en.wb 15:39, 1 September 2008 (UTC)
oops, sorry for my mistake. PinchasC is right. in addition to Mike.lifeguard's correction i will re-insert the explicit entry chabad\.am. -- seth 08:26, 2 September 2008 (UTC)
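The effect of the stray question mark can be reproduced with the three patterns quoted in this thread. A minimal sketch in Python (a stand-in for the extension's engine); the hostnames are derived from the patterns themselves for illustration and are not the ones PinchasC linked.

```python
import re

# Three forms quoted in the thread.
merged_with_optional = re.compile(r"chabad(?:east|usa|world)?\.(?:am|com|org)")
corrected = re.compile(r"chabad(?:east|usa|world)\.(?:am|com|org)")
explicit_am = re.compile(r"chabad\.am")

# Hostnames derived from the patterns themselves, for illustration only.
hosts = ["chabadusa.org", "chabad.org", "chabad.am"]

blocked_before = {h: bool(merged_with_optional.search(h)) for h in hosts}
blocked_after = {h: bool(corrected.search(h) or explicit_am.search(h)) for h in hosts}

for host in hosts:
    print(f"{host:15} merged: {blocked_before[host]!s:5} corrected + explicit: {blocked_after[host]}")
```

Making the middle group optional lets the pattern match the bare name followed by any of the listed endings, which is how the merge widened the block; the corrected entry plus the separate chabad\.am entry restores the original scope.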
"\bnstpi\.com\.my/ client": after looking at the request on the TP and the links mentioned there, i suppose, that the leading " client" could just be ignored, so i deleted it. otherwise the regexp would be totally useless. -- seth 10:13, 7 September 2008 (UTC)

what does "let's not use ?: - it makes COIBot unhappy[...]"[11] mean precisely? -- seth 23:55, 27 August 2008 (UTC)

Beetstra can tell you exactly, as he is the bot's owner. I believe it choked on that as it isn't handled properly in Perl. Also some of the very long regexes caused issues (but didn't change those). I am having second thoughts about consolidating regexes which are not part of the same request. Regexes added together can be mushed together easily, but those in separate requests should likely stay separate, I think. Not sure what to do next about this though.  — Mike.lifeguard | @en.wb 23:59, 27 August 2008 (UTC)
COIBot: well, perl could cope with non-capturing patterns /(?:foo)/ long before php even existed, so i guess it isn't really a perl-problem. i'll ask Beetstra on his talk page about that.
grouping: as far as i can see, the sbl-page can be used for blocking only. all relevant blocking information is listed in the log (and the links mentions there). so i don't see, how even a random sort on the sbl entries combined with randomly grouped regexps could harm. -- seth 01:49, 28 August 2008 (UTC)

double entries

I wrote a small script to grep most of the double (or multi) entries. The result is presented on User:Lustiger_seth/sbl_double_entries. As you can see, there are many (>250) redundant entries. I guess, we could delete more than 200 entries. -- seth 22:59, 19 August 2008 (UTC)

moved a discussion to previous thread. -- seth 08:26, 2 September 2008 (UTC)
So, as we now log removals too, I will delete double entries, if nobody raises objections. -- seth 19:17, 3 September 2008 (UTC)
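A duplicate check of this kind can be sketched in a few lines. This is not seth's script (which is not shown in this archive) but a hypothetical Python version that only finds exact duplicates; it assumes one entry per line and "#" starting a comment, and the sample entries are made up.

```python
from collections import Counter

def find_duplicates(blacklist_text):
    """Return entries that occur more than once, ignoring comments and blank lines."""
    entries = []
    for line in blacklist_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # assumption: "#" starts a comment
        if entry:
            entries.append(entry)
    counts = Counter(entries)
    return sorted(entry for entry, n in counts.items() if n > 1)

# Hypothetical excerpt, not taken from the real list.
sample = r"""
# spam bots
\btop-seo\.example\b
\bbuy-viagra\.example\b   # logged 2007
\btop-seo\.example\b
"""

duplicates = find_duplicates(sample)
print(duplicates)
```

Entries that are merely superseded by a broader pattern, as in the list further down, need a different comparison and are not detected by this sketch.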

done. some additional comments on deleted entries, which were not exactly double:

\.rr\.nu             # deleted, although it is not fully superseded by \brr\.nu\b, but almost. i guess that the domain .nu was meant, so the postfix "\b" is ok.
caiquecrazy\.us\.tt  # almost fully superseded by \bu[ks]\.tt\b
\.6url\.com          # almost fully superseded by \b6url\.com\b
\.flingk\.com        # almost fully superseded by \bflingk\.com\b
\.metamark\.net      # almost fully superseded by \bmetamark\.net\b
\.paulding\.net      # almost fully superseded by \bpaulding\.net\b
\.shorl\.com         # almost fully superseded by \bshorl\.com\b
\.shortlinks\.co\.uk # almost fully superseded by \bshortlinks\.co\.uk\b
\.simurl\.com        # almost fully superseded by \bsimurl\.com\b
\.smcurl\.com        # almost fully superseded by \bsmcurl\.com\b
\.tighturl\.com      # almost fully superseded by \btighturl\.com\b
\.yatuc\.com         # almost fully superseded by \byatuc\.com\b
\.yep\.it            # almost fully superseded by \byep\.it\b
\.ontheweb\.nu       # almost fully superseded by \bontheweb\.nu\b
\.isgre\.at          # almost fully superseded by \bisgre\.at\b
drugs\.isgre\.at     # same as above
\.byinter\.net       # almost fully superseded by \bbyinter\.net\b
drugs\.byinter\.net  # same as above
nigeria\.tz4\.com    # almost fully superseded by \btz4\.com\b
\binternet-history\.tz4\.com # same as above
\.edom\.co\.uk       # almost fully superseded by \bedom\.co\.uk\b
\.fw\.nu             # almost fully superseded by \bfw\.nu\b
\.redirect\.hm       # almost fully superseded by \bredirect\.hm\b
drugs\.passingg\.as  # almost fully superseded by \bpassingg\.as\b
\.shop\.tc           # almost fully superseded by \b(?:au|es|hk|hu|ie|it|kr|mx|pl|se|th|ua|us|shop)\.tc\b
\.explode\.to        # almost fully superseded by \bexplode\.to\b
\.zwap\.to           # almost fully superseded by \bzwap\.to\b
squidoo\.com/inexpensive-wine  # almost fully superseded by \bsquidoo\.com\b
squidoo\.com/localphoneservice # same as above
\bsearchtravel\.biz/countrylist/italy.php # almost the same as \bsearchtravel\.biz/countrylist/italy\.php\b
drugs\.lowestprices\.at # almost fully superseded by \blowestprices\.at\b

-- seth 12:13, 5 September 2008 (UTC), -- 22:25, 7 September 2008 (UTC)
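The "almost" in the comments above comes from the word boundaries: `\.rr\.nu` needs a leading dot but no boundary after `nu`, while `\brr\.nu\b` needs a boundary on both sides. A small Python sketch of the difference follows; the sample URLs are made up, and Python's `re` is assumed to behave like the blacklist's regex engine for these simple patterns.

```python
import re

# The deleted entry and the entry that supersedes it, taken from the list above.
old = re.compile(r"\.rr\.nu")
new = re.compile(r"\brr\.nu\b")

# Hypothetical sample URLs, for illustration only.
samples = [
    "http://spam.rr.nu/page",       # subdomain of rr.nu
    "http://rr.nu/",                # bare domain, no leading dot
    "http://foo.rr.nuts.example/",  # merely starts like rr.nu
]

def matches(pattern, url):
    """True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None

results = [(url, matches(old, url), matches(new, url)) for url in samples]
for url, hit_old, hit_new in results:
    print(f"{url:32} old={hit_old} new={hit_new}")
```

The third URL is the kind of case the old entry caught and the new one does not, which is why the deleted entries are only "almost" superseded.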

Just noticed that we are blocking only


when we might as well block the whole thing:


 — Mike.lifeguard | @en.wb 01:44, 3 September 2008 (UTC)

Please do, yes. We're all tired of seeing it in edit summaries by now ... - Alison 05:49, 8 September 2008 (UTC)
I was thinking more along the lines of "Is there some technical reason for using these regexes instead of that one?" but {{done}} just the same!  — Mike.lifeguard | talk 11:45, 8 September 2008 (UTC)
I can't find the 3 regexes I listed above - perhaps my browser isn't loading the whole blacklist? 0.o
Can someone else double-check me?  — Mike.lifeguard | talk 11:55, 8 September 2008 (UTC)
It got removed recently in this diff. Hold on - I'll bop it back in now :) - Alison 12:21, 8 September 2008 (UTC)
And Done - it's kinda already been logged, so I left it alone - Alison 12:27, 8 September 2008 (UTC)
Bah, I just wanted to make sure the other ones were gone. So we're good now.  — Mike.lifeguard | talk 12:29, 8 September 2008 (UTC)
I recently deleted the entries
for they were totally redundant, because of \.on\.nimp\.org. (And I didn't see your request.)
Blocking the whole domain is even more restrictive, but I don't see any good pages there, so I guess changing to nimp\.org is ok. -- seth 12:28, 8 September 2008 (UTC)

Bad backlog on MediaWiki talk:Spam-whitelist

Please pitch in and help whittle this down. We have editors who've been waiting several months.

Thanks, --A. B. (talk) 15:14, 8 September 2008 (UTC)

I see three requests, all three answered, and none of them suitable for whitelisting (as it needs whitelisting on the local projects, if anywhere). Are you sure you link to the right page? --Dirk Beetstra T C (en: U, T) 15:18, 8 September 2008 (UTC)
I think he meant w:en:MediaWiki talk:Spam-whitelist.  — Mike.lifeguard | talk 16:24, 8 September 2008 (UTC)
Ooops -- I meant to post this at en:MediaWiki talk:Spam-blacklist. --A. B. (talk) 18:42, 8 September 2008 (UTC)

tinyurl in edit summaries

This shouldn't be possible:


 — Mike.lifeguard | talk 01:24, 15 September 2008 (UTC)

OK, this is because they are moves, not edits. Once log entries are caught this won't happen.  — Mike.lifeguard | talk 01:43, 15 September 2008 (UTC)

User: namespace abuse

This section is for archiving User: namespace abuse.


Jewellery sales. Page & images on Commons, user page ad on en wp. --Herby talk thyme 18:22, 12 August 2008 (UTC)


"Talent Lab" recruitment page on Commons. --Herby talk thyme 18:24, 12 August 2008 (UTC)


The following discussion is closed.

Using the User: namespace to promote the domain:

JonAwbrey 20:42, 8 September 2008 (UTC)

closed, please don't make bulk requests, there is no crosswiki spam, he either links to his enwiki userpage or has no userpage. Thanks, --birdy geimfyglið (:> )=| 20:51, 8 September 2008 (UTC)

Elonka

The following discussion is closed.

Creates userpages full of external links (and self-promotion references!) on several wikis. -- Thekohser 20:45, 8 September 2008 (UTC)

closed, please don't make bulk requests, there is no crosswiki spam, he either links to his enwiki userpage or has no userpage. Thanks, --birdy geimfyglið (:> )=| 20:55, 8 September 2008 (UTC)


similar pattern, adding a personal link..--Cometstyles 12:01, 12 August 2008 (UTC)
Thanks Comets - Added for now. In passing I see no harm in listing such sites as much to send a message to the user that their behaviour may not be appropriate. Not sure about how lasting the listing should be or about logging immediately - thoughts welcome. --Herby talk thyme 12:12, 12 August 2008 (UTC)
Reviewing this it may well be a good faith de user who has just decided to expand their interests (based on SUL info). In which case I suggest serious consideration for de-listing if we are asked. --Herby talk thyme 12:17, 12 August 2008 (UTC)
Hi guys, I don't understand this, why is my personal website Nervenhammer on this blacklist? Fleshgrinder 09:53, 22 August 2008 (UTC)
Adding the link to your userpage on many wikis where you are not a community member is generally frowned upon. I suggest you instead leave a link to your userpage on your home wiki if you need to create a userpage. If you are an established community member, you would be afforded more leeway with respect to user page content. I'm prepared to de-list this on the condition that the link is not added cross-wiki again.  — Mike.lifeguard | @en.wb 14:20, 31 August 2008 (UTC)
Okay, I'm very sorry about that, it was never my intention to start link building for my website - I only wanted to show my person and what I do. It won't happen again. If I'm not really contributing something I don't create a userpage and if, I set a link to the German Wikipedia (where I'm contributing the most). Thank you for the answer and for elucidating me about this issue. It would be nice if you would de-list the URI, because I don't want my URI to be on a blacklist and I'm definitely not going to post the address again. Kindest regards --Fleshgrinder 09:48, 2 September 2008 (UTC)

Removed  — Mike.lifeguard | @en.wb 14:51, 2 September 2008 (UTC)


Cross wiki spam pages. ( is the domain). --Herby talk thyme 12:59, 13 August 2008 (UTC)

Added  — Mike.lifeguard | @en.wb 17:18, 1 September 2008 (UTC)


Is most definitely a spammer who creates "SCM declassified" ( link) on his userpage and talkpage so it can't be rollbacked :( ..this account did a similar thing (same pattern)--Cometstyles 03:24, 9 September 2008 (UTC)

Despite being blocked on Commons they are still spamming their user page. Protected now but I guess other projects will be affected. Cheers --Herby talk thyme 11:11, 9 September 2008 (UTC)
Worth locking the account?  — Mike.lifeguard | talk 18:18, 9 September 2008 (UTC)
The account is not global ;( I would suggest to add the link to the bl, best regards, --birdy geimfyglið (:> )=| 18:19, 9 September 2008 (UTC)
It is added, but they are spamming it in plaintext. This is why we want to give stewards the ability to forcibly merge accounts. By not unifying, spammers and vandals may continue unless we block them on each wiki individually. I will purge the userpages, but without some mechanism to enforce this, I do not see how we may force them to stop.  — Mike.lifeguard | talk 18:24, 9 September 2008 (UTC)
Sorry, they are spamming a new domain, which I'm adding.  — Mike.lifeguard | talk 18:30, 9 September 2008 (UTC)

Domain was

 — Mike.lifeguard | talk 18:33, 9 September 2008 (UTC)

Checkuser on English Wikisource reminded of an earlier account that has done this, Kisspig. Please globally block the account. John Vandenberg 07:17, 10 September 2008 (UTC)

user "Kisspig" is now locked, thanks, --birdy geimfyglið (:> )=| 12:35, 10 September 2008 (UTC)

David Shankbone

The following discussion is closed.

Point was made here on Meta that this user is promoting self across multiple Wikimedia projects. I am tending to agree. Sincerely, -- De728631 10:28, 11 September 2008 (UTC)

This is a known user who is legitimately active on many wikis, hence the userpages. On most I don't see external links at all, nor do I see current cross-wiki self-promotional behaviour. The few links I have found are for attributional purposes, which is legitimate.  — Mike.lifeguard | talk 10:43, 11 September 2008 (UTC)

Jon Awbrey and JonAwbrey

Creates userpages full of external links (and selfpromotion references?) on many wikis. Annabel 19:08, 28 August 2008 (UTC)

Discussion § 1

  • Inserting § break for the sake of my poor old browser. JonAwbrey 16:44, 9 September 2008 (UTC)
I placed the same vita on my user page that I use on all the sites where I contribute work and discuss ideas with other interested parties. This does not constitute SPAM (= "unsolicited mass-mailing or posting") in any technical or COI sense of the word. I would appreciate the two variants of my real name that I use on the Internet and Web not being listed on any kind of badlists. Thank you, Jon Awbrey 19:12, 29 August 2008 (UTC)
While it may not be spam, it would seem to be abuse of WMF wikis & as such unwanted. While community members are given leeway with their userpages, such excessive linking is generally frowned upon. Furthermore, I very much doubt you understand all the languages you have posted this to, nor are you active in those wikis. I invite you to fix the problem before it is done for you. The history at enwiki will be of interest to others reviewing this.  — Mike.lifeguard | @en.wb 19:36, 29 August 2008 (UTC)
I would appreciate it if you could point to the relevant WMF Terms of Service, or even a generally accepted standard of etiquette that would justify your calling this user page vita an "Abuse". I am referring to the one now posted here at Meta, which is a copy of the one deleted by Annabel from my Nederlands User Page. By "generally accepted standard of etiquette" I mean one that you could honestly assure me is followed across the board on all WMF User Pages. In addition, I have never seen any notice of Wikipedias being "Encyclopedias that anyone who is fluent in the local language can edit" — but please let me know if I have missed such a restriction somewhere. Jon Awbrey 20:22, 29 August 2008 (UTC)
You misunderstand me crucially. I do not say you need to be fluent in the languages where you contribute. To claim that would be hypocritical; I edit all WMF wikis. The issue is that:
  1. You are not an established member of the community on any wiki where you have a userpage (so far as I can tell).
  2. Your userpage has an excessive amount of links (indeed, links form the only content, and they appear to be placed for self-promotional purposes). This would perhaps be an issue regardless of the above.
 — Mike.lifeguard | @en.wb 20:31, 29 August 2008 (UTC)

[Undent]: Correct me if I am wrong, but I do not think it is customary for newcomers to any of the many-tongued Wikipædiæ to be subjected to the ordeals of this type of entrance exam with regard to the legitimacy of their participation. However, by FYIing my real name, educational background, and ongoing intellectual interests, I have certainly done more than the average Anon IP on that score.

Many people post pics on their user pages as a way of providing a friendly introduction to themselves, their current interests, and their personal histories. My old web vita harks back to a day when I was unsure about the propriety of copying pics, so I used links instead, over the years being forced to replace many of them with WayBak links. You can hardly dream that I am collecting revenue off archival links like that, can you?

If and when you personally discover an interest in some of the Active Suggestions Concerning Intellectual Interchange that I enumerated in my web vita — which was my sole purpose in posting it to my NL User Page — then we may find more interesting things to talk about. In the mean time, I can hardly become an "established member of the community on any wiki", much less learn a few bits of the local colour and language, if some Admin deletes my self-introductory user page and blocks my account after the first few edits, now can I? Jon Awbrey 23:45, 29 August 2008 (UTC)

  • Jon, this same sort of Wikilawyering nonsense is what got you banned from enWP and booted from the mailing list. Obviously your rampant sockpuppetry and disruption ensures you remain banned on enWP. I would be the first to help you if you wanted your massive list of socks associated with some other name, to reduce the impact on you, but I don't see why we should help you to pretend that you are here to do anything other than the usual: self-promotion and idiosyncratic original research. JzG 20:50, 4 September 2008 (UTC)
Still placing pages - en wq in the past few hours. Cheers --Herby talk thyme 08:00, 6 September 2008 (UTC)
This is shameless self-promotion, and I would suggest that someone who has the necessary rights removes the pages from all projects on which he is not an active participant. JzG 11:44, 7 September 2008 (UTC)

So, the following links are the ones being used for vanity spamming here:

Discussion § 2

I would like to add a comment. As long as this page edited by this user multiple times on the English Wikipedia still exists, we look absolutely foolish trying to suppress a passive list of vitae links from a USER page, for heaven's sake. No surprise. Given the opportunity to choose two paths, Wikimedians will select the most backward, stupid-looking one. -- Thekohser 18:04, 8 September 2008 (UTC)
To my knowledge, Elonka has not edited her article in a long time; this was a big issue in her several RfAs and she's been severely criticized for this before. If I'm wrong and there remains an ongoing issue with coi edits, let me know. Thanks, --A. B. (talk) 18:58, 8 September 2008 (UTC)
In what way was that not trolling, Greg? JzG 17:00, 17 September 2008 (UTC)
Given the variety of links, and that several may well have legitimate uses, I'm going to remove the links. Pushing links is inappropriate regardless of the namespace.  — Mike.lifeguard | talk 19:09, 8 September 2008 (UTC)
Well, I was pointedly reverted on English Wikiversity. I did attempt an explanation in irc, but that was equally-pointedly rebuffed. Relevant on-wiki discussion is on English Wikibooks. Perhaps someone else would take that on.  — Mike.lifeguard | talk 00:05, 9 September 2008 (UTC)
Comment: I see JonAwbrey has reverted quite a number of linkremovals on userpages. --Dirk Beetstra T C (en: U, T) 13:57, 9 September 2008 (UTC)
More userpages on enwikquote, eswiki, fiwiki, kowiki, ruwiki on top of the reverts.  — Mike.lifeguard | talk 14:38, 9 September 2008 (UTC)
See also a discussion between Moulton and Jon Awbrey on Wikiversity, which seems to contain a threat.  — Mike.lifeguard | talk 14:24, 9 September 2008 (UTC)

[Undent] Mr. Lifeguard, given your acknowledgement that the material in question is not "SPAM", I think that further discussion on this so-called "spam blacklist" page is no longer relevant. So I would like to request, once again, that you remove the listing of my usual Internet names from this page. Thanks in advance, JonAwbrey 17:03, 9 September 2008 (UTC)

It may or may not be spam. To me it certainly is abuse of the facilities that are enjoyed by users provided by the Foundation. Your contributions to many projects are zero other than your overlinked user page. --Herby talk thyme 18:02, 9 September 2008 (UTC)
Your statements are incorrect. Since you appear genuinely interested, I can give you a list of contributions to several projects that may not show up in your cursory scans. For instance, you are probably missing the contributions that come by way of interwiki translations of articles that I wrote for the English Wikipedia. These contributions are, in my humble opinion quite substantial. Indeed, it was in following the search engine traces of these translations that I was brought to many of those non-anglophone Wikipedias. JonAwbrey 18:26, 9 September 2008 (UTC)
As for the rest, surely you must have some sense of how silly it would sound to say that a person cannot be allowed to contribute unless he or she is already an established contributor? Surely? JonAwbrey 18:26, 9 September 2008 (UTC)
Fortunately you are entitled to your opinion & I to mine. Wikis are about collaborative working with consensus among folk - your view would seem at odds with some others and not to be particularly collaborative in their approach. Personally I'm inclined to consider blacklisting the links as I see the excessive linkage to be outside the scope of most projects.
There is nothing silly about suggesting that someone whose only contribution to a project is a personal page which is out of scope is not effectively contributing. I delete many such pages most days. --Herby talk thyme 18:56, 9 September 2008 (UTC)

Discussion § 3

The page you provide, with all the links, is IMHO mainly there as a linkfarm. It does not tell about you, what expertise you have, no, it only lists external links to your other identities. As such, it is more promotional (especially since all these pages will show up in e.g. Google searches (here)). If you translate things to English, then it is not needed to have a userpage on another language; that userpage is only useful if you actually contribute there. As you create the same userpage with all such links everywhere, a single link to one single 'main' userpage would suffice, this serves no purpose and also I regard this as a misuse of facilities provided by the Foundation (except where local encourage such linking, which, if I see it correctly, is only true on Wikiversity). --Dirk Beetstra T C (en: U, T) 09:37, 10 September 2008 (UTC)

Thinking this through I think the idea that this user should be the sole determinant of both their user page content, and what is on this page, is plain wrong. If they insist on having these links on their user pages then I think it completely correct that this section remains here for the community to consider the position, & this may lead to blacklisting of the links. In passing I also note that they are now blocked on en wp having exhausted the community's patience. I suggest that we may need to consider this view on other wikis too. Thanks --Herby talk thyme 11:31, 10 September 2008 (UTC)

[Undent] Let's back up, slow down a bit, and let me see if I can figure out what the handful of people who are commenting on this page are really concerned about. If you want to move the discussion to my meta talk page that might be nice, as I keep getting warning messages from my browser about "non-responsive scripts" on this page that are really bogging down my ability to read and edit it. JonAwbrey 11:44, 10 September 2008 (UTC)

I think the problem is here, that the user made a linkpage in his usernamespace not only where he is active, but spread it on many wikis, I doubt anyone would have said anything if he had it on the 2 or 3 wikis he is active. But adding it to many wikis and doing that only there, sorry, is spam, not the links themselves, but the mass adding.
I hope that clarifies the problem. I regret that the user got blocked on some wikis for that already, I believe he should just replace the userpage on the other wikis with a link to his main userpage, that would be sufficient if anyone really wants to look at his userpage and request unblock on the wikis where he got blocked.
Best regards, --birdy geimfyglið (:> )=| 15:49, 10 September 2008 (UTC)
  • I will have to be doing some other work for a while. I am breaking this into sections for the sake of my browser and so I don't fall too far behind. I'm still not clear why anyone would refer to my standard self-introduction to a language-based or project-based wiki as a "self-promotion" in the COI sense of the word, much less a "linkfarm". I was led to most of those web sites because my name was already mentioned there in connection with some English Wikipedia article or other page that was referenced or translated there. As far as being a "link farm", I just don't get that at all. I refer people to sites and papers that I am currently working on, as do many other people in all of the wikis that I have seen. I was given to understand that WMF uses "no follow" tags, so no bots follows those links. As I have explained a couple of times before, I have used that same vita for many years as a standard self-introduction. Many people illustrate their user pages with MegaByte animations, graphics, and pics — I have always preferred to use simple links to pictures instead, partly for byteage and partly for copyright reasons. That's what the Web is for, remember? Many of these pages and pics are so old now that they can only be found in the WebArchive. I am certainly not getting any promotional considerations for any of them. JonAwbrey 19:24, 10 September 2008 (UTC)
  • Please see the note on my talk page. Both my browsers keep jamming up on this page, so this will have to be my last posting here. JonAwbrey 00:04, 11 September 2008 (UTC)
Given that there is a strong consensus here, I see no need for further discussion in any case. The links will be blacklisted globally if you revert again, so please do not do so.  — Mike.lifeguard | talk 10:46, 11 September 2008 (UTC)
I agree to that, Jon, please You seem to be a reasonable person, don't readd these links on the pages where You are not active, make a link to Your home wiki, or add babel information etc..
The Web is about many things btw. and there are different sites with different purposes, MySpace, Geocities etc. You can use for making a personal webpage. The userpage on WMF-projects not.
I am still hoping this can be solved without needing to blacklist anything and I hope You understand what I wrote yesterday. It is not the links themselves that are the problem, it is the mass adding of them to multiple sites.
Thanks, --birdy geimfyglið (:> )=| 18:33, 11 September 2008 (UTC)
So, one more cleanout and if he reverts on projects where he is not demonstrably active, then they get blacklisted. That seems entirely reasonable to me. More than reasonable given Awbrey's offsite solicitation over this. JzG 17:04, 17 September 2008 (UTC)


This section is for archiving Discussions.

The Logs

log system

I would like to consolidate our logs into one system which uses subpages and transclusions to make things easy. Each month would get a subpage, which is then transcluded onto Spam blacklist/Log so they can easily be searched. This would mean merging Nakons "log entries" into the main log, and including the pre-2008 log. This wouldn't require much change in how we log things.

However, I wonder what people think about also logging removals and/or changes to the regexes. Currently, we don't keep track of those in any systematic way, but I think we should. For example, I consolidated a few regexes a while back, and simply made the old log entries match the new regexes, which is rather Orwellian. Similarly, we simply remove log entries when we remove domains - nothing is added to the log, so we cannot track this easily. This idea (changing the way we log things) is likely going to require some discussion; I don't think there should be any problem moving to transcluded subpages immediately.

 — Mike.lifeguard | @en.wb 14:41, 6 August 2008 (UTC)

I'm all for using one system for the logs. I'm not sure about your second idea though. Is the log intended purely to explain the current entries or also former entries and perhaps even edits? Logging removals would be a good idea to see if a domain was once listed, but logging changes seems too bureaucratic. Matching the log entries with the new regexes might be Orwellian, but it's also pragmatic. What are the advantages of logging changes? Could you perhaps give an example of how you suggest to log changes? --Erwin(85) 18:16, 6 August 2008 (UTC)
I should say I mean "Orwellian" without the connotative value. The denotative value is simply that the current method is "changing history" - not in and of itself a bad thing. Indeed, I've had no issues with this, hence the speculative nature of that part of my suggestion.  — Mike.lifeguard | @en.wb 19:48, 6 August 2008 (UTC)
in de:WP:SBL we do log all new entries, removals and changes on black- and whitelists. logging changes can be useful e.g. for retracing old discussions. -- seth 01:35, 7 August 2008 (UTC)
i think, that the transclusions are a good idea to keep the traffic low. is anybody against that?
concerning the logging of removals/modifications: what do you think about a log system like de:Wikipedia:Spam-blacklist/log#Mai_2008? -- seth 12:12, 13 August 2008 (UTC)
It would be quite some work to link the diffs, but I'm not against using it. I guess that means this is a weak support. --Erwin(85) 09:35, 19 August 2008 (UTC)

if everyone else continues ignoring the suggestions till tomorrow, i will start realizing Mike.lifeguard's idea by creating subpages like

apart from that i'd like to know...

  1. which components/tools are dependent on the sbl-log-syntax/-format?
  2. am i right, that there is no meta-whitelist? will there ever be one?
  3. would it be ok to switch from the old log-syntax to a new one, without converting the old log-entries?

-- seth 10:23, 23 August 2008 (UTC)

Please use subpages; I changed your examples above. There is no global whitelist, no. But in the future? Perhaps something to request. I imagine leaving old logs will be fine. Are we sure we want to log changes to regexes? I'm not sure whether that's really necessary. It also raises the already-high bar to contributing in this area. Our procedures are opaque enough as it is - this is one more hoop we are making potential recruits to the anti-spam team jump through.  — Mike.lifeguard | @en.wb 22:12, 23 August 2008 (UTC)
whitelist: i guess, a global whitelist would not be necessary, because blacklist entries usually can be modified by plain regexp-syntax to match all except such a blacklist entry would be
however, there may be cases, where an explicit whitelist entry would be more human-readable.
leaving old logs: if removals shall be logged, how shall they be logged? just by comment?
log changes: the main reasons why i am asking are #double.2Fwrong_entries and #double entries. if it was ok to remove bugs, syntax-optimizations and double entries without logging it, it would be less work for me. ;-) -- seth 22:50, 23 August 2008 (UTC)
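The whitelist point above (excluding one URL by regex syntax alone) is usually done with a negative lookahead. The example from the original comment is not preserved here, so the following Python sketch uses a purely hypothetical domain and path to show the technique.

```python
import re

# Hypothetical entry: block everything on example.org except /good-page.
entry = re.compile(r"\bexample\.org\b(?!/good-page\b)")

def is_blocked(url):
    """True if the hypothetical blacklist entry matches the URL."""
    return entry.search(url) is not None

print(is_blocked("http://example.org/spam"))       # blocked
print(is_blocked("http://example.org/good-page"))  # exempted by the lookahead
```

Such an entry works, but it hides the exemption inside the blocking regex, which is the readability concern raised about not having an explicit whitelist.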
I guess so, but if you see something suspicious please check if it can really be removed. Using the new syntax is OK with me. Logging removals like on dewiki looks good. --Erwin(85) 10:19, 24 August 2008 (UTC)

at least the splitting is done. [12] -- seth 09:37, 25 August 2008 (UTC)

Thanks for taking care of the logs; I think that will work much better.
I'm not sure whether I'm happy with having regexes consolidated as you've done. Within each set of additions, one should try to be concise with your regexes, but I don't think merging all the blogspot ones together is necessarily a good idea. This will make future removals more difficult. In case you forget, not all are as proficient with regex as you, myself included!  — Mike.lifeguard | @en.wb 02:14, 29 August 2008 (UTC)
first of all: i did not merge blogspot entries. the very long blogspot line had existed before my "big" edit. ;-)
merging all blogspot-links in one line would probably be not a good idea, because of performance reasons (the extension builds the regexps in 4k-blocks) and because of COIBot, which now allows a maximum line length of 1k chars.
(not to be misunderstood: grouping regexps increases performance, but lines >1k will lead to problems)
i grouped only a few regexps and only if they were "near" together in the SBL and had no different headings. as regexp grouping is used already, i didn't think that would be difficult to read. the largest grouping i did at the beginning and in lines 3300-3500, see [13]. was that too much?
concerning the logging: afaics we all want to log removals, too, don't we? but if i didn't get you wrong, you don't want to change the log-syntax. so i don't understand how you want SBL removals to be logged? :-) -- seth 09:43, 29 August 2008 (UTC)
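To make the size limits mentioned above concrete (regexes built in 4k blocks, lines capped at 1k characters for COIBot), here is a minimal Python sketch of that kind of batching. It is not the extension's or COIBot's actual code; the function name, the limits as constants, and the sample entries are assumptions for illustration.

```python
import re

MAX_LINE = 1000    # assumed per-line cap (the "1k chars" mentioned for COIBot)
MAX_BLOCK = 4096   # assumed size budget per combined regex (the "4k-blocks")

def build_blocks(entries, max_block=MAX_BLOCK, max_line=MAX_LINE):
    """Join entries with '|' into as few alternations as fit the budget.

    Returns (compiled blocks, entries rejected for exceeding max_line).
    """
    blocks, current, too_long = [], [], []
    size = 0
    for entry in entries:
        if len(entry) > max_line:
            too_long.append(entry)
            continue
        added = len(entry) + (1 if current else 0)  # +1 for the '|' separator
        if current and size + added > max_block:
            blocks.append("|".join(current))
            current, size = [], 0
            added = len(entry)
        current.append(entry)
        size += added
    if current:
        blocks.append("|".join(current))
    return [re.compile(b) for b in blocks], too_long

# Hypothetical entries and a tiny budget, so the split is visible.
entries = [r"\bfoo\.com\b", r"\bbar\.net\b", r"\bbaz\.org\b"]
blocks, rejected = build_blocks(entries, max_block=25)
print(len(blocks), rejected)
```

Fewer, larger alternations mean fewer regex passes per edit, which is the performance gain from grouping; the per-line cap is a separate constraint on how the source page is written.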
My mistake on the blogspot one then. I've said nothing about not changing the log format - feel free to do so in order to log both additions and removals - the template you would want to change is {{sbl-log}} and the "snippet" at the top of this page.  — Mike.lifeguard | @en.wb 18:01, 29 August 2008 (UTC)
oh, ok. i misunderstood "I imagine leaving old logs will be fine." -- seth 09:50, 31 August 2008 (UTC)
Afaics sbl-log does not need to be changed. To keep the log syntax somehow downwards compatible, it will suffice to change the syntax like this:
example\.org # name # b+ reason
where "b+" means addition on blacklist, "b-" means removal.
To keep the format more compact we could use the dewiki-style
example\.org # [SBL-diff b+] # reason
which results in something like
example\.org # b+ # reason
But this is a bit more work for the admins and gives just a small additional information (the exact date of addition/removal), so I don't know whether this is really better. Although Erwin said, the dewiki-syntax looked good and Mike.lifeguard told me to feel free, I'm not sure, if any other admin will beat me, if I change the syntax to dewiki-style. :-) -- seth 11:21, 31 August 2008 (UTC)
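One way to judge the "not too hard to read for machines" side of these proposals is to try parsing the compact form. The Python sketch below handles only lines shaped like `example\.org # b+ # reason`; the function name and the returned field names are hypothetical, and it assumes the regex part contains no spaces.

```python
import re

# regex, then "b+" or "b-", then a free-text reason, separated by '#'.
LOG_LINE = re.compile(
    r"^\s*(?P<regex>\S+)\s*#\s*(?P<action>b[+-])\s*#\s*(?P<reason>.*)$"
)

def parse_log_line(line):
    """Return a dict for a compact log line, or None if it does not fit."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return {
        "regex": m["regex"],
        "added": m["action"] == "b+",   # False means removal ("b-")
        "reason": m["reason"].strip(),
    }

print(parse_log_line(r"example\.org # b+ # reason"))
print(parse_log_line(r"example\.org # b- # cleaned up"))
```

Older log lines in a different format simply return None here, so a tool could fall back to another parser for them instead of requiring the old logs to be converted.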
However, i've been bold. Now we have same syntax as dewiki. -- seth 15:10, 2 September 2008 (UTC)
I adapted COIBot in the XWiki reports, it now (should) say(s) (have to wait for the next report from nowdiff):
 \bexample\.org             # [SBL-diff b+] # see [[User:COIBot/XWiki/]]
with (hopefully) the first # at position 40 (may have miscalculated that). Replace SBL-diff by the diff and save. It is going to be more work, but well, it is also clearer from now what happens. --Dirk Beetstra T C (en: U, T) 15:33, 2 September 2008 (UTC)
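The column alignment described here can be sketched in a few lines of Python. This is not COIBot's code; the function name and parameters are hypothetical, and "position 40" is read as a 1-based column.

```python
def format_log_entry(regex, action, note, hash_col=40):
    """Pad the regex so the first '#' lands at 1-based column hash_col.

    If the regex is too long to fit, a single separating space is used.
    """
    prefix = " " + regex                 # leading space, as in the line above
    padded = prefix.ljust(hash_col - 1)
    if not padded.endswith(" "):
        padded += " "
    return f"{padded}# [SBL-diff {action}] # {note}"

line = format_log_entry(r"\bexample\.org", "b+", "see [[User:COIBot/XWiki/]]")
print(line)
```

The `SBL-diff` text is left as a placeholder on purpose, since the diff only exists after the blacklist edit has been saved.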
Can we please keep the admin's name (and the span which was there previously)? Furthermore, when someone is using the log snippet at the top of this page, it will follow the old format.  — Mike.lifeguard | @en.wb 16:59, 2 September 2008 (UTC)
I've added a snippet for logging on the actual blacklist. Take the snippet after you make an edit.
So, to log an addition, grab the snippet from this page and the snippet from the blacklist page.
For additions, use {{sbl-log|1161258#{{subst:anchorencode:Example}}}} {{sbl-diff|1161261}}
which produces request addition
For removals, use {{sbl-log|1161258#{{subst:anchorencode:Example}}}} {{sbl-diff|1161261|removal}}
which produces request removal
This should make it faster to log things, I think.  — Mike.lifeguard | @en.wb 17:27, 2 September 2008 (UTC)
OK, changed it back .. we are not sure about this implementation yet (for me, it does give extra work, and IMHO does not add, a simple '+' or '-' in the logs without the actual difflink should suffice .. )? --Dirk Beetstra T C (en: U, T) 17:34, 2 September 2008 (UTC) (forgot to sign)
Actually the admins name is redundant, because it is included in the diff. If all (even redundant) information is provided (like now), it makes a lot of work for the admins.
The additional information provided by the difflink is quite small (i.e. exact modification date). The simple '+'/'-' (or 'b+'/'b-') would be enough. (That's why I was asking a few lines above.) The difflink would be fully superfluous, if all admins used the edit summary line of the sbl to inform about the added/removed entry explicitly, but that is unrealistic, I know.
So which syntax shall be used? Afaics its main features must be: 1. provide important information, 2. easy to input for admins, and 3. not too hard to read for machines. I guess all above suggestions will do, so it doesn't really make a big difference, which one will be chosen.
I guess, if nobody answers, we will just continue like now. -- seth 08:22, 3 September 2008 (UTC)
The admin's name isn't redundant - that is information we will want without having to look at the diff. By that logic, we would also not have whether it was an addition or removal, since that is information contained in the diff 0.o  — Mike.lifeguard | @en.wb 14:20, 3 September 2008 (UTC)
The '+'/'-' is redundant, right. But it is a main information about the sbl modification. The admin's name is imho not so important. But we don't need to discuss about that small point. For me the current syntax is no problem. :-) -- seth 19:12, 3 September 2008 (UTC)

tool for log searching

The simplest way to improve searchability is to write a tool that searches the logs for you. I'm in the middle of doing so, and I'll have a working prototype in a few days. The way this would work is it would load all the pages (really does not matter where the pages are), and apply a few regex to them. This means we really don't have to merge Nakon's stuff, I can just add that page to the tool. As long as the logs keep the same pattern of one entry per line, a tool is not difficult.

I don't really think logging removals is smart, we never remove entries from the logs anyway. Simplest way is to keep the logs write only (only new entries), and have a tool list all matches. (I'm writing the tool in a manner where you will be able to put the domain in "plain", as in, and it will find all the relevant entries, even if it has \bgoogle\.com\b, or some other weirdness.) —— nixeagle 20:23, 6 August 2008 (UTC)

lol, by accident i started writing a similar tool 2 hours ago. but i write a cli-perl-script only. until now it greps all sbl-entries (in meta-blacklist, de-blacklist and de-whitelist), which would match a given url. -- seth 01:35, 7 August 2008 (UTC)
Seth, nixeagle: actually, having a tool that searches all blacklists and logs (i.e. cross-wiki) to see if it is blacklisted somewhere, and if there is a log for that would be great. IMHO, it should be 'easy' to write a tool that extracts all regexes from the page, and tries if it is positive against a certain url that we search (and it could then be incorporated into the {{linksummary}} to easily find it ..). Or is this just what you guys are working on ;-) .. --Dirk Beetstra T C (en: U, T) 09:50, 7 August 2008 (UTC)
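The tools mentioned in this thread are not shown, so the following is only a minimal Python sketch of the shared idea: take every line of several blacklist or log pages, treat the text before `#` as a regex, and report the lines that match a given URL. The page names and contents are hypothetical, fetching the pages is left out, and the sketch ignores whatever prefix the extension wraps around each entry.

```python
import re

def entry_regex(line):
    """Text before the first '#', trimmed; empty for blank or comment lines."""
    return line.split("#", 1)[0].strip()

def find_matches(url, pages):
    """Return (page name, line) pairs whose regex matches url.

    pages maps a page name to its raw text, one entry per line.
    """
    hits = []
    for name, text in pages.items():
        for line in text.splitlines():
            pattern = entry_regex(line)
            if not pattern:
                continue
            try:
                if re.search(pattern, url, re.IGNORECASE):
                    hits.append((name, line))
            except re.error:
                # Skip entries that Python's re cannot compile.
                continue
    return hits

# Hypothetical page contents, for illustration only.
pages = {
    "meta-blacklist": "\\bgoogle\\.com\\b # example\n# comment\n\\bexample\\.org\\b",
    "de-blacklist": "google\\.com/spam\n(unclosed",
}
hits = find_matches("http://www.google.com/", pages)
print(hits)
```

Because the search runs the stored regexes against a plain URL, the user never has to guess how an entry was escaped or grouped, which is the "plain domain" lookup described above.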