Talk:Spam blacklist
For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Please post comments in the appropriate section below: Proposed additions, Proposed removals, or Troubleshooting and problems; read the messageboxes at the top of each section for an explanation. Please also check back some time after submitting, as there may be questions regarding your request. Per-project whitelists are discussed at MediaWiki talk:Spam-whitelist. In addition, please sign your posts with ~~~~ after your comment. For discussions related to the blacklist that do not concern a problem with a particular link, see Spam blacklist policy discussion.
Completed requests are archived (list, search); additions and removals are logged.
snippet for logging: {{/request|1141934#{{subst:anchorencode:SectionNameHere}}}}
If you cannot find your remark below, please do a search for the URL in question with this Archive Search tool.
Proposed additions
This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.
nijmegennieuws.nl and doetinchemnieuws.nl
Added
nijmegennieuws.nl
and
doetinchemnieuws.nl
Spammed by
User:81.207.176.60(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
The bots are down, so this request is for logging only. --Erwin(85) 12:22, 10 August 2008 (UTC)
- What do you mean "for logging only" & what does that have to do with the bots being down? I must be missing something here. — Mike.lifeguard | @en.wb 01:00, 13 August 2008 (UTC)
- The LinkWatchers reported three edits for doetinchemnieuws.nl and then stopped reporting anything on IRC. Some time later I checked this IP's edits using Luxo's tool and noticed he kept on spamming. I added the request here to be able to refer the Spam blacklist/Log to this request, specifically Luxo, as I couldn't refer to XWiki reports. The one about doetinchemnieuws.nl showed three edits and there wasn't any on nijmegennieuws.nl. Does this explain? --Erwin(85) 08:04, 13 August 2008 (UTC)
- OK, yeah. Perhaps I'm a bit off today. — Mike.lifeguard | @en.wb 02:49, 14 August 2008 (UTC)
porno-izlee.com
porno-izlee.com
User:88.228.46.15(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
Just 2 links so far, but I believe he is just beginning to add them, so I put him on our blacklist. Best regards, --birdy geimfyglið (:> )=| ∇ 23:06, 13 August 2008 (UTC)
- This can stay, I think. Logged — Mike.lifeguard | @en.wb 19:45, 15 August 2008 (UTC)
People may want to look into other things on the same server:
- Top 10 domains on server porno-izlee.com (67.159.45.5): deniztube.com (153), mynewhaircut.net (52), ghanaclips.com (13), turkishi.com (7), youtubecity.net (6), DenizTube.com (5), porno-izlee.com (4), redindir.com (4), faveladodarocinha.com (3), 911researchers.com (2)
deniztube.com
youtube53.com
youtubecity.net
turkishi.com
- Top 10 editors who have added deniztube.com: 85.99.214.79 (108), 88.228.40.137 (10), 78.169.38.36 (8), 88.228.18.253 (7), LovelessGent (5), 78.169.46.60 (4), 85.99.215.176 (3), 78.169.48.172 (2), 88.228.36.165 (2), 88.228.37.67 (2).
User:85.99.214.79(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:88.228.40.137(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:78.169.38.36(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:88.228.18.253(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:78.169.46.60(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:85.99.215.176(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:78.169.48.172(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:88.228.37.67(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:88.230.198.108(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:88.228.20.109(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:88.228.25.216(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:LovelessGent
The top one has 108 link additions, across a range of wikis.
This seems to lead even further: ghanaclips.com has a different set of IP users (in a 41.210 range and some others), but that seems to be en only (though where did I see kokoliko.com recently?).
User:ebenasare
User:41.210.11.57(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:41.210.13.207(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:41.210.15.194(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
kokoliko.com
clubaphro.com
vibeghana.com
ghanaword.com
....
I need help to prune this out completely. --Dirk Beetstra T C (en: U, T) 17:16, 18 August 2008 (UTC)
qpc.ro
- Spam domain
Note that the specific pages linked are to probable copyright violations (they host copies of movies and TV shows):
qpc.ro
tv.qpc.ro
- Spam account
User:Andreitripon
User:79.118.101.6(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
--A. B. (talk) 18:45, 14 August 2008 (UTC)
internationalbadminton.org
i'm not sure about this one. have a look at this diagnostic page: [1]. have there been comparable cases in the past? -- seth 21:27, 17 August 2008 (UTC)
- Thanks seth - to me malware sites are always listable to protect the many wikis who depend on this list. Added, cheers --Herby talk thyme 14:14, 18 August 2008 (UTC)
- i guess, the site is/was hacked temporarily. it is linked many times in :de and :en, probably because its content is useful. and i don't know how long google leaves hacked sites in its abuse-list. -- seth 16:45, 18 August 2008 (UTC)
- I don't see any abuse (according to my database), I suggest that if the problem is gone, it is removed, as blacklisting here does disrupt the pages on-wiki (if someone vandalises the page and removes the link, the edit can not be reverted). (What we would need for these is a regex list of external links which are 'disabled', not 'blacklisted'). --Dirk Beetstra T C (en: U, T) 16:48, 18 August 2008 (UTC)
- I've rem'd it out for now. We have (& should) BL sites that contain exploits but if it is not current that is another matter. Maybe we can get more on the google exploit one? Looked legit warning to me. Cheers --Herby talk thyme 16:52, 18 August 2008 (UTC)
tarkanfunclub.com
tarkanfunclub.com
- Spammers
User:78.189.19.102(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:Tarkanmanager
User:Cubs Fan
User:80.80.208.71(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:TarkanFunClub
Fansite. Spammer's contribs speak for themselves. See also w:WT:WPSPAM#Fanclub spammer (permanent link). MER-C 13:39, 18 August 2008 (UTC)
- Yes - more than a nuisance, Added. Thanks --Herby talk thyme 17:46, 18 August 2008 (UTC)
- More domains spammed
bellisimasexshop.com
cinselticaret.com
- Google Adsense: 5551319961929303
gulyeri.com
- Google Adsense: 2743631921357480
hikayelersex.com
kaliterehberi.com.tr
realistikbebek.com
realistshop.com
sibersonic.com
tarkanfunclub.com
- Google Adsense: 5551319961929303
- Related domain
tatliseslisohbet.com
- Second batch Added --A. B. (talk) 00:35, 19 August 2008 (UTC)
unitursa.com spam
- Spam domains
encostablanca.com
grupoesmeralda.com
unitursa.com
- Related domains
galaxtur.com
galetamar.com
rocaesmeralda.com
imperialpark.org
diamantebeach.com
- Spam account
User:80.35.179.87(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
- Reference
--A. B. (talk) 00:12, 19 August 2008 (UTC)
- Added --A. B. (talk) 00:36, 19 August 2008 (UTC)
firme.rs spam
- Spam domains
firme.rs
dtpwiz.com
Google Adsense ID: 1349757567489797
- Related domain
firme.co.yu
- Spam accounts
User:Sudarevic
User:77.105.28.238(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
- Reference
- en:Wikipedia talk:WikiProject Spam#firme.rs spam (permanent link)
--A. B. (talk) 00:16, 19 August 2008 (UTC)
- Added --A. B. (talk) 00:37, 19 August 2008 (UTC)
onlineseo.info
- Domains
onlineseo.info
fihaa.com
Google Adsense ID: 3239128903599293
- Related domain
cgarab.com
- Accounts
User:41.234.255.131(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:41.234.249.211(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:41.234.252.34(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:41.234.254.194(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:41.234.246.98(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
- Reference
--A. B. (talk) 00:17, 19 August 2008 (UTC)
- Added --A. B. (talk) 00:37, 19 August 2008 (UTC)
mysmp.com
- Domain
mysmp.com
- Google Adsense ID: 0719114306637522
- Accounts
User:216.164.151.129(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:Mysmp
User:69.255.236.89(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
- Reference
- en:MediaWiki talk:Spam-blacklist#mysmp.com (permanent link)
--A. B. (talk) 03:24, 20 August 2008 (UTC)
Rich Media Project
- Domain
rich-media-project.com
- Accounts
User:88.162.31.228(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:Loucasss
User:81.249.98.41
User:84.103.221.143
- References
- en:Wikipedia talk:WikiProject Spam#Rich Media Project (permanent link)
- en:MediaWiki talk:Spam-blacklist#rich-media-project.com (permanent link)
--A. B. (talk) 03:34, 20 August 2008 (UTC)
web-anatomy.com
- Spam accounts
User:91.189.141.202(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
User:79.233.74.68
User:217.145.197.118(IP tools: Google | WHOIS | domaintools | RBL | tools)
(spamhaus | projecthoneypot | malwareurl)
- Spam domain
web-anatomy.com
- References
- en:Wikipedia talk:WikiProject Spam#Web-anatomy (permanent link)
- en:MediaWiki talk:Spam-blacklist#web-anatomy.com (permanent link)
--A. B. (talk) 03:52, 20 August 2008 (UTC)
Proposed additions (Bot reported)
This section is for websites which have been added to multiple wikis as observed by a bot.
Items here will automatically be archived by the bot when they become stale. Sysops, please change the LinkStatus template to closed. These are automated reports; please check the records and the link thoroughly, as they may be good links! For some more info, see Spam blacklist/help#SpamReportBot_reports. If a report contains links to fewer than 5 wikis, only add the domain when it is really spam; otherwise just revert the link additions and close the report. Closed reports will be reopened if spamming continues. The bot will automagically mark as stale any report that has fewer than 5 links reported, has not been edited in the last 7 days, and was last edited by COIBot. These can be found in this category. Please place suggestions on the automated reports in the discussion section.
Running; the bot will report a domain shortly after a link is used more than 2 times by one user on more than 2 wikis (technically: when more than 66% of the link's additions were made by this user, and more than 66% of its additions were cross-wiki). Same system as SpamReportBot (discussions go after the remark "<!-- Please put comments after this remark -->" at the bottom; please close reports when reverted/blacklisted/waiting for more, or ignore when the link is good).
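The reporting threshold described above can be sketched as follows; this is a hypothetical reconstruction for illustration, not COIBot's actual code, and all parameter names are invented:

```python
def should_report(total_additions, additions_by_user, xwiki_additions, wikis_by_user):
    """Sketch of the threshold described above: report when one user
    made more than 2 additions on more than 2 wikis, accounting for
    more than 66% of all additions of the link, and more than 66% of
    the additions were made cross-wiki. (Hypothetical reconstruction.)"""
    if total_additions == 0:
        return False
    return (additions_by_user > 2
            and wikis_by_user > 2
            and additions_by_user / total_additions > 0.66
            and xwiki_additions / total_additions > 0.66)
```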
List | Last update | By | Site IP | R | Last user | Last link addition | User | Link | User - Link | User - Link - Wikis | Link - Wikis |
---|---|---|---|---|---|---|---|---|---|---|---|
vrsystems.ru | 2023-06-27 15:51:16 | COIBot | 195.24.68.17 192.36.57.94 193.46.56.178 194.71.126.227 93.99.104.93 | 2070-01-01 05:00:00 | 4 | 4 |
Proposed removals
This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.
Remember to provide the specific domain blacklisted, links to the articles where it is used or would be useful, and arguments in favour of unlisting. Completed requests will be marked as done or declined and archived. See also /recurring requests for repeatedly proposed (and refused) removals. The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
youporn.com
The YouPorn wikipedia article should have a link to youporn.com but this is blocked by the spam filter. --Helohe 14:04, 11 August 2008 (UTC)
- You're right, please request whitelisting for a main-page specific url on the appropriate whitelist page (en:MediaWiki talk:Spam-whitelist). As such Declined here. Thanks. --Dirk Beetstra T C (en: U, T) 14:05, 11 August 2008 (UTC)
Lluisllach.pl
lluisllach.pl is a fine site, referring to a Geocities page. No spam, no porn. There are many pages about Lluis Llach, and the link was accepted by the Polish one. Blocking really does not seem necessary. —The preceding unsigned comment was added by 212.39.28.26 (talk • contribs) 12:17, 16 Aug 2008 (UTC)
- The site (as you have spelt it) does not appear to be blacklisted here. Thanks --Herby talk thyme 12:23, 16 August 2008 (UTC)
- the sbl is case-insensitive, the entry is
\blluisllach\.pl\b
- for a given url you can use [2] (beta state) to find the corresponding entries. -- seth 13:53, 16 August 2008 (UTC)
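For illustration, the entry above behaves like this in a PCRE-style engine (a minimal Python sketch; case-insensitive compilation is assumed, per seth's note):

```python
import re

# sketch of how the blacklist entry is applied; the extension is
# assumed to compile entries case-insensitively against added links
entry = re.compile(r"\blluisllach\.pl\b", re.IGNORECASE)

assert entry.search("http://www.LluisLlach.PL/index.html")    # case-insensitive
assert entry.search("http://lluisllach.pl/")
assert not entry.search("http://lluisllachxpl.example.com/")  # escaped dot is literal
```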
- Thanks seth - that way it is here because of this report. It was reverted, links placed again so listed. Looks valid to me. For anyone who doesn't look at it the appeal is by the Ip that was responsible for the link placement. Cheers --Herby talk thyme 13:55, 16 August 2008 (UTC)
- I think we should decline - geocities pages are of no use to the encyclopaedia. Rather the reverse, in fact. 80.176.82.42 23:07, 16 August 2008 (UTC)
- Declined per Herby and original report. — Mike.lifeguard | @en.wb 23:16, 16 August 2008 (UTC)
Troubleshooting and problems
double/wrong entries
when i deleted some entries from the german sbl that were already listed in the meta sbl, i saw that there are many double entries in the meta sbl, e.g., search for
- top-seo, buy-viagra, powerleveling, cthb, timeyiqi, cnvacation, mendean
and you'll find some of them. if you find it useful, i can try to write a small script (in august), which indicates more entries of this kind.
furthermore i'm wondering about some entries:
- "\zoofilia", for "\z" matches the end of a string.
- "\.us\.ma([\/\]\b\s]|$)", for ([\/\]\b\s]|$) is the same as simply \b, isn't it? (back-refs are not of interest here)
- "1001nights\.net\free-porn", for \f matches a formfeed, i.e., never
- "\bweb\.archive\.org\[^ \]\{0,50\}", for that seems to be BRE, but php's preg functions use PCRE, so i guess, this will never match
- "\btranslatedarticles\].com", for \] matches a ']', so will probably never match.
before i go on, i want to know, if you are interested in this information or not. :-) -- seth 22:23, 12 July 2008 (UTC)
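seth's observations about impossible escapes are easy to verify; in the sketch below the "fixed" pattern is only a guess at what the original requester intended:

```python
import re

# '\f' inside a regex is a formfeed character, so this entry can
# never match a URL; presumably an escaped slash '\/' was intended
broken = r"1001nights\.net\free-porn"
fixed = r"1001nights\.net\/free-porn"   # hypothetical correction
url = "http://1001nights.net/free-porn"

assert re.search(broken, url) is None
assert re.search(fixed, url) is not None

# likewise '\]' matches a literal ']', which a bare domain never contains
assert re.search(r"\btranslatedarticles\].com", "translatedarticles.com") is None
```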
- You know, we could use someone like you to clean up the blacklist... :D Kylu 01:53, 13 July 2008 (UTC)
- We are indeed interested in such issues - I will hopefully fix these ones now; keep 'em coming! — Mike.lifeguard | @en.wb 01:59, 13 July 2008 (UTC)
- Some of the dupes will be left for clarity's sake. When regexes are part of the same request they can be safely consolidated (I do this whenever I find them), but when they are not, it would be confusing to do so, in many cases. Perhaps merging regexes in a way that is sure to be clear in the future is something worth discussing, but I can think of no good way of doing so. — Mike.lifeguard | @en.wb 02:06, 13 July 2008 (UTC)
- in de-SBL we try to cope with that only in our log-file [3]. there one can find all necessary information about every white-, de-white-, black- and de-blacklisting. the sbl itself is just a regexp-speed-optimized list for the extension without any claim of being chronologically arranged.
- i guess that the size of the blacklist will keep increasing, so speed optimization will perhaps become necessary. btw, has anyone ever benchmarked this extension? i only know that some buffering was implemented once.
- oh, and if one wants to correct further regexps: just search by regexps (e.g. by vim) for /\\[^.b\/+?]/ manually and delete needless backslashes, e.g. \- \~ \= \:. apart from that the brackets in single-char-classes like [\w] are needless too. "\s" will never match. -- seth 11:36, 13 July 2008 (UTC)
- fine-tuning: [1234] is much faster in processing than (1|2|3|4); and (?:foo|bar|baz) is faster than (foo|bar|baz). -- seth 18:21, 13 July 2008 (UTC)
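The equivalence (and the rough shape of the speed difference) can be sketched like this; absolute timings depend entirely on the regex engine and input, so no ordering is asserted:

```python
import re
import timeit

url = "http://www.example.org/page3"

# a character class and a non-capturing alternation match the same strings
cc = re.compile(r"page[1234]")
alt = re.compile(r"page(?:1|2|3|4)")
assert bool(cc.search(url)) == bool(alt.search(url))

# rough timing sketch; the character class avoids the alternation
# machinery entirely, which is the basis of seth's advice
t_cc = timeit.timeit(lambda: cc.search(url), number=50_000)
t_alt = timeit.timeit(lambda: alt.search(url), number=50_000)
print(f"class: {t_cc:.3f}s  alternation: {t_alt:.3f}s")
```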
- I benchmarked it; (a|b|c) and [abc] had different performance. Same with the latter case — VasilievV 2 21:02, 14 July 2008 (UTC)
- So should we be making those changes? (ie was it of net benefit to performance?) — Mike.lifeguard | @en.wb 21:56, 15 July 2008 (UTC)
- these differences result from the regexp implementation. but what i meant with benchmarking is the following: how much does the length of the blacklist cost (measured in time)? i don't know how fast the wp-servers are. however, i benchmarked it now on my present but old computer (about 300-500MHz):
- if i have one simple url like http://www.example.org/ and let the ~6400 entries of the present meta-blacklist match against this url, it takes about 0.15 seconds until all regexps are done. and i really measured only the pure matching:
// reduced part of SpamBlacklist_body.php
foreach ( $blacklists as $regex ) {
    $check = preg_match( $regex, $links, $matches );
    if ( $check ) {
        $retVal = 1;
        break;
    }
}
- so i suppose, that it would not be a bad idea to care about speed, i.e. replace unnecessary patterns by faster patterns and remove double entries. ;-)
- if you want me to, i can help with that, but soonest in august.
- well, the replacement is done quickly, if one of you uses vim
- the replacement of (.|...) by [...] can be done manually, because there are just 6 occurrences. the replacement of (...) by (?:...) can be done afterwards by
:%s/^\([^#]*\)\(\\\)\@<!(\(?\)\@!/\1(?:/gc
- -- seth 23:26, 15 July 2008 (UTC)
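A simplified Python equivalent of that vim substitution (ignoring its leading [^#]* comment guard) turns capturing groups into non-capturing ones:

```python
import re

def uncapture(line):
    # replace '(' with '(?:' when the '(' is not escaped ('\(')
    # and not already followed by '?'
    return re.sub(r"(?<!\\)\((?!\?)", "(?:", line)

assert uncapture(r"\bfoo(bar|baz)\.com\b") == r"\bfoo(?:bar|baz)\.com\b"
assert uncapture(r"\bfoo\(bar\)") == r"\bfoo\(bar\)"        # escaped parens untouched
assert uncapture(r"(?:already)fine") == r"(?:already)fine"  # already non-capturing
```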
- some explicit further bugs:
- \mysergeybrin\.com -> \m does not exist
- \hd-dvd-key\.com -> \h does not exist
- however, because nobody answered (or read?) my last comment... would it be useful to give me temporarily the rights to do the modifications by myself? -- seth 01:44, 7 August 2008 (UTC)
- I fixed these. You can always request (temporary) sysop status. Any help is appreciated. --Erwin(85) 12:45, 7 August 2008 (UTC)
- requested and got it. :-) -- seth 09:18, 13 August 2008 (UTC)
before i start modifying the list, i want to know whether i should log my changes somewhere. oh, and btw, i suppose that the entry [0-9]+\.[-\w\d]+\.info\/?[-\w\d]+[0-9]+[-\w\d]*\]
is somehow senseless, for it will probably never match. i found the original discussion [4] (the regexp was changed afterwards), but the regexp will not grep the links mentioned there. shall i just delete such an entry or shall i make a new request and try to correct it? -- seth 09:18, 13 August 2008 (UTC)
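The claim is easy to check: the blacklist is matched against the link itself, so the trailing \] (presumably meant to catch the closing bracket of a wikitext external link) can never be present:

```python
import re

entry = r"[0-9]+\.[-\w\d]+\.info\/?[-\w\d]+[0-9]+[-\w\d]*\]"

# against a bare link the required literal ']' never appears
assert re.search(entry, "http://123.example.info/page42") is None
# only with a ']' appended does the entry fire, confirming it is dead
assert re.search(entry, "123.example.info/page42]") is not None
```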
- It would be nice if you could update the log as well, so we can still find the corresponding log message. Though maybe we should wait and see if anything new comes out of #The Logs. I guess it's best to correct wrong entries or in any case log all those removals. It probably wouldn't hurt if some were removed, but I have no idea how many entries we're talking about. --Erwin(85) 09:31, 13 August 2008 (UTC)
- ok, so i'll wait until the other thread is finished. but i don't think that manipulating the logs is a good idea, because it will make tracing entry changes difficult.
- i guess, there are less than 10, perhaps even less than 5 useless entries. -- seth 10:29, 13 August 2008 (UTC)
double entries
i wrote a small script to grep most of the double (or multi) entries. the result is presented on User:Lustiger_seth/sbl_double_entries. as you can see, there are many (>250) redundant entries. i guess, we could delete more than 200 entries. -- seth 22:59, 19 August 2008 (UTC)
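A minimal version of such a duplicate check might look like this (the sample entries are hypothetical; the real list has thousands, and seth's script likely does more than exact matching):

```python
from collections import Counter

# hypothetical excerpt of blacklist entries, one regex per line
entries = [r"\btop-seo\b", r"\bbuy-viagra\b", r"\btop-seo\b"]

# entries occurring more than once are exact duplicates and can go
dupes = [e for e, n in Counter(entries).items() if n > 1]
assert dupes == [r"\btop-seo\b"]
```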
User: namespace abuse
User:Restaurant-lumiere
User:Restaurant-lumiere
per Herby. — Mike.lifeguard | @en.wb 22:13, 11 August 2008 (UTC)
Americarx
User:Americarx
— Mike.lifeguard | @en.wb 22:13, 11 August 2008 (UTC)
- Blocked on 3 wikis already - one to watch, I think. — Mike.lifeguard | @en.wb 22:52, 11 August 2008 (UTC)
Tbraustralia
User:Tbraustralia
— Mike.lifeguard | @en.wb 22:14, 11 August 2008 (UTC)
Housingyou
User:Housingyou
— Mike.lifeguard | @en.wb 22:14, 11 August 2008 (UTC)
Thebalfourgroup
User:Thebalfourgroup
- Commons & enwiki so far - not blocked yet. — Mike.lifeguard | @en.wb 22:45, 11 August 2008 (UTC)
- Deleted & warned on Commons. — Mike.lifeguard | @en.wb 00:14, 12 August 2008 (UTC)
hkcbn.org
User:Highhi
- Not sure if it is crosswiki, leaving it here but going to bed now, best regards, --birdy geimfyglið (:> )=| ∇ 02:54, 12 August 2008 (UTC)
- It was (& some cross wiki blocks need placing). Added by User:Kylu thanks --Herby talk thyme 06:52, 12 August 2008 (UTC)
- Hit en.wb (thanks kylu): deleted & blocked. Some organization on cross-wiki blocks is needed. — Mike.lifeguard | @en.wb 12:44, 12 August 2008 (UTC)
- And wouldn't global sysop be good for just this stuff........:( --Herby talk thyme 12:50, 12 August 2008 (UTC)
- Hm, Kylu marked them for deletion, all those wikis have local active crats, they have to clean themselves, best regards, --birdy geimfyglið (:> )=| ∇ 12:52, 12 August 2008 (UTC)
Legal Investigation Agency, Dietz& Associates, LLC
User:Iyad sadi
Spam page on Commons. --Herby talk thyme 07:09, 12 August 2008 (UTC)
Jerald Franklin Archer
User:Jerald Franklin Archer
Violin lessons page on Commons & en wp. --Herby talk thyme 07:09, 12 August 2008 (UTC)
fabianswebworld.fa.funpic.de fschneider.de.vu
User:Kzuse
Next one :( --birdy geimfyglið (:> )=| ∇ 10:57, 12 August 2008 (UTC)
- Looks quite cross-wiki to me, I added it already, --birdy geimfyglið (:> )=| ∇ 11:04, 12 August 2008 (UTC)
- Thanks birdy - links all cleared. --Herby talk thyme 11:55, 12 August 2008 (UTC)
Nervenhammer
User:Fleshgrinder
- similar pattern, adding a personal link..--Cometstyles 12:01, 12 August 2008 (UTC)
- Thanks Comets - Added for now. In passing, I see no harm in listing such sites, as much as anything to send a message to the user that their behaviour may not be appropriate. Not sure about how lasting the listing should be, or about logging immediately - thoughts welcome. --Herby talk thyme 12:12, 12 August 2008 (UTC)
- Reviewing this, it may well be a good-faith de user who has just decided to expand their interests (based on SUL info). In which case I suggest serious consideration for de-listing if we are asked. --Herby talk thyme 12:17, 12 August 2008 (UTC)
MariaTash
User:MariaTash
Jewellery sales. Page & images on Commons, user page ad on en wp. --Herby talk thyme 18:22, 12 August 2008 (UTC)
Daliahilfi
User:Daliahilfi
"Talent Lab" recruitment page on Commons. --Herby talk thyme 18:24, 12 August 2008 (UTC)
Autofinance
User:Autofinance
Cross wiki spam pages. (autofinance-ez.com is the domain). --Herby talk thyme 12:59, 13 August 2008 (UTC)
Bestlyriccollection
User:Bestlyriccollection
What is that? I stumbled into it when 84.109.83.73 was vandalizing through the wikis. Best regards, --birdy geimfyglið (:> )=| ∇ 10:41, 14 August 2008 (UTC)
- Very odd indeed. fr wp didn't like the idea of a "user page for bookmarks". Not sure that it is spam but sure doesn't look like "normal" user pages. Looking some more & other opinions would be good. --Herby talk thyme 11:02, 14 August 2008 (UTC)
- They have a point [5]... I don't understand why he needs that on multiple wikis; I mean, if he (mis)uses his userpage for bookmarks, why in so many places? --birdy geimfyglið (:> )=| ∇ 12:25, 14 August 2008 (UTC)
Discussion
Another xwiki user page abuse?
I came across this one today. It seems to have the makings of a non-contributor who is creating user pages with personal links on them. Any other views? Cheers --Herby talk thyme 07:27, 1 August 2008 (UTC)
- Not good. Don't have time to remove links currently, but certainly worth doing (and probably blacklisting too). — Mike.lifeguard | @en.wb 12:02, 1 August 2008 (UTC)
- hmm seems problematic, I have removed all of the links and if he re-adds, we may have to blacklist it ..--Cometstyles 12:34, 1 August 2008 (UTC)
- Do I really have to add user space (not the talk) to the spaces to parse for the linkwatchers? It is just a small step for me .. --Beetstra public 21:06, 1 August 2008 (UTC)
- If it is easy to do, by all means! Currently tracking these is very much hit-and-miss. We found JackPotte through the SWMTBots, but that will not always be assured, as they are not designed to watch for this sort of thing. — Mike.lifeguard | @en.wb 23:43, 1 August 2008 (UTC)
- Yes, good idea. A lot of users have spam in userspace. JzG 22:29, 2 August 2008 (UTC)
A bit of delay handling this one...
linkedin.com/in/nguyenta
themilli.com/ta
User:Ta.nguyen
— Mike.lifeguard | @en.wb 18:48, 9 August 2008 (UTC)
- Added both. — Mike.lifeguard | @en.wb 18:13, 10 August 2008 (UTC)
NOINDEX
- Prior discussion at Talk:Spam_blacklist/Archives/2008/06#Excluding_our_work_from_search_engines, among other places
There is now a magic word __NOINDEX__ which we can use to selectively exclude certain pages from being indexed. I suggest having the bots use this magic word in all generated reports immediately. Whether to have this page and its archives indexed was a point of contention previously, and deserves further discussion. — Mike.lifeguard | @en.wb 01:33, 4 August 2008 (UTC)
- Sorry missed this one. I certainly support the "noindex" of the bot pages. They are somewhat speculative. If we could get the page name changed I would be happier about not using the magic word on this but..... --Herby talk thyme 16:09, 6 August 2008 (UTC)
- I have added the keyword to the COIBot generated reports, they should now follow that. --Dirk Beetstra T C (en: U, T) 16:31, 6 August 2008 (UTC)
- My bot is flagged now, so I can start adding it to old reports. I will poke a sysadmin first to see if I really must make ~12000 edits before I start though. It will not be all in one go, and I will not start for a day or two.
- Any other thoughts on adding it to this page and/or its archives? — Mike.lifeguard | @en.wb 18:11, 9 August 2008 (UTC)
- Already sort-of done with
{{linkstatus}}
, so the bot probably won't run. I plan to keep the flag though <evil grin> — Mike.lifeguard | @en.wb 22:56, 11 August 2008 (UTC)
Renaming the blacklist should be done at some point in the future; we'll have to wait on Brion for that. Until then, I'd like to have this page and its archives __NOINDEX__ed. Having it indexed causes more issues than it solves & we now have an easy way to remedy the situation. We should review this when the blacklist is renamed. — Mike.lifeguard | @en.wb 02:55, 14 August 2008 (UTC)
The Logs
log system
I would like to consolidate our logs into one system which uses subpages and transclusions to make things easy. Each month would get a subpage, which is then transcluded onto Spam blacklist/Log so they can easily be searched. This would mean merging Nakon's "log entries" into the main log, and including the pre-2008 log. This wouldn't require much change in how we log things.
However, I wonder what people think about also logging removals and/or changes to the regexes. Currently, we don't keep track of those in any systematic way, but I think we should. For example, I consolidated a few regexes a while back, and simply made the old log entries match the new regexes, which is rather Orwellian. Similarly, we simply remove log entries when we remove domains - nothing is added to the log, so we cannot track this easily. This idea (changing the way we log things) is likely going to require some discussion; I don't think there should be any problem moving to transcluded subpages immediately.
— Mike.lifeguard | @en.wb 14:41, 6 August 2008 (UTC)
- I'm all for using one system for the logs. I'm not sure about your second idea though. Is the log intended purely to explain the current entries or also former entries and perhaps even edits? Logging removals would be a good idea to see if a domain was once listed, but logging changes seems too bureaucratic. Matching the log entries with the new regexes might be Orwellian, but it's also pragmatic. What are the advantages of logging changes? Could you perhaps give an example of how you suggest to log changes? --Erwin(85) 18:16, 6 August 2008 (UTC)
- I should say I mean "Orwellian" without the connotative value. The denotative value is simply that the current method is "changing history" - not in and of itself a bad thing. Indeed, I've had no issues with this, hence the speculative nature of that part of my suggestion. — Mike.lifeguard | @en.wb 19:48, 6 August 2008 (UTC)
- in de:WP:SBL we do log all new entries, removals and changes on black- and whitelists. logging changes can be useful e.g. for retracing old discussions. -- seth 01:35, 7 August 2008 (UTC)
- i think, that the transclusions are a good idea to keep the traffic low. is anybody against that?
- concerning the logging of removals/modifications: what do you think about a log system like de:Wikipedia:Spam-blacklist/log#Mai_2008? -- seth 12:12, 13 August 2008 (UTC)
- It would be quite some work to link the diffs, but I'm not against using it. I guess that means this is a weak support. --Erwin(85) 09:35, 19 August 2008 (UTC)
tool for log searching
The simplest way to improve searchability is to write a tool that searches the logs for you. I'm in the middle of doing so, and I'll have a working prototype in a few days. The way this would work is it would load all the pages (it really does not matter where the pages are), and apply a few regexes to them. This means we really don't have to merge nacon's stuff; I can just add that page to the tool. As long as the logs keep the same pattern of one entry per line, a tool is not difficult.
I don't really think logging removals is smart; we never remove entries from the logs anyway. The simplest way is to keep the logs write-only (only new entries), and have a tool list all matches. (I'm writing the tool in a manner where you will be able to put the domain in "plain", as in google.com, and it will find all the relevant entries, even if it has \bgoogle\.com\b, or some other weirdness.) —— nixeagle 20:23, 6 August 2008 (UTC)
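A minimal sketch, in Python, of the kind of matching nixeagle describes: finding a plain domain in log lines that may spell it as an escaped regex. The function name and structure are illustrative assumptions, not the actual toolserver tool; the only assumption carried over from the discussion is one log entry per line.

```python
import re

def find_log_entries(domain, log_text):
    r"""Return log lines mentioning a plain domain such as "google.com",
    even when the entry is written as an escaped regex like
    \bgoogle\.com\b.  Assumes one log entry per line."""
    # Escape each dot-separated label, then allow an optional backslash
    # before every dot, so both "google.com" and "google\.com" match.
    pattern = re.compile(
        r"\\?\.".join(re.escape(part) for part in domain.split(".")),
        re.IGNORECASE,
    )
    return [line for line in log_text.splitlines() if pattern.search(line)]
```

A query for `google.com` would then surface both a raw `google.com` mention and a blacklisted `\bgoogle\.com\b` entry, which is the "weirdness" tolerance described above.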
- lol, by accident i started writing a similar tool 2 hours ago. but i'm writing a cli perl script only. so far it greps all sbl entries (in the meta blacklist, de blacklist and de whitelist) which would match a given url. -- seth 01:35, 7 August 2008 (UTC)
- Seth, nixeagle: actually, having a tool that searches all blacklists and logs (i.e. cross-wiki) to see if a link is blacklisted somewhere, and if there is a log for that, would be great. IMHO, it should be 'easy' to write a tool that extracts all regexes from the page and tests whether any of them matches a certain url that we search (and it could then be incorporated into the {{linksummary}} to easily find it ..). Or is this just what you guys are working on ;-) .. --Dirk Beetstra T C (en: U, T) 09:50, 7 August 2008 (UTC)
- WONDERFUL!
- one question: can you make it add 'http://' by itself (as we only put the domain in the linksummary, so as to prevent the blacklist from blocking it ..)? --Dirk Beetstra T C (en: U, T) 14:48, 7 August 2008 (UTC)
- That's about what I was writing. I was putting it in the framework of http://toolserver.org/~eagle/spamArchiveSearch.php where the tool retrieves the section/page and links you directly to where the item was mentioned. For the logs I was working on displaying the line entry in the log as one of the results, so you would not even have to view the log page. —— nixeagle 15:11, 7 August 2008 (UTC)
- if you want to combine my script with that framework, i can give you the source code. but it is perl-code and it is ugly, about 110 lines. -- seth 17:00, 7 August 2008 (UTC)
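The cross-wiki check discussed above (extract the blacklist regexes, prepend 'http://' to a bare domain, test each fragment) could be sketched roughly as below. This is a simplified Python illustration under stated assumptions: the real tools were PHP and Perl, and the SpamBlacklist extension actually combines its fragments into one large pattern anchored after the protocol rather than testing them one by one.

```python
import re

def is_blocked(domain, blacklist_lines):
    """Check a bare domain against spam-blacklist regex fragments.

    The blacklist is matched against full URLs, so a bare domain like
    "example.com" is first expanded to "http://example.com", as
    suggested in the thread above.  Returns the matching fragment,
    or None if nothing matches.
    """
    url = domain if "://" in domain else "http://" + domain
    for line in blacklist_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments and blanks
            continue
        try:
            if re.search(line, url):
                return line  # first matching blacklist entry
        except re.error:
            continue  # skip fragments invalid in Python's re dialect
    return None
```

Feeding it the text of several blacklists in sequence (meta first, then the larger wikis) would approximate the progressive cross-wiki lookup suggested above.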
some pages
Suggestion for pages:
- Spam blacklist
- Spam_blacklist/Log
- en:MediaWiki:Spam-blacklist
- en:MediaWiki_talk:Spam-blacklist/log
- en:User:XLinkBot/RevertList
- en:User:XLinkBot/RevertList_requests/log
Thanks! --Dirk Beetstra T C (en: U, T) 14:48, 7 August 2008 (UTC)
- i had to cope with a bug in en-sbl, but now it seems to work. further suggestions? (the more lists i include, the slower the script will get.) -- seth 16:44, 7 August 2008 (UTC)
- I would suggest doing it progressively: first the meta and en blacklists, the rest later (roughly in order of wiki size), similar to what luxo does. --Dirk Beetstra T C (en: U, T) 17:07, 7 August 2008 (UTC)
- i used a hash, and those don't care about the order of declaration. now it should be sorted. -- seth 22:11, 7 August 2008 (UTC)
User page advertising
Another "thinking aloud" one!
I guess I come across a commercially oriented user page on Commons once a day on average. The past week has brought a "Buying cars" page, an "Insurance sales" page, a "Pool supplies" page as well as blog/software/marketing pages. I do usually run vvv's SUL tool but quite often there is nothing immediately (the Pool supplies one cropped up on en wp a couple of days after Commons). I know en wp are often reluctant to delete such pages out of hand (which I find incredible).
I think what I am probably saying is: should we open up a section here to allow others to watch/comment/block/delete or whatever across wikis? --Herby talk thyme 09:51, 10 August 2008 (UTC)
- I agree, this is a great idea, as I have also noticed spammers like this go cross-wiki to multiple projects (Wikinews/Commons, etc.) Cirt 11:40, 10 August 2008 (UTC)
- Agree. Others may be interested in watching only that part of our work - perhaps a transcluded subpage so it may be watched separately? — Mike.lifeguard | @en.wb 14:03, 10 August 2008 (UTC)
- Sounds like the best way to proceed. Cirt 14:09, 10 August 2008 (UTC)
- Thanks so far - good to get other views as well but as an idea of the scale I picked these out from the last few days on Commons (all user names) -
- Sungate - design advert
- Totalpoolwarehouse - obvious & en wp too
- Theamazingsystem - two spamvert pages "The Automated Blogging System is a Powerful SEO Technology"
- Adventure show - pdf spam file
- Firmefront - fr "Banque, Assurance, Gestion Alternative et Private Equity"
- The Car Spy - internet car sales
- DownIndustries - clothing sales
- Serenityweb1 - Nicaragua tourism & en wp
- Macminicover - "Dust Cover or Designer Cover for Apple Mac"
- I can't instantly find the insurance sales one & I am sure another user produced a page the same as Theamazingsystem. We could do with working out the best way of presenting the info - whether the standard template is needed or whether just an SUL link would allow us a quick check on cross wiki activity?
- It would be good to know if the COI bot excludes User: space and whether that may need rethinking?
- Cheers --Herby talk thyme 14:35, 10 August 2008 (UTC)
- So far as I know, it watches only the mainspace. But Beetstra above said this could be changed. — Mike.lifeguard | @en.wb 14:40, 10 August 2008 (UTC)
- Not sure what else is in the works but I think an SUL link to check activity cross-projects would be sufficient. Anything else would be above and beyond but would also be nice. Cirt 15:06, 10 August 2008 (UTC)
- The standard {{ipsummary}} template is pretty good but (I think) lacks the SUL link which for this kind of stuff would be useful (luxo would be a help tho I guess).
- The other thing I guess would be to get agreement to lock the blatantly commercial accounts just so that they do not do a "JackPotte" on us I think. I'll maybe point a couple of people to this section. --Herby talk thyme 16:04, 10 August 2008 (UTC)
- As it happens I was just trying to lock an account that wasn't SUL yet. I think the concept is sound, these accounts prolly should be locked and hidden. Not sure about mechanics of implementation. ++Lar: t/c 18:39, 10 August 2008 (UTC)
- IPs can't have a unified account, so the SUL tool is useless. We have luxo's for that. — Mike.lifeguard | @en.wb 16:49, 10 August 2008 (UTC)
- Yeah - this type really needs an SUL link I think. And we do need to look at the best way we can lock overtly commercial accounts I think. --Herby talk thyme 16:51, 10 August 2008 (UTC)
- Today I also saw some spamming by 3 accounts on Commons:Talk:Main Page. I have to say that I do agree with Herby here - really a nice idea on how to stop at least some of the spamming. --Kanonkas 18:29, 10 August 2008 (UTC)
- Good idea, Herby! If you want I can set up a tool similar to SUL:, i.e. list user pages and blocks, for IPs. Of course, other tools are possible as well. --Erwin(85) 19:33, 10 August 2008 (UTC)
Today :) user:Restaurant-lumiere - restaurant spam - [6]. User page advert, series of images all with plenty of information about the restaurant in the "description". --Herby talk thyme 07:07, 11 August 2008 (UTC)
- Well .. enough is enough then. The linkwatchers are from now on also parsing the user namespace. --Dirk Beetstra T C (en: U, T) 10:23, 11 August 2008 (UTC)
- Bot is adapted for the new task. Had to tweak en:User:XLinkBot for that, but well, do I also have to add the 'Wikipedia:' namespace? --Dirk Beetstra T C (en: U, T) 10:36, 11 August 2008 (UTC)
- Personally I think not but others may vary?
- + User talk:Americarx - online pharmacy ads [7], Commons (images & page) & en wp page (& the en wp one had been there a long time). Caught by Kanonkas so thanks. --Herby talk thyme 10:48, 11 August 2008 (UTC)
- Everything that the linkwatchers parse is now getting into the database, and may trigger the XWiki functionality mechanism. We may get more work from this... some more manpower is still necessary (as there are things that I can autocatch which have been excluded this far ..). --Dirk Beetstra T C (en: U, T) 10:53, 11 August 2008 (UTC)
- So are we going to make a transcluded subpage etc or this will get difficult :)
- + User:Tbraustralia - spam page - "TBR Australia is the parent company to TBR Calculators and Australian Student" - [8]. --Herby talk thyme 11:09, 11 August 2008 (UTC)
- + user:Housingyou - www.housingyoumakelaars.nl page & image [9]. --Herby talk thyme 17:57, 11 August 2008 (UTC)
I think others already asked about this, but shouldn't this type of listing of problem cross-project spammers/userpages be moved to a subpage? Cirt 23:06, 12 August 2008 (UTC)
- For logging purposes, I put it on this page. I think that will work fine. — Mike.lifeguard | @en.wb 00:46, 13 August 2008 (UTC)
- For me it would just be easier to find and check users with the SUL tool if it were in some unified location on a subpage, but either way is probably okay. Cirt 02:21, 13 August 2008 (UTC)
Our spam filter is now blocking spam URLs in edit summaries
FYI: our spam filter now appears to block spam addresses in edit summaries even if the domain is not in the page text. I just learned this the hard way. It's probably a response to all the shock site spam recently left in edit summaries by vandals; some will crash browsers. --A. B. (talk) 07:45, 20 August 2008 (UTC)
- it's not a very new feature: see bugzilla:13599. -- seth 07:52, 20 August 2008 (UTC)