Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki

Revision as of 12:55, 12 August 2008

Shortcut:
WM:SPAM
The associated page is used by the MediaWiki SpamBlacklist extension, and lists strings of text that may not be used in URLs on any page in Wikimedia Foundation projects (as well as many external wikis). Any meta administrator can edit the spam blacklist. There is also a more aggressive way to block spamming through direct use of $wgSpamRegex. Only developers can make changes to $wgSpamRegex, and its use is to be avoided whenever possible.
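For illustration: $wgSpamRegex is set in LocalSettings.php and is matched against the whole text of an edit rather than just added URLs, which is what makes it so much blunter than the blacklist. A minimal sketch (the patterns are invented examples, not real entries):

// LocalSettings.php - sketch only; the patterns below are made up.
// $wgSpamRegex is tested against the entire edit text, not just URLs,
// so a careless pattern here can block perfectly legitimate edits.
$wgSpamRegex = '/online-cheap-pills\.example|buy-v1agra-now/i';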

For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.

Please post comments in the appropriate section below: Proposed additions, Proposed removals, or Troubleshooting and problems; read the message boxes at the top of each section for an explanation. Please also check back some time after submitting, as there may be questions about your request. Per-project whitelists are discussed at MediaWiki talk:Spam-whitelist. In addition, please sign your posts with ~~~~ after your comment. For discussions related to the blacklist that do not concern a particular link, see Spam blacklist policy discussion.

Completed requests are archived (list, search); additions and removals are logged.

snippet for logging: {{/request|1129623#{{subst:anchorencode:SectionNameHere}}}}

If you cannot find your remark below, please do a search for the URL in question with this Archive Search tool.

Spam that only affects a single project should go to that project's local blacklist.

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

qindex.info

In ko.wp: [1][2][3][4] and en.wp: [5][6][7]

It would be better to block this spammer if possible. --ITurtle 09:23, 29 July 2008 (UTC)

Seems stale to me. The link seems rather widely used, so I'd be reluctant to blacklist in any case.  Declined, I suppose.  — Mike.lifeguard | @en.wb 22:59, 11 August 2008 (UTC)



vimoutiers.net



I came across these IPs (here and here we have the global user contributions) who added this website to several articles on several wikis. It was a long time ago, but most of the links are still there. Using the antispam search, it is found 76 times just in the top 20 wikis, and 96 times in global search. --Gliu 00:06, 3 August 2008 (UTC)

isfp.co.uk



added by



I already added it to the blacklist, because it's being spammed right now. However, I'm unsure whether or not we should actually blacklist it. I suggest reverting 79.173.65.24's edits and removing it from the blacklist in a day or two. What do others think? I'd be happy to revert it, but since the site seems useful and we're talking about quite a lot of edits, I want someone else's opinion first. --Erwin(85) 11:50, 7 August 2008 (UTC)

Adding now is - to me - quite correct & thanks. As you say, the site seems useful. However, it is the site "for" philosophers, which I'm not sure I find convincing in terms of being really useful. Other views would be good. --Herby talk thyme 12:19, 7 August 2008 (UTC)
I reverted the edits, as in the end it seemed to be self-promotion. --Erwin(85) 12:29, 7 August 2008 (UTC)
Erwin: Should we be adding this?  — Mike.lifeguard | @en.wb 00:12, 12 August 2008 (UTC)
Not sure. I'm giving it the benefit of the doubt, so I removed it. We can always re-add it. --Erwin(85) 12:55, 12 August 2008 (UTC)

nijmegennieuws.nl and doetinchemnieuws.nl

Added



and



Spammed by



The bots are down, so this request is for logging only. --Erwin(85) 12:22, 10 August 2008 (UTC)

Proposed additions (Bot reported)

This section is for websites which have been added to multiple wikis as observed by a bot.

Items there will automatically be archived by the bot when they get stale.

Sysops, please change the LinkStatus template to closed ({{LinkStatus|closed}}) when the report is dealt with, and change to ignore for good links ({{LinkStatus|ignore}}). More information can be found at User:SpamReportBot/cw/about

These are automated reports; please check the records and the links thoroughly, as they may be good links! For some more info, see Spam blacklist/help#SpamReportBot_reports

If the report contains links to fewer than 5 wikis, then only add it when it is really spam. Otherwise just revert the link additions and close the report; closed reports will be reopened if spamming continues.

The bot will automagically mark as stale any reports that have fewer than 5 links reported, which have not been edited in the last 7 days, and where the last editor is COIBot. They can be found in this category.
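The staleness rule amounts to a three-part test; a sketch in PHP (the function and field names are assumptions, not the bot's actual code):

// a report goes stale only when all three conditions hold
function isStale( $report ) {
  return $report['linkCount'] < 5
    && ( time() - $report['lastEditTimestamp'] ) > 7 * 24 * 3600
    && $report['lastEditor'] === 'COIBot';
}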

Please place suggestions on the automated reports in the discussion section.

COIBot

Running; will report a domain shortly after a link is used more than 2 times by one user on more than 2 wikis (technically: when more than 66% of the link's additions were made by this user, and more than 66% were made cross-wiki). Same system as SpamReportBot (discussions go after the remark "<!-- Please put comments after this remark -->" at the bottom; please close reports when reverted/blacklisted/waiting for more, or mark as ignore for good links).
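A rough sketch of that trigger in PHP (hypothetical function and field names; COIBot's real logic is more involved):

// $additions: one record per link addition for a domain,
// e.g. array( 'user' => 'Example', 'wiki' => 'en.wikipedia' )
function shouldReport( $additions, $user, $homeWiki ) {
  $total = count( $additions );
  if ( $total < 3 ) {
    return false; // "more than 2 times" - too few additions to judge
  }
  $byUser = 0;
  $crossWiki = 0;
  foreach ( $additions as $a ) {
    if ( $a['user'] === $user ) {
      $byUser++;
    }
    if ( $a['wiki'] !== $homeWiki ) {
      $crossWiki++;
    }
  }
  // report when >66% of additions come from one user and >66% are cross-wiki
  return $byUser / $total > 0.66 && $crossWiki / $total > 0.66;
}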

List: vrsystems.ru · Last update: 2023-06-27 15:51:16 · By: COIBot · IPs: 195.24.68.17, 192.36.57.94, 193.46.56.178, 194.71.126.227, 93.99.104.93 · Last link addition: 2070-01-01 05:00:00 · User - Link: 4 · Link - Wikis: 4

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also /recurring requests for repeatedly proposed (and refused) removals.

The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.



endeav.org

[8] I want to use a link to one page of this site to confirm the writer's membership in this academy, in a new article about a Russian writer. ~ Aleksandrit 12:46, 2 August 2008 (UTC)

The blacklisting was based on the request here. --Herby talk thyme 12:54, 2 August 2008 (UTC)
  • Local whitelisting may be appropriate, but the sites were, as I recall, grossly unreliable as sources about anything other than themselves, and as a subject it amounted to a weird pseudoscientific POV-push. JzG 22:27, 2 August 2008 (UTC)
 Declined per JzG and the original request. You may request whitelisting on local projects if required.  — Mike.lifeguard | @en.wb 14:19, 6 August 2008 (UTC)

postchronicle.com

Not sure why this is blacklisted (or whether it is just Wikipedia's filter or the meta one). This would be useful (as far as I can see) in the w:Bernie Mac article, as they have a news blurb refuting rumors of his death. See http://www.postchronicle.com/news/original/article_212161769.shtml (I suppose this will let me know if it is listed at meta or not). Protonk 18:30, 3 August 2008 (UTC)

Well, guess it is wikipedia's filter.  :) Protonk 18:31, 3 August 2008 (UTC)
Post Chronicle is problematic, try http://www.bloomberg.com/apps/news?pid=20601103&sid=ajdnnkIECaPE&refer=us instead. Nick 18:44, 3 August 2008 (UTC)
  • I'm advocating not unlisting, as it was spammed by the site owner, who also tried (pretty cynically) to manipulate OTRS and others to get his site unlisted. Abuse of good faith, sockpuppetry, plus the issues of reliability and copyright, put this firmly in the "more trouble than it's worth" basket for me. JzG 18:53, 5 August 2008 (UTC)
  • Not done —— nixeagle 13:49, 6 August 2008 (UTC)[reply]

youporn.com

The YouPorn Wikipedia article should have a link to youporn.com, but this is blocked by the spam filter. --Helohe 14:04, 11 August 2008 (UTC)

You're right; please request whitelisting for a main-page-specific URL on the appropriate whitelist page (en:MediaWiki talk:Spam-whitelist). As such,  Declined here. Thanks. --Dirk Beetstra T C (en: U, T) 14:05, 11 August 2008 (UTC)

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

double/wrong entries

when i deleted some entries from the german sbl, which are already listed in the meta sbl, i saw that there are many double entries in the meta sbl, e.g., search for

top-seo, buy-viagra, powerleveling, cthb, timeyiqi, cnvacation, mendean

and you'll find some of them. if you find it useful, i can try to write a small script (in august), which indicates more entries of this kind.
furthermore i'm wondering about some entries:

  1. "\zoofilia", for "\z" matches the end of a string.
  2. "\.us\.ma([\/\]\b\s]|$)", for ([\/\]\b\s]|$) ist the same as simply \b, isn't it? (back-refs are not of interest here)
  3. "1001nights\.net\free-porn", for \f matches a formfeed, i.e., never
  4. "\bweb\.archive\.org\[^ \]\{0,50\}", for that seems to be BRE, but php uses ERE, so i guess, this will never match
  5. "\btranslatedarticles\].com", for \] matches a ']', so will probably never match.

before i go on, i want to know if you are interested in this information or not. :-) -- seth 22:23, 12 July 2008 (UTC)
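A quick way to check seth's list: the short PHP sketch below (URLs are invented examples) runs each suspect escape against a URL it was presumably meant to catch, and every one of them fails as described.

// each pattern is tested against a URL it was presumably meant to catch
$tests = array(
  '/\zoofilia/'                  => 'http://zoofilia.example.com/',    // \z = end-of-string anchor
  '/1001nights\.net\free-porn/'  => 'http://1001nights.net/free-porn', // \f = form feed
  '/\btranslatedarticles\].com/' => 'http://translatedarticles.com/',  // \] = literal ']'
);
foreach ( $tests as $pattern => $url ) {
  echo $pattern, ' => ', preg_match( $pattern, $url ) ? 'matches' : 'never matches', "\n";
}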

You know, we could use someone like you to clean up the blacklist... :D Kylu 01:53, 13 July 2008 (UTC)
We are indeed interested in such issues - I will hopefully fix these ones now; keep 'em coming!  — Mike.lifeguard | @en.wb 01:59, 13 July 2008 (UTC)
Some of the dupes will be left for clarity's sake. When regexes are part of the same request they can be safely consolidated (I do this whenever I find them), but when they are not, it would be confusing to do so, in many cases. Perhaps merging regexes in a way that is sure to be clear in the future is something worth discussing, but I can think of no good way of doing so.  — Mike.lifeguard | @en.wb 02:06, 13 July 2008 (UTC)
in de-SBL we try to cope with that only in our log file [9]. there one can find all necessary information about every white-, de-white-, black- and de-blacklisting. the sbl itself is just a regexp-speed-optimized list for the extension, without any claim of being chronologically arranged.
i guess that the size of the blacklist will keep increasing in future, so speed optimization will perhaps become necessary. btw, has anyone ever made any benchmarks of this extension? i merely know that some buffering was implemented at one point.
oh, and if one wants to correct further regexps: just search by regexps (e.g. by vim) for /\\[^.b\/+?]/ manually and delete needless backslashes, e.g. \- \~ \= \:. apart from that, the brackets in single-char classes like [\w] are needless too. "\s" will never match. -- seth 11:36, 13 July 2008 (UTC)
fine-tuning: [1234] is much faster in processing than (1|2|3|4); and (?:foo|bar|baz) is faster than (foo|bar|baz). -- seth 18:21, 13 July 2008 (UTC)
I benchmarked it; (a|b|c) and [abc] had different performance. Same with the latter case — VasilievV 2 21:02, 14 July 2008 (UTC)
So should we be making those changes? (i.e. was it of net benefit to performance?)  — Mike.lifeguard | @en.wb 21:56, 15 July 2008 (UTC)
these differences result from the regexp implementation. but what i meant with benchmarking is the following: how much does the length of the blacklist cost (measured in time)? i don't know how fast the wp servers are. however, i benchmarked it now on my present but old computer (about 300-500 MHz):
if i have one simple url like http://www.example.org/ and let the ~6400 entries of the present meta blacklist match against this url, it takes about 0.15 seconds till all regexps are done. and i measured really only the pure matching:
// reduced part of SpamBlacklist_body.php: try every blacklist regex
// against the (concatenated) links until the first one matches
$retVal = 0;
foreach ( $blacklists as $regex ) {
  $check = preg_match( $regex, $links, $matches );
  if ( $check ) {
    $retVal = 1; // a blacklisted link was found
    break;
  }
}
so i suppose it would not be a bad idea to care about speed, i.e. replace unnecessary patterns by faster ones and remove double entries. ;-)
if you want me to, i can help with that, but the soonest would be in august.
well, the replacement is done quickly if one of you uses vim:
the replacement of (.|...) by [...] can be done manually, because there are just 6 occurrences. the replacement of (...) by (?:...) can be done afterwards by
:%s/^\([^#]*\)\(\\\)\@<!(\(?\)\@!/\1(?:/gc
(roughly: every unescaped "(" not already followed by "?" becomes "(?:", with confirmation each time)
-- seth 23:26, 15 July 2008 (UTC)
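seth's measurement is easy to reproduce; the rough harness below (the file name is hypothetical) times the pure matching. Note it assumes each line already carries its own delimiters, unlike the raw on-wiki list, which the extension first batches into a few large regexes.

// time how long matching one URL against every blacklist regex takes
$blacklists = file( 'blacklist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
$links = 'http://www.example.org/';
$start = microtime( true );
$retVal = 0;
foreach ( $blacklists as $regex ) {
  if ( preg_match( $regex, $links ) ) {
    $retVal = 1; // a hit would normally stop the scan here
    break;
  }
}
printf( "%d regexps checked in %.3f s\n", count( $blacklists ), microtime( true ) - $start );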
some explicit further bugs:
\mysergeybrin\.com -> \m does not exist
\hd-dvd-key\.com -> \h does not exist
however, because nobody answered (or read?) my last comment... would it be useful to give me the rights temporarily, so i can do the modifications myself? -- seth 01:44, 7 August 2008 (UTC)
I fixed these. You can always request (temporary) sysop status. Any help is appreciated. --Erwin(85) 12:45, 7 August 2008 (UTC)

User: namespace abuse

Several from Commons



Blacklisted on SWMTBot, and only 2 projects at this time (blocked on all).  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only 3 projects at this time (blocked on all).  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only 1 project & blocked.  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only 1 project & not blocked (Commons).  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only 2 projects & not blocked.  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only Commons & not blocked.  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only Commons & not blocked.  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only 2 projects & not blocked.  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)


Blacklisted; only 2 projects & not blocked.  — Mike.lifeguard | @en.wb 22:40, 11 August 2008 (UTC)
Grabbed from Herby's note elsewhere.  — Mike.lifeguard | @en.wb 22:01, 11 August 2008 (UTC)

User:Restaurant-lumiere



per Herby.  — Mike.lifeguard | @en.wb 22:13, 11 August 2008 (UTC)[reply]

Americarx



 — Mike.lifeguard | @en.wb 22:13, 11 August 2008 (UTC)[reply]

Blocked on 3 wikis already - one to watch, I think.  — Mike.lifeguard | @en.wb 22:52, 11 August 2008 (UTC)

Tbraustralia



 — Mike.lifeguard | @en.wb 22:14, 11 August 2008 (UTC)[reply]

Housingyou



 — Mike.lifeguard | @en.wb 22:14, 11 August 2008 (UTC)[reply]

Thebalfourgroup



Commons & enwiki so far - not blocked yet.  — Mike.lifeguard | @en.wb 22:45, 11 August 2008 (UTC)[reply]
Deleted & warned on Commons.  — Mike.lifeguard | @en.wb 00:14, 12 August 2008 (UTC)[reply]

hkcbn.org



Not sure if it is crosswiki; leaving it here, but going to bed now. best regards, --birdy geimfyglið (:> )=| 02:54, 12 August 2008 (UTC)
It was (& some cross-wiki blocks need placing). Added by User:Kylu, thanks. --Herby talk thyme 06:52, 12 August 2008 (UTC)
Hit en.wb (thanks Kylu): deleted & blocked. Some organization on cross-wiki blocks is needed.  — Mike.lifeguard | @en.wb 12:44, 12 August 2008 (UTC)
And wouldn't global sysop be good for just this stuff........:( --Herby talk thyme 12:50, 12 August 2008 (UTC)
Hm, Kylu marked them for deletion; all those wikis have active local 'crats, they have to clean up themselves. best regards, --birdy geimfyglið (:> )=| 12:52, 12 August 2008 (UTC)

Legal Investigation Agency, Dietz& Associates, LLC



Spam page on Commons. --Herby talk thyme 07:09, 12 August 2008 (UTC)

Jerald Franklin Archer



Violin lessons page on Commons & en wp. --Herby talk thyme 07:09, 12 August 2008 (UTC)

fabianswebworld.fa.funpic.de fschneider.de.vu



Next one :( --birdy geimfyglið (:> )=| 10:57, 12 August 2008 (UTC)

Looks quite cross-wiki to me; I Added it already. --birdy geimfyglið (:> )=| 11:04, 12 August 2008 (UTC)
Thanks birdy - links all cleared. --Herby talk thyme 11:55, 12 August 2008 (UTC)

Nervenhammer



similar pattern, adding a personal link.. --Cometstyles 12:01, 12 August 2008 (UTC)
Thanks Comets - Added for now. In passing, I see no harm in listing such sites, as much to send a message to the user that their behaviour may not be appropriate. Not sure about how lasting the listing should be, or about logging immediately - thoughts welcome. --Herby talk thyme 12:12, 12 August 2008 (UTC)
Reviewing this, it may well be a good-faith de user who has just decided to expand their interests (based on SUL info). In which case I suggest serious consideration for de-listing if we are asked. --Herby talk thyme 12:17, 12 August 2008 (UTC)

Discussion

Unified login abuse

It was only a matter of time until someone figured out that they could use their unified login to create over 150 Wikipedia and over 50 Wiktionary user pages with links to his own web site. While generally any user is free to do with their user page(s) as they please, and the blacklists normally only deal with content pages, this is still the best place I could find to ask for educated opinions about the issue.

The user in question registered earlier this month and did a handful of simple but seemingly useful edits on enwiki and frwiki (he has been more productive on Wiktionary since mid-June). And at one point he seems to have gotten the idea to "update all Wikis" with his links. I guess that he'll never do anything productive on the majority of those projects (as he doesn't understand the languages), so that raises the question whether such behaviour is acceptable. What do the experts say? --Latebird 23:21, 27 July 2008 (UTC)

NB: I have added \bhomonyme(\.eu|s\.fr)\b to stop luxo:JackPotte - not 100% sure whether this should stay or not.  — Mike.lifeguard | @en.wb 23:41, 27 July 2008 (UTC)
I would ask for his site to be blacklisted from now on and not just for a while. When he started doing this over 3 weeks ago, I realised his intentions, and I did lock his account, but that didn't stop him, since he added the links with his IP instead, so locking him wouldn't have helped. I actually see this as abuse and something we need to look into closely - and I'm not talking about that person only, but about "promoting someone's website on their userpage on multiple wikis if not all wikis". This probably needs more discussion... --Cometstyles 23:59, 27 July 2008 (UTC)
Ah, so I'm actually late to the party... Good to see this looked into. Apparently his block has expired by now, because the entry that caught my attention wasn't added by an IP. FWIW (as a relatively rare guest to meta), I'd suggest blacklisting such links just the same as those spammed to articles. --Latebird 06:21, 28 July 2008 (UTC)
I think this should be a permanent listing. They are abusing Foundation wikis with this link placement. --Herby talk thyme 06:39, 28 July 2008 (UTC)
Agree with Herby: it doesn't matter if an anon or a registered account adds the link, it's still regarded as cross-wiki spamming. As the user has made valuable contribs to a few wikis, we do not believe him to be a spammer, but what he did is abuse in every context, and I believe blacklisting that site permanently is the only option.. --Cometstyles 09:50, 28 July 2008 (UTC)
Was actually unaware of the history of this particular user. Consider it Added permanently.  — Mike.lifeguard | @en.wb 09:58, 28 July 2008 (UTC)
This guy has gotten everywhere. I've just cleaned the "a" ones so far. grrrrrr --Herby talk thyme 10:32, 28 July 2008 (UTC)


(adding linksummary and usersummary to get COIBot to link reports here (if ever needed, since it is now blacklisted)). --Dirk Beetstra T C (en: U, T) 10:42, 28 July 2008 (UTC)

By way of an update this is now the approach I guess. --Herby talk thyme 15:52, 30 July 2008 (UTC)[reply]
Well ...


(unless ..). --Dirk Beetstra T C (en: U, T) 16:03, 30 July 2008 (UTC)

I'm removing all the links I can find. I think the userpages should be deleted, but I haven't time currently to tag the ones that haven't been done already. Perhaps someone could take that on?  — Mike.lifeguard | @en.wb 16:28, 30 July 2008 (UTC)
Hello, it's JackPotte 13:16, 3 August 2008 (UTC), who comes to give my humble point of view on this enigma. First I pray you to excuse my previous behaviour, and I sincerely hope that you haven't lost your useful time by increasing the wikis' security. Concerning my particular example, which was to diffuse the link of the COPYLEFTED LONGEST LIST OF THE WORLD of international homonyms: false friends in all languages (that's the reason why I lost my last Sunday adding it all over the world). Now I just invite you to unblacklist the 2 domains (Spam_blacklist \bhomonyme(\.eu|s\.fr)), and to read the site for your personal communication culture (in fact, to my mind, it should be taught at school). Sorry again, and it might be the occasion to propose to Internautes the possibility to publish a new free important message, after election, to all wikis. Thank you for your services.

Great - now this. You are doing a good job of irritating us and wasting our time, Jack. --Herby talk thyme 18:20, 6 August 2008 (UTC)

There's a mirror at homonym-list.com, which was just added to fr:Homonymie (and possibly elsewhere). It seems really hard to educate some people... As an off-topic aside, I just now actually looked at that page for the first time, and my eyes still hurt. That thing is entirely unreadable! --Latebird 18:04, 9 August 2008 (UTC)
He's updating user pages with a link to the French article again, see [10]. Blacklist now, or shall we wait and see what he's up to? --Erwin(85) 18:26, 9 August 2008 (UTC)
Blacklisted


and will now revert what I can.  — Mike.lifeguard | @en.wb 18:32, 9 August 2008 (UTC)
Hi, sorry again, I couldn't imagine that Wikipedia would suppress its only link towards the longest homonyms & false friends list of the world (today 10,000 entries) without any constructive message on my profile... I've already read some positive returns from French administrators, who've suggested creating a bot in order to import those kinds of useful lists. JackPotte 07:17, 10 August 2008 (UTC)
This is abuse of Foundation wikis, plain & simple. You were aware that the links were being removed & blacklisted, and yet you continued to place new ones. Personally I feel that you should be blocked on all wikis. --Herby talk thyme 09:50, 10 August 2008 (UTC)
The explanation is simple: after I saw here that it was incorrect to reference our own useful voluntary copylefted sites through our profiles, I nevertheless left a single direct link on the more adequate article. Now I've understood and apologize for this 100% well-meaning but blind action, and if I have the time in the future (it might be the case), I will propose some bots which would do these jobs (block all external links & synchronise validated lists from other sites). JackPotte 18:04, 10 August 2008 (UTC)

User page linkage

yep, I've known for some time now, but there was disagreement about locking it, and blacklisting then was a bad idea since it was an active editor on several Wiktionaries. As I mentioned above, I believe we need a global site blacklisting policy on Meta which will apply to all wikis that do not have their own policies on what can be considered "spam" and what a user can or cannot add to their userspace in terms of external links. Food for thought, anyone? :) ...--Cometstyles 10:51, 28 July 2008 (UTC)

I think this is worth a discussion (maybe not just here either). Over time it has amazed me how tolerant some places are of blatant advertising, never mind mere link placement, on user pages.
En wp is very odd. A while back I deleted a series of "user pages" with property rental advertising on them that had been there a while, & no one seemed bothered.
Now, and particularly on Commons, I tend to base my views on the user's contributions. If they create a user page with links and add nothing to the project, I tend to delete (& obviously if the behaviour is cross-wiki). However, if they are contributing to the project, then I guess I can live with one or two "favourite" links of theirs.
It would be good to have other views. Cheers --Herby talk thyme 11:16, 28 July 2008 (UTC)
Just for the record, on the wikis where I am admin, I've blocked the user and deleted the spam/advertising user page. --M/ 11:23, 28 July 2008 (UTC)
I was about to hit the block button on Commons, but there was no user page & three valid-looking uploads. It is a good example of the sort of issues that maybe need thought/discussion, I guess. --Herby talk thyme 11:31, 28 July 2008 (UTC)
The user has in fact written to me: he asks for unblock and de-listing, explaining that his project is non-commercial. I do not doubt this, but I also have a question: "What if every single owner that runs a non-commercial site has this kind of idea?" --M/ 12:10, 28 July 2008 (UTC)
You may send them here or to info-en-l@wikimedia.org to discuss this. However, I don't see that a request will be successful; I do welcome discussion with this user. Whether the domain is commercial or not is mostly irrelevant - we are concerned with the behaviour here.  — Mike.lifeguard | @en.wb 12:24, 28 July 2008 (UTC)
It is disturbing; those pages should IMHO be deleted, even if the local policies are not strictly against it. I am a bit afraid to stuff beans here, but that type of behaviour can very easily be used (transclusion) to circumvent detection by the linkwatchers (which already have enough work with mainspace only).
Sites don't have to be commercial to be bad; it is the intention of linking to them. Links to commercial sites can be very viable, while links to non-profit organisations can be added in order to promote the organisation, or even to raise money for your (noble!) plan to build schools in some remote area far, far away. --Dirk Beetstra T C (en: U, T) 12:31, 28 July 2008 (UTC)
(ec) Agree with Mike. This is not about the site content; it is about using (& abusing) Foundation wikis to promote a website that the user owns (on en wp it would violate w:WP:COI anyway). --Herby talk thyme 12:33, 28 July 2008 (UTC)
I agree with leaving these links blacklisted; this obviously is abuse. He had been warned several times before to stop these doings, but went on; his contributions were observed via the channel. Since he had not stopped but seemed to spam even more heavily, the decision to blacklist was absolutely right. If he had just added his homepage to his userpage at his home wiki, no one would have said anything. Best regards, --birdy geimfyglið (:> )=| 15:53, 29 July 2008 (UTC)

Another xwiki user page abuse?

I came across this one today. It seems to have the makings of a non-contributor who is creating user pages with personal links on them. Any other views? Cheers --Herby talk thyme 07:27, 1 August 2008 (UTC)

Not good. Don't have time to remove links currently, but it's certainly worth doing (and probably blacklisting too).  — Mike.lifeguard | @en.wb 12:02, 1 August 2008 (UTC)
hmm, seems problematic. I have removed all of the links, and if he re-adds them, we may have to blacklist it ..--Cometstyles 12:34, 1 August 2008 (UTC)
Do I really have to add user space (not the talk) to the spaces the linkwatchers parse? It is just a small step for me .. --Beetstra public 21:06, 1 August 2008 (UTC)
If it is easy to do, by all means! Currently tracking these is very much hit-and-miss. We found JackPotte through the SWMTBots, but that will not always be assured, as they are not designed to watch for this sort of thing.  — Mike.lifeguard | @en.wb 23:43, 1 August 2008 (UTC)
Yes, good idea. A lot of users have spam in userspace. JzG 22:29, 2 August 2008 (UTC)

A bit of delay handling this one...

 — Mike.lifeguard | @en.wb 18:48, 9 August 2008 (UTC)

Added both.  — Mike.lifeguard | @en.wb 18:13, 10 August 2008 (UTC)

NOINDEX

Prior discussion at Talk:Spam_blacklist/Archives/2008/06#Excluding_our_work_from_search_engines, among other places

There is now a magic word, __NOINDEX__, which we can use to selectively exclude certain pages from being indexed. I suggest having the bots use this magic word in all newly generated reports immediately. Whether to have this page and its archives indexed was a point of contention previously, and deserves further discussion.  — Mike.lifeguard | @en.wb 01:33, 4 August 2008 (UTC)

Sorry, missed this one. I certainly support the "noindex" of the bot pages. They are somewhat speculative. If we could get the page name changed I would be happier about not using the magic word on this, but..... --Herby talk thyme 16:09, 6 August 2008 (UTC)
I have added the keyword to the COIBot-generated reports; they should now follow that. --Dirk Beetstra T C (en: U, T) 16:31, 6 August 2008 (UTC)
My bot is flagged now, so I can start adding it to old reports. I will poke a sysadmin first to see if I really must make ~12000 edits before I start, though. It will not be all in one go, and I will not start for a day or two.
Any other thoughts on adding it to this page and/or its archives?  — Mike.lifeguard | @en.wb 18:11, 9 August 2008 (UTC)
Already sort-of done with {{linkstatus}}, so the bot probably won't run. I plan to keep the flag though <evil grin>  — Mike.lifeguard | @en.wb 22:56, 11 August 2008 (UTC)

The Logs

I would like to consolidate our logs into one system which uses subpages and transclusions to make things easy. Each month would get a subpage, which is then transcluded onto Spam blacklist/Log so they can easily be searched. This would mean merging Nakon's "log entries" into the main log, and including the pre-2008 log. This wouldn't require much change in how we log things.

However, I wonder what people think about also logging removals and/or changes to the regexes. Currently, we don't keep track of those in any systematic way, but I think we should. For example, I consolidated a few regexes a while back, and simply made the old log entries match the new regexes, which is rather Orwellian. Similarly, we simply remove log entries when we remove domains - nothing is added to the log, so we cannot track this easily. This idea (changing the way we log things) is likely going to require some discussion; I don't think there should be any problem moving to transcluded subpages immediately.

 — Mike.lifeguard | @en.wb 14:41, 6 August 2008 (UTC)

I'm all for using one system for the logs. I'm not sure about your second idea, though. Is the log intended purely to explain the current entries, or also former entries and perhaps even edits? Logging removals would be a good idea to see if a domain was once listed, but logging changes seems too bureaucratic. Matching the log entries with the new regexes might be Orwellian, but it's also pragmatic. What are the advantages of logging changes? Could you perhaps give an example of how you suggest to log changes? --Erwin(85) 18:16, 6 August 2008 (UTC)
I should say I mean "Orwellian" without the connotative value. The denotative value is simply that the current method is "changing history" - not in and of itself a bad thing. Indeed, I've had no issues with this, hence the speculative nature of that part of my suggestion.  — Mike.lifeguard | @en.wb 19:48, 6 August 2008 (UTC)
in de:WP:SBL we do log all new entries, removals and changes on black- and whitelists. logging changes can be useful, e.g. for retracing old discussions. -- seth 01:35, 7 August 2008 (UTC)

The simplest way to improve searchability is to write a tool that searches the logs for you. I'm in the middle of doing so, and I'll have a working prototype in a few days. The way this would work is that it would load all the pages (it really does not matter where the pages are) and apply a few regexes to them. This means we really don't have to merge Nakon's stuff; I can just add that page to the tool. As long as the logs keep the same pattern of one entry per line, a tool is not difficult.

I don't really think logging removals is smart; we never remove entries from the logs anyway. The simplest way is to keep the logs write-only (only new entries) and have a tool list all matches. (I'm writing the tool in a manner where you will be able to put the domain in "plain", as in google.com, and it will find all the relevant entries, even if it has \bgoogle\.com\b, or some other weirdness.) —— nixeagle 20:23, 6 August 2008 (UTC)

lol, by accident i started writing a similar tool 2 hours ago. but i'm writing a cli perl script only. until now it greps all sbl entries (in meta-blacklist, de-blacklist and de-whitelist) which would match a given url. -- seth 01:35, 7 August 2008 (UTC)
Seth, nixeagle: actually, having a tool that searches all blacklists and logs (i.e. cross-wiki) to see if something is blacklisted somewhere, and if there is a log for that, would be great. IMHO, it should be 'easy' to write a tool that extracts all regexes from the page and tests whether any of them is positive against a certain url that we search (and it could then be incorporated into the {{linksummary}} to easily find it ..). Or is this just what you guys are working on ;-) .. --Dirk Beetstra T C (en: U, T) 09:50, 7 August 2008 (UTC)
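Such a checker can be quite small. A PHP sketch follows; the action=raw fetch and the per-line matching are assumptions (the live extension actually batches entries into a few large regexes):

// test one URL against every entry of the meta blacklist
$raw = file_get_contents( 'http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw' );
$url = 'http://www.example.com/';
foreach ( explode( "\n", $raw ) as $line ) {
  $line = trim( preg_replace( '/#.*$/', '', $line ) ); // strip comments
  if ( $line === '' ) {
    continue;
  }
  // entries are bare regex fragments; delimit them, skipping broken ones (see above)
  if ( @preg_match( '!' . $line . '!i', $url ) ) {
    echo "Matched by: $line\n";
  }
}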
beta version. :-) -- seth 14:38, 7 August 2008 (UTC)
WONDERFUL!
one question: can you make it add 'http://' by itself (as we only put the domain in the linksummary, so as to prevent the blacklist from blocking it ..)? --Dirk Beetstra T C (en: U, T) 14:48, 7 August 2008 (UTC)
That's about what I was writing. I was putting it in the framework of http://toolserver.org/~eagle/spamArchiveSearch.php where the tool retrieves the section/page and links you directly to where the item was mentioned. For logs I was working on displaying the line entry in the log as one of the results, so you would not even have to view the log page. —— nixeagle 15:11, 7 August 2008 (UTC)
if you want to combine my script with that framework, i can give you the source code. but it is perl code and it is ugly, about 110 lines. -- seth 17:00, 7 August 2008 (UTC)

some pages

Suggestion for pages:

Thanks! --Dirk Beetstra T C (en: U, T) 14:48, 7 August 2008 (UTC)

i had to cope with a bug in en-sbl, but now it seems to work. further suggestions? (the more lists i include, the slower the script will get.) -- seth 16:44, 7 August 2008 (UTC)
I would suggest doing it progressively: first the meta and en blacklists, the rest later (roughly in order of wiki size), similar to what luxo does. --Dirk Beetstra T C (en: U, T) 17:07, 7 August 2008 (UTC)
i used a hash, and those don't care about the order of declaration. now it should be sorted. -- seth 22:11, 7 August 2008 (UTC)

User page advertising

Another "thinking aloud" one!

I guess I come across a commercially orientated user page on Commons once a day on average. The past week has brought a "Buying cars" page, an "Insurance sales" page and a "Pool supplies" page, as well as blog/software/marketing pages. I do usually run vvv's SUL tool, but quite often there is nothing immediately (the pool supplies one cropped up on en wp a couple of days after Commons). I know en wp are often reluctant to delete such pages out of hand (which I find incredible).

I think what I am probably saying is: should we open up a section here to allow others to watch/comment/block/delete or whatever across wikis? --Herby talk thyme 09:51, 10 August 2008 (UTC)

I agree, this is a great idea, as I have also noticed spammers like this go cross-wiki to multiple projects (Wikinews/Commons, etc.) Cirt 11:40, 10 August 2008 (UTC)
Agree. Others may be interested in watching only that part of our work - perhaps a transcluded subpage so it may be watched separately?  — Mike.lifeguard | @en.wb 14:03, 10 August 2008 (UTC)
Sounds like the best way to proceed. Cirt 14:09, 10 August 2008 (UTC)
Thanks so far - good to get other views as well but as an idea of the scale I picked these out from the last few days on Commons (all user names) -
Sungate - design advert
Totalpoolwarehouse - obvious & en wp too
Theamazingsystem - two spamvert pages "The Automated Blogging System is a Powerful SEO Technology"
Adventure show - pdf spam file
Firmefront - fr "Banque, Assurance, Gestion Alternative et Private Equity"
The Car Spy - internet car sales
DownIndustries - clothing sales
Serenityweb1 - Nicaragua tourism & en wp
Macminicover - "Dust Cover or Designer Cover for Apple Mac"
I can't instantly find the insurance sales one, & I am sure another user produced a page the same as Theamazingsystem. We could do with working out the best way of presenting the info - whether the standard template is needed or whether just an SUL link would allow us a quick check on cross-wiki activity?
It would be good to know if the COI bot excludes User: space and whether that may need rethinking.
Cheers --Herby talk thyme 14:35, 10 August 2008 (UTC)
So far as I know, it watches only the mainspace. But Beetstra said above that this could be changed.  — Mike.lifeguard | @en.wb 14:40, 10 August 2008 (UTC)
Not sure what else is in the works, but I think an SUL link to check activity cross-projects would be sufficient. Anything else would be above and beyond, but would also be nice. Cirt 15:06, 10 August 2008 (UTC)
The standard {{ipsummary}} template is pretty good but (I think) lacks the SUL link, which for this kind of stuff would be useful (luxo would be a help though, I guess).
The other thing, I guess, would be to get agreement to lock the blatantly commercial accounts, just so that they do not do a "JackPotte" on us. I'll maybe point a couple of people to this section. --Herby talk thyme 16:04, 10 August 2008 (UTC)
As it happens, I was just trying to lock an account that wasn't SUL yet. I think the concept is sound; these accounts probably should be locked and hidden. Not sure about the mechanics of implementation. ++Lar: t/c 18:39, 10 August 2008 (UTC)
IPs can't have a unified account, so the SUL tool is useless. We have luxo's for that.  — Mike.lifeguard | @en.wb 16:49, 10 August 2008 (UTC)
Yeah - this type really needs an SUL link, I think. And we do need to look at the best way we can lock overtly commercial accounts. --Herby talk thyme 16:51, 10 August 2008 (UTC)
Today I also saw some spamming by 3 accounts on Commons:Talk:Main Page. I have to say that I do agree with Herby about this - really a nice idea on how to stop at least some of the spamming. --Kanonkas 18:29, 10 August 2008 (UTC)
Good idea, Herby! If you want, I can set up a tool similar to SUL:, i.e. list user pages and blocks, for IPs. Of course, other tools are possible as well. --Erwin(85) 19:33, 10 August 2008 (UTC)

Today :) user:Restaurant-lumiere - restaurant spam - [11]. User page advert, series of images all with plenty of information about the restaurant in the "description". --Herby talk thyme 07:07, 11 August 2008 (UTC)

Well .. enough is enough then. The linkwatchers are from now on also parsing the user namespace. --Dirk Beetstra T C (en: U, T) 10:23, 11 August 2008 (UTC)
Bot is adapted for the new task. Had to tweak en:User:XLinkBot for that, but well - do I also have to add the 'Wikipedia:' namespace? --Dirk Beetstra T C (en: U, T) 10:36, 11 August 2008 (UTC)
Personally I think not, but others may differ?
+ User talk:Americarx - online pharmacy ads [12], Commons (images & page) & en wp page (& the en wp one had been there a long time). Caught by Kanonkas, so thanks. --Herby talk thyme 10:48, 11 August 2008 (UTC)
Everything that the linkwatchers parse is now getting into the database, and may trigger the XWiki functionality mechanism. We may get more work from this... some more manpower is still necessary (as there are things that I can autocatch which have been excluded thus far ..). --Dirk Beetstra T C (en: U, T) 10:53, 11 August 2008 (UTC)
So are we going to make a transcluded subpage etc., or this will get difficult :)
+ User:Tbraustralia - spam page - "TBR Australia is the parent company to TBR Calculators and Australian Student" - [13]. --Herby talk thyme 11:09, 11 August 2008 (UTC)
+ user:Housingyou - www.housingyoumakelaars.nl page & image [14]. --Herby talk thyme 17:57, 11 August 2008 (UTC)