
Talk:Spam blacklist

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by A. B. (talk | contribs) at 16:38, 23 April 2008 (→‎IMASEO Services (India) spam: done). It may differ significantly from the current version.

Shortcut:
WM:SPAM
The associated page is used by the MediaWiki SpamBlacklist extension and lists strings of text that may not be used in URLs on any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist. There is also a more aggressive way to block spamming through direct use of $wgSpamRegex. Only developers can make changes to $wgSpamRegex, and its use is to be avoided whenever possible.
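For readers unfamiliar with how such entries work, here is a rough illustrative sketch in Python (not the extension's actual PHP code) of the general idea: each blacklist line is a regular-expression fragment, and the fragments are combined and tested against URLs added to a page. The domains below are made up.

```python
import re

# Hypothetical blacklist entries, one regex fragment per line as on the blacklist page.
BLACKLIST_ENTRIES = [
    r"example\.com",
    r"\bbad-pills\.example\.net\b",
]

# Combine the fragments into a single alternation and test URLs against it
# (a simplification of what the extension does internally).
COMBINED = re.compile(
    r"https?://[a-z0-9\-.]*(?:" + "|".join(BLACKLIST_ENTRIES) + ")",
    re.IGNORECASE,
)

def is_blocked(url: str) -> bool:
    """Return True if the URL matches any blacklist entry (simplified)."""
    return COMBINED.search(url) is not None

print(is_blocked("http://www.example.com/page"))  # True
print(is_blocked("http://www.example.org/page"))  # False
```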

For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.

Please post comments in the appropriate section below: Proposed additions, Proposed removals, or Troubleshooting and problems; read the message boxes at the top of each section for an explanation. Also, please check back some time after submitting, as there may be questions regarding your request. Per-project whitelists are discussed at MediaWiki talk:Spam-whitelist. In addition, please sign your posts with ~~~~ after your comment. For other discussions related to the blacklist that do not concern a particular link, please see Spam blacklist policy discussion.

Completed requests are archived (list, search); additions and removals are logged.

snippet for logging: {{/request|970627#{{subst:anchorencode:section name here}}}}

If you cannot find your remark below, please do a search for the URL in question with this Archive Search tool.

Spam that only affects a single project should go to that project's local blacklist.

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users. Completed requests will be marked as done or denied and archived.
Domains

Accounts

  • 85.3.144.76
  • 85.1.111.108
  • 85.1.96.70
  • 85.2.90.218
  • 85.2.121.138
  • 85.1.102.82


Related domain


References

--A. B. (talk) 18:50, 21 April 2008 (UTC)Reply

Apparently Added by A. B. --Herby talk thyme 07:58, 22 April 2008 (UTC)Reply

deluxecruises.com

Domains

Accounts

References

--A. B. (talk) 00:21, 22 April 2008 (UTC)Reply

Done --A. B. (talk) 00:25, 22 April 2008 (UTC)Reply
I overlooked a number of related domains:

--A. B. (talk) 01:05, 22 April 2008 (UTC)Reply
Done -- related domains now blacklisted. --A. B. (talk) 01:13, 22 April 2008 (UTC)Reply

Multiple Turkish sites caught by SpamReportBot/cw

Caught by the bot and investigated further by Dirk and Jorunn. I'm consolidating the information here from 3 reports:

This has gone on for two years.

Account

Domains

Google Adsense ID: 0125465872104138


Related domains

--A. B. (talk) 05:18, 20 April 2008 (UTC)Reply

I have closed the SpamReportBot reports; centralised discussion is better for this. --Dirk Beetstra T C (en: U, T) 14:13, 22 April 2008 (UTC)Reply
Thanks. I forgot to do that. --A. B. (talk) 18:01, 22 April 2008 (UTC)Reply

Multiple sites by Mikhailov Kusserow

User:



Links:

They all reside on different IPs:

I am closing the 8 SpamReportBot reports (to keep that list clear). Discussion here please. --Dirk Beetstra T C (en: U, T) 14:18, 22 April 2008 (UTC)Reply

All in all, the links seem legit, though in some cases whole linkfarms were added in one edit. Mikhailov Kusserow has a userpage on many of the wikis I checked (SUL?). I have asked id:Pengguna:Mikhailov_Kusserow (which appears to be one of the bigger accounts) to help us out here. Awaiting discussion. --Dirk Beetstra T C (en: U, T) 15:03, 22 April 2008 (UTC)Reply

More Pince spam since July 2007

Previous blacklist entry from July 2007


Previous related discussions


Subsequent new spam from related domains

Accounts used for this new spam


en, eo, fr



since last blacklisting: en, fr (many other Wikipedias before that; see Talk:Spam blacklist/2007-07#lang.arabe.free.fr previous blacklist request)

--A. B. (talk) 02:39, 23 April 2008 (UTC)Reply




(ar, ru)



en, es, fr

--A. B. (talk) 11:48, 23 April 2008 (UTC)Reply


Done --A. B. (talk) 13:10, 23 April 2008 (UTC)Reply

There are several users involved; generally, the users who are only active on one wiki seem to revert vandalism (as do the bots involved). Users active on more than one wiki (not implying that they did anything wrong):



But the link sometimes gets reverted on wikis where Asikhi was not active with that link. Maybe 'older' spammers (pre-database?).

Please provide some discussion; I am closing the reports and will point the discussions here. --Dirk Beetstra T C (en: U, T) 09:11, 23 April 2008 (UTC)Reply

As far as msapubli.com is concerned, the link being placed "looks" relevant as it is quite long. However (for me) it redirects to the home page, which appears to have no relevance to Wikipedia and contains the word "affiliated", which makes me wonder. I will look far more closely at the others. This batch concerns me --Herby talk thyme 09:26, 23 April 2008 (UTC)Reply
I have real doubts about the validity of these sites to Wikipedia as a whole --Herby talk thyme 10:53, 23 April 2008 (UTC)Reply

buy-ebook.com → mirror of e-library.net
e-library.us → mirror of e-library.net
artdhtml.com → mirror of e-library.net

See also WikiProject_Spam case

Cross wiki spamming

Thanks, --Hu12 09:41, 23 April 2008 (UTC)Reply


IMASEO Services (India) spam

IMASEO Services: linkspamming Wikipedia since 2005 despite requests, then warnings to stop and, finally, multiple account blocks:


Contact data
IMASEO Services
U 8/4 DLF Phase III
Gurgaon
Phone: +91-124-4152776


Spammed domains




  • Google Adsense 4799094371848660
  • SEO client


  • SEO client


  • SEO client


  • SEO client?


  • Google Adsense ID: 0288878065673786
  • SEO client?



Related domain



Accounts

Reference


Note: do not confuse this SemGuru with the unrelated Polish company, SEMGuru.pl; the Australian company, imaseo.net; or the IMASEO Contest. --A. B. (talk) 16:34, 23 April 2008 (UTC)Reply


Done --A. B. (talk) 16:38, 23 April 2008 (UTC)Reply

Proposed additions (Bot reported)

This section is for websites which have been added to multiple wikis as observed by a bot.

Items there will automatically be archived by the bot when they get stale.

Sysops, please change the LinkStatus template to closed when the report is dealt with. More information can be found at User:SpamReportBot/cw/about

These are automated reports; please check the records and the link thoroughly, as they may be good links!

If the report contains links to fewer than 5 wikis, then only add it when it is really spam. Otherwise just close it; if it gets spammed more broadly, the bot will reopen the report.

Please place suggestions on the automated reports in the discussion section.

List
User:SpamReportBot/cw/nakedafrica.net
User:SpamReportBot/cw/hnl-statistika.com
User:SpamReportBot/cw/therasmus-hellofasite.it
User:SpamReportBot/cw/prolococusanese.interfree.it
User:SpamReportBot/cw/rprece.interfree.it

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section. Remember to provide the specific URL blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as done or denied and archived. See also /recurring requests for repeatedly proposed (and refused) removals. The addition or removal of a link is not a vote; please do not bold the first words in statements.

members.lycos.co.uk

Hi,
the above-mentioned URL was added on April 17 and it doesn't suit me: I need a link to members.lycos.co.uk/sfsk/ (the Manfred Wörner Foundation) in the Macedonia article of the French-speaking Wikipedia (as a reference). There's also a link in en:Manfred Wörner Foundation. I'm not able to assess the number of links to members.lycos.co.uk across all Wikipedias, but I find the article relevant and not spam. I will enquire as to whether the page could be found at another URL. (:Julien:) 08:33, 19 April 2008 (UTC)Reply

Beetstra added it only for /davidbisbal. No idea why it blocks the whole domain. Commented out — VasilievVV 09:45, 19 April 2008 (UTC)Reply
Actually the members.lycos.co.uk URL was added by Nakon (in full, not only for /davidbisbal). (:Julien:) 11:11, 19 April 2008 (UTC)Reply
I've rem'd it out for now. Some of the additions were not well thought out I'm afraid. This needs discussion please - personally I think it probably should be removed for now --Herby talk thyme 11:36, 19 April 2008 (UTC)Reply
I specifically did davidbisbal; the site (davidbisbalbrowser.com, commercial) is a redirect site to members.lycos.co.uk/davidbisbal. I was thinking that the members site is spam-sensitive, but did not do the domain for that reason. There will be cross-wiki spamming of the domain, but in that case the specific URLs need to be blacklisted. If it proves in the end that it is out of hand, then the domain can be added, and specific whitelisting can then be used for certain sites (for en, most of it will fail en:WP:RS, en:WP:COI, en:WP:NOT, en:WP:EL &c. &c.)
I am removing the # from the davidbisbal rule on members.lycos.co.uk; that should not be the problem. --Dirk Beetstra T C (en: U, T) 19:10, 19 April 2008 (UTC)Reply
Yes I'd already fixed the site wide listing. I will remove members.lycos.co.uk completely from the blacklist in 24 hours if no one objects. Thanks --Herby talk thyme 07:40, 20 April 2008 (UTC)Reply
Hmmm. 99.9% of links to Lycos member pages are entirely inappropriate, in my experience. JzG 06:55, 21 April 2008 (UTC)Reply

lyrikline.org

Not sure why this is blocked - it is a poetry platform which has won the prestigious Grimme Online Award and is run by renowned institutions. The blacklisting has been discussed on the German Wikipedia, but nobody seems to know the reason. It prevents editing of several lyrics-related entries and the list of Grimme Online Award winners. -- Alexander, 13:16, 20 April 2008 (UTC)Reply

Request: old talk:spam blacklist item


I see some IPs in the COIBot report who are cross-wiki spamming:




Respectively 108 and 20 records .. I guess that qualifies sufficiently. Is local whitelisting of (certain parts of) the site an option? --Dirk Beetstra T C (en: U, T) 13:31, 20 April 2008 (UTC)Reply
It appears that on the German wiki the IP de:user:62.96.74.70 was used by de:User:Lyrik (same posts seen on the same page).


Could conflict of interest also have been a problem here (I know that the de wikipedia does not have these guidelines, but en does)? --Dirk Beetstra T C (en: U, T) 13:38, 20 April 2008 (UTC)Reply
Declined a few days ago here --Herby talk thyme 13:35, 20 April 2008 (UTC)Reply
 Declined again, for the same reason. I checked some of the links out, it appears to host (in amongst whatever good content it may have) copyright violations of lyrics. Copyright policy is, I think, foundation wide? Certainly copyright law is applicable to all projects. JzG 06:56, 21 April 2008 (UTC)Reply

cybertronchronicle.freewebspace.com

I wrote an article for the German Wikipedia, de:Transformers – Der Kampf um Cybertron, which is the German title of en:The Transformers: The Movie. I want to link an interview with the movie's voice director hosted under the domain cybertronchronicle.freewebspace.com (direct link to interview: cybertronchronicle.freewebspace.com/60-astrominutes/interview-wally_burr.html) as a source for a claim. The site actually contains several interviews with voice actors, writers and producers of the original Transformers cartoon. I don't know what went on with the hoster freewebspace.com, but as far as I can tell, the sub-domain cybertronchronicle.freewebspace.com, maintained by Rik Bakke, seems fine, so I propose that an exception be made.--87.164.80.99 18:27, 20 April 2008 (UTC)Reply

I think you should ask for whitelisting of this specific url on the german wikipedia, I guess de:Mediawiki talk:Spam-whitelist. --Dirk Beetstra T C (en: U, T) 19:57, 20 April 2008 (UTC)Reply
 Declined per Beetstra, thanks --Herby talk thyme 12:18, 21 April 2008 (UTC)Reply


radiopapesse.org

I tried to modify the article about Radio Papesse on the Italian Wikipedia. The system told me that www.radiopapesse.org is on the blacklist. I think it's very strange, because the article has formal authorization and for the following big reasons:

  • Radio Papesse is the only Italian web radio based at a public center of contemporary art. Public means public money and public utility. It's not a private station.
  • Radio Papesse has no advertising.
  • All the material we produce is under Creative Commons.
  • It's quite an important project for art radio. You can listen to an interview with us in the WPS1 MoMA Archive, New York, here: http://www.wps1.org/new_site/component/option,com_alphacontent/Itemid,187/section,97/cat,107/sort,15/limit,30/limitstart,30/
  • Students from 4 different universities in Italy have written or are writing their graduation theses about us.

And last but not least, I wanted to modify the article to note that Radio Papesse is now supported by Amaci, a non-profit association that brings together more than 20 museums of contemporary art in Italy (www.amaci.org; here is the list: http://www.amaci.org/musei_associati.asp).

I don't know what more we can do to get off the blacklist. The preceding unsigned comment was added by 159.213.102.5 (talk • contribs) 11:34, 22 Apr 2008 (UTC)

See the COIBot and SpamReportBot links in that list. Massive volume, cross-wiki. Maybe it is appropriate in the article on the radio station on the Italian wiki; in that case I would like to point you to the whitelist on the Italian Wikipedia. --Dirk Beetstra T C (en: U, T) 11:38, 22 April 2008 (UTC)Reply
The problem here is the substantial number of links that you (or someone using your IP address) created. This has nothing to do with content/licensing or similar. If you look here, this is the report we received on your placement of links. It is excessive & so it appears that you are using Foundation sites to gain traffic. It may well be that seeking whitelisting on the Italian Wikipedia is the best idea. --Herby talk thyme 11:50, 22 April 2008 (UTC)Reply

Thank you for the answer. I wrote to the whitelisting page on the Italian Wikipedia. I suggested evaluating the links (which for us are appropriate and give free quality information to the users) and whitelisting radiopapesse.org.

Thank you; whitelisting at itwiki is probably the best solution here.  Declined – Mike.lifeguard | @en.wb 21:07, 22 April 2008 (UTC)Reply

logisticsclub.com

Dear Wiki and concerned person, the above-mentioned URL was blocked but I don't know why. This web site belongs to a logistics association in Turkey. The aim of this web site is to bring together the logistics sector, university students and the parts of the sector that need logistics services, to provide information about logistics, transportation and warehousing, and to announce related conferences and seminars. There are several different URLs for this club - logisticsclub.com, logisticsclub.org, lojistikkulubu.com, loj*istikkulubu.org - and all of these URLs lead to the same web site; two of them are Turkish web addresses. Consequently this site of the Logistics Club is not spam, so why are all web addresses of the Logistics Club blocked? Now we want to add the subject "Lojistik Nedir?" (in English: "What is Logistics?") - www.logisticsclub.com/modules.php?name=News&file=article&sid=2 - as a web address in the References section of http://tr.wikipedia.org/wiki/Lojistik. The preceding unsigned comment was added by Farukcaliskan (talk • contribs) 11:58, 22 Apr 2008 (UTC)

See request. --Dirk Beetstra T C (en: U, T) 12:05, 22 April 2008 (UTC)Reply

pl.net

It appears pl.net was blacklisted (see User:SpamReportBot/cw/pl.net) for the sake of a single spammed user page (a /~username/ page). This affects a link that is being used as a reference on WP, and I could ask for a whitelist exception, but the block on this domain seems too sweeping for the amount of damage it was causing. I suggest removing the blacklisting. Kellen T 13:39, 23 April 2008 (UTC)Reply

Indeed, looking at the spam report indicates to me that the user was fleshing out the external links on the other wikis with some from en:WP, and wasn't spamming at all. Probably most of the domains that got blacklisted as a result of this should be unlisted. Kellen T 13:46, 23 April 2008 (UTC)Reply
Removed - excessive listing for now. I'll check some of the others when I have time --Herby talk thyme 13:49, 23 April 2008 (UTC)Reply
Awesome, thanks Herby! Kellen T 13:52, 23 April 2008 (UTC)Reply

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

Discussion

Help needed

Dear all. Eagle 101 and I have been working on bots in the spam IRC channels (see #wikipedia-spam-t for talking; people there will be able to steer you to the other channels, #wikipedia-en-spam and #cvn-sw-spam). The bots are now capable of real-time cross-wiki spam detection (and soon that will also be reported). It would be nice if some of you would join us there and help us with cleaning etc., as this appears to go faster than we at first expected (and I do get the feeling the en wiki is not a good starting point for finding them!). --Beetstra 21:35, 22 March 2008 (UTC)Reply

Something interesting for you all to look at. I'm going to work on making each link go to subpages, and have them updated in a way that we can comment on the subpages as well, and bring the ones that need blacklisting to the meta blacklist. I can't have the bot automatically post here - we would flood this list out - so we will have to look at them all and then link to them. Hopefully we can get all the reports in one place, the COIBot reports etc. Folks, more or less simple crosswiki spam is easily detectable. :) —— Eagle101 Need help? 22:55, 22 March 2008 (UTC)Reply
Bah, you probably want to see the subpage at User:SpamReportBot/test ;) —— Eagle101 Need help? 23:00, 22 March 2008 (UTC)Reply

Addition to the COIBot reports

The lower list in the COIBot reports now has, after each link, four numbers in brackets (e.g. "www.example.com (0, 0, 0, 0)"):

  1. the first number: how many links this user added in total (the same after each link)
  2. the second number: how many times this link was added to Wikipedia (as far back as the linkwatcher database goes)
  3. the third number: how many times this user added this link
  4. the fourth number: to how many different Wikipedias this user added this link.

If the third or the fourth number is high with respect to the first or the second, that means the user has at least a preference for using that link. Be careful with drawing other statistics from these numbers (e.g. good users do add a lot of links). If there are more statistics that would be useful, please notify me, and I will have a look to see whether I can get the info out of the database and report it. The bots are running on a new database; Eagle 101 is working on transferring the old data into this database so it becomes more reliable.

For those with access to IRC, this data is available there in real time. --Beetstra 10:40, 26 March 2008 (UTC)Reply
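To make the meaning of those four counters concrete, here is a small illustrative sketch (not COIBot's actual code) that derives them from a list of hypothetical (user, link, wiki) additions; every name and number below is made up.

```python
# Hypothetical link additions recorded as (user, link, wiki) tuples.
ADDITIONS = [
    ("SpamUser", "www.example.com", "en"),
    ("SpamUser", "www.example.com", "de"),
    ("SpamUser", "www.example.com", "fr"),
    ("SpamUser", "www.other.org", "en"),
    ("GoodUser", "www.example.com", "en"),
]

def counters(user: str, link: str):
    """Return the four numbers shown after a link in a report (simplified)."""
    by_user = [a for a in ADDITIONS if a[0] == user]
    by_link = [a for a in ADDITIONS if a[1] == link]
    by_both = [a for a in by_user if a[1] == link]
    wikis = {a[2] for a in by_both}
    return (
        len(by_user),  # 1: how many links this user added in total
        len(by_link),  # 2: how many times this link was added overall
        len(by_both),  # 3: how many times this user added this link
        len(wikis),    # 4: to how many different wikis
    )

print("www.example.com", counters("SpamUser", "www.example.com"))  # (4, 4, 3, 3)
```

Here the third and fourth numbers are high relative to the overall totals, which is exactly the pattern the report is meant to flag.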

Log weirdness

I guess it may be a caching issue but for me the log appears to end at July 2007? Editing gave me March 2008 but it ain't there now for me? --Herby talk thyme 12:16, 26 March 2008 (UTC)Reply

I've rv'd myself for now but something is going wrong??? --Herby talk thyme 14:21, 26 March 2008 (UTC)Reply
Looks to me like you put the log entry in the right section, I'm re-adding it for ya. Did you purge? ~Kylu (u|t) 16:28, 26 March 2008 (UTC)Reply
Agreed in a sense but just purged the cache & it cuts off at July 2007 for me (I even tried making it #March 2008 and got de nada). Is it just me - it has been "one of those" days :) --Herby talk thyme 17:02, 26 March 2008 (UTC)Reply
I don't see past July 2007 either :\ Mønobi 17:11, 26 March 2008 (UTC)Reply
https://wikitech.leuksman.com/view/Server_admin_log#March_26 - issues with the rendering cluster again (which would keep &action=purge from working) ~Kylu (u|t) 17:40, 26 March 2008 (UTC)Reply
Did the full ff purge & still have the same as Monobi today. I am recording the entries that I cannot log at present but I guess if this is not resolved soon alternatives of some sort may be needed. If anyone else finds (or does not find) the same it would be good to hear. Thanks --Herby talk thyme 08:46, 27 March 2008 (UTC)Reply
Leave me the log entries you want added on my talk, and I'll add them for you if you'd like. I can get around this problem. :) ~Kylu (u|t) 14:12, 27 March 2008 (UTC)Reply
Ok, sorry for archiving this. It looks like we hit some sort of limit. My suggestion is to make a second log page for the time being and start logging from that while the original bug is reported to bugzilla. —— nixeagle 02:54, 30 March 2008 (UTC)Reply

Hopefully sorted for now via Spam blacklist/LogPre2008. Of course this is a wiki so if anyone disagrees....:) Cheers --Herby talk thyme 11:38, 30 March 2008 (UTC)Reply

Crosswiki spam detection

Ok folks, we can more or less detect any crosswiki spam addition. Wander over to User:SpamReportBot/cw. This is a report of all links added by only a few people across more than 3 wikis. Each section here is its own subpage, which means you can transclude them on this page, link to the specific section, etc. You can also comment on the subpages if you have further notes etc., such as "this is not spam because of X". Depending on what we all think of it, I'll transclude User:SpamReportBot/cw on this page. —— Eagle101 Need help? 00:31, 29 March 2008 (UTC)Reply

I'll also note that it automatically removes old items. Items should stay up for 2-3 days before being removed by the bot. (that is if no more links are added). If good links consistently come up, I'll come up with a whitelist mechanism that we can add links to if we deem the additions ok and we don't want to see the additions there. Please suggest improvements on how the bot reports. —— Eagle101 Need help? 01:30, 29 March 2008 (UTC)Reply
I started to blacklist a number of these and then stopped when I noticed the blacklist log is acting seriously weird. --A. B. (talk) 02:21, 30 March 2008 (UTC)Reply
Alright, thanks for your work. I'm going to continue to work on the bot and the algorithm being used, so noting false hits is important. The major one seems to be knowing accounts that edit a lot. I'll work on a fix to that tomorrow, I'm hitting the sack tonight. —— nixeagle 03:06, 30 March 2008 (UTC)Reply
Once again, Wikipedia is a better quality project because of hardworking and conscientious editors.--Hu12 13:50, 4 April 2008 (UTC)Reply

XRumer spam

Well, anyone who is involved in fighting crosswiki spam has at some point seen XRumer ("is the best!") spam. Now he hotlinks a thumbnail for his program, as seen on [1]. The code he's using:

X-Rumer is the BEST! 
 
<img>http://upload.wikimedia.org/wikipedia/en/thumb/6/6b/XRumer_screenshot.gif/200px-XRumer_screenshot.gif</img> 

So I added the following line: \bupload\.wikimedia\.org\/.*XRumer_screenshot\.gif\b to blacklist all links to possible thumbnail sizes, although I don't know if I did it properly (and the logging system used here confuses me). So, could anyone here review whether I did it properly? es:Drini 19:07, 28 March 2008 (UTC)Reply
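As a quick sanity check of that entry (a sketch only; the blacklist extension compiles entries into its own combined pattern, so this just tests the fragment itself), the regex can be run against the hotlinked thumbnail URL, for example in Python:

```python
import re

ENTRY = r"\bupload\.wikimedia\.org\/.*XRumer_screenshot\.gif\b"
URL = ("http://upload.wikimedia.org/wikipedia/en/thumb/6/6b/"
       "XRumer_screenshot.gif/200px-XRumer_screenshot.gif")

# The ".*" spans the intermediate path components, so the entry catches the
# base file as well as any /NNNpx- thumbnail size.
print(bool(re.search(ENTRY, URL)))                              # True
print(bool(re.search(ENTRY, "http://example.com/other.gif")))   # False
```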

That works. I just tried it out. (adding the link that is). —— Eagle101 Need help? 01:25, 29 March 2008 (UTC)Reply
I deleted the pic on enwiki, btw, but am told that it'll be a while before that link is purged. If it's a huge problem, we can request that a shell user delete the file manually, but... ~Kylu (u|t) 22:13, 1 April 2008 (UTC)Reply

SpamReportBot/cw feedback

First item: after a lot of checking, I went through and made comments in each section as to which bot-reported domains needed blacklisting and which looked legit. When I was all done, I saw that none of my edits "stuck" -- it was as if I'd never made them. This must have something to do with the fact that these reports are transcluded. Then I went and blacklisted 13 domains; afterward I saw others had also blacklisted some of the same links, so there was some wasted effort. Conclusion: we very much need a way to mark up these reports so we don't duplicate each other's efforts.

In lieu of marking each report, here's my feedback on some of the domains reported so far:

  • I blacklisted these:
    • tremulous.net.ru
    • logosphera.com
    • vidiac.com
    • yarakweb.com
    • img352.imageshack.us
    • ayvalikda.com
    • sarimsaklida.com
    • worldmapfinder.com
    • cundadan.com
    • bikerosario.com.ar
    • alfpoker.com
    • karvinsko.eu
    • yarak.co.uk
  • Links added to these sites looked legitimate:
    • wikilivres.info
    • unwto.org
    • en.pwa.co.th
    • villatuelda.es
  • Some others still need evaluation

All in all, SpamReportBot/cw looks like a very powerful, useful tool. --A. B. (talk) 03:28, 30 March 2008 (UTC)Reply

OK, I just figured out that if I post my comments in the bot report sections above the line that says "<!-- ENDBOT POST BELOW HERE -->", then they'll show up. I don't know if it's a good idea to do this, however -- will it screw up the bot or the transclusion? --A. B. (talk) 03:36, 30 March 2008 (UTC)Reply
The work of the bot is awesome & deserves both thanks & discussion. There seem to be a few issues that need addressing, such as what to look at, logging etc., & it would be good to see discussion here. I feel that there may be a case for listing all bot-generated sites because the behaviour is "spammy". However I also think, because it is bot generated and there will likely have been no warnings, that entries can & should be removed after some sensible interaction has taken place. I am well aware that others here would not share my views so I will substantially reduce my activity on this page (& Meta).
The bot - while excellent - has generated far more work than I have time for and so I will just look at dealing with the requests from the people who make requests here & whom I've got to know & trust, if I am around. Given the vast number of admins on Meta this should not cause any problems - however Meta seems to attract many people who want to be admins but are not inclined to do any of the work. If I am around I'll help but my time is short & there is much to do on Commons. Thanks --Herby talk thyme 12:25, 30 March 2008 (UTC)Reply
You do raise a valid point, as far as no warnings. Thankfully we just turned a major corner. We now have the ability to detect most spammy behavior. However now that detection and reversion is easier (SUL), we may want to evaluate what we do in response to those that add links many times.
When I first started helping in this effort, we were shooting in the dark. There were no COIBot reports, IRC feeds, the crosswiki linksearch tool, or any sort of monitoring of more than one wiki at a time... thus detecting spam across multiple wikis was... pardon my language, damned hard! As such we blacklisted all we could find. This type of spam was and is sneaky, as it bypasses most communities' detection mechanisms. It's only one link to folks on the various wikis, but added together it's across 5 or more!
Now that we have a detection mechanism, one that we can adapt should the behavior of spammers change significantly, we need to ask ourselves: should we blacklist with the same vigor? Should we attempt to assume good faith for additions that appear to us to be accidental, or made in good faith? How do we go about warning someone who may never see the warning, or be unable to read the language in which the warning is placed? In addition, we must remain ever wary of en:Joe jobs.
These are questions that need to be answered, and Herbythyme is right on the ball hinting at these here and elsewhere. It's perfectly valid to keep our response the same as it always was, but this may not be the best course of action. I don't know for sure what is. Please discuss your thoughts on this below my comment, or in its own section. :) —— nixeagle 18:20, 30 March 2008 (UTC)Reply
Someone will have to remove the blacklisted links from the wikis. Can that be done by a bot, and/or can a bot be set up to give information on the affected wikis about where the blacklisted links are, so the local community can remove the links themselves? Removing spam is a tedious task, and sometimes one feels one is interfering with the local communities as much as any spammer. If possible the local communities should evaluate the blacklisted links themselves, and remove the ones they don't want, and either strip or whitelist the others. I realize that might not be very realistic. For Commons there is the CommonsTicker and CommonsDelinker. Is it possible to handle the blacklisted links in a similar way? --Jorunn 13:49, 30 March 2008 (UTC)Reply
Possibly it could be done by bot... I can work on writing this if it's wanted. SUL will make things much easier. I usually just click the diff links and click undo on each of the ones I blacklist. In other words, I don't blacklist things I'm not willing to undo the link additions to. —— nixeagle 17:54, 30 March 2008 (UTC)Reply
A.B. - As far as your edits not sticking on the transcluded pages... can you show me an example? I can't fix it unless I can see an example of the problem. :S It will be useful down the road to have the blacklisted or not portion in the page itself, so this should work without any problems... —— nixeagle 18:03, 30 March 2008 (UTC)Reply
Replying to myself again: AB - "<!-- ENDBOT POST BELOW HERE -->", posting above that means the bot will overwrite your comments should there be future link additions from that domain.
Also, A.B. and everyone else interested, I just modified the algorithm to remove 2 out of 4 identified false hits. I'll look at the other two, but I'd like to see this run for a day or so and see what crops up. Please do attempt to comment on the actual sub pages. —— nixeagle 19:31, 30 March 2008 (UTC)Reply

I removed transclusion from this page because it was loading very slowly — VasilievVV 06:22, 13 April 2008 (UTC)Reply

Sure, when and if it gets back to a manageable level, we can place it back on this list. —— nixeagle 19:45, 16 April 2008 (UTC)Reply
I want to add: we also need people on IRC watching our bots. The output of the bots we are running there does show when accounts are actually busy spamming cross-wiki, and much work and damage can be avoided by reacting promptly there. Yesterday I added two before they were reported here (and closed the reports this morning). Also, when you hit them while they are busy, they notice that what they do is a problem; if you add them the next day, they may never know what happened. --Dirk Beetstra T C (en: U, T) 10:33, 17 April 2008 (UTC)Reply

Clearing the backlog!

OK - we have a choice - drown in it or tackle it! It will not be long before that page will not load, never mind anything else.

Assuming drown is not the choice (:)) I think we need to use a larger mesh. These are reports of possible excessive linkage. If we had the time & people we would look in detail at every one with a fine tooth comb - we haven't.

Action plan

  1. I'm going to close all those that have been around for a week or so. The worst that will happen is that they will be re-opened again?
  2. I think we need to take the view that we take a quick look at each - if it doesn't look like a threat to the project, we close it and move on. One of the issues here is that, great though the bot is, no human has actually checked it, so it is far more labour-intensive than manual reports.
  3. Recruit - can anyone who knows anyone who is a "spam fighter" get them to take a look at this stuff. For anyone who has time & some cross wiki experience it is a worthwhile area to work. Those with close ties with other language projects could approach local workers too.

Comments welcome but it is a time for doing not talking (I'll spam talk pages on Meta with a link to this). Cheers --Herby talk thyme 07:09, 16 April 2008 (UTC)Reply

What actions can a non-admin take? I'm an admin on nlwiki and I'd like to help out if I can, but I'm not an admin here. --Erwin(85) 07:38, 16 April 2008 (UTC)Reply
Any help would be seriously appreciated Erwin. This being a wiki you can do what you like! More helpfully (& my opinion only) are the links really excessive, unwanted, spam? Again for me it means checking the diffs out on some of the wikis (& the site probably too). Has it been removed by the local folk (fr & nl are pretty good at spotting spam)? Maybe try Luxo's tool for cross wiki contribs (& blocks too).
Then it is your judgement - if you feel it is not excessive linkage or is not a current threat to the project then "close" it (as far as I know that merely means replacing "open" on the status with "closed"?) with any comments.
If you do see it as spammy then add that comment and hopefully someone will get round to blacklisting it and closing it - certainly I will do what I can.
There are some reports where the same IP is placing a number of links - that makes me quite suspicious so if you pick up on anything like that do mention it. Any help will be appreciated - thanks --Herby talk thyme 07:49, 16 April 2008 (UTC)Reply

It will probably not break the bot, but the bot will just put it back .. so it is of no help. I would suggest just closing those that seem fine-ish; they will come back if it recurs. The problem is that I am at the other side of the bots, and hell, there is a lot of work that does not even get onto this page. We need people here, and on IRC! Blacklisting is a solution, but it would be better to hit them with the wikitrout when they are actually doing it. I blacklisted a couple of links while they were busy spamming, and I have seen two immediately coming here to complain. It also gives us less work: when the links are blocked (here) or whitelisted (on the bot), the reports can be closed, and there is less to clean .. We just really need more people! --Dirk Beetstra T C (en: U, T) 10:41, 16 April 2008 (UTC)Reply

I will automatically hide those older than 5 days. Later I can (via a database call) display them, should we ever get that far. If there are continued link additions, the bot will re-add the link. Sorry folks for not being around :( —— nixeagle 19:25, 16 April 2008 (UTC)Reply
Just a note, those hidden can be recalled at a later date if folks are interested in looking at it. I do agree, we just need more people! —— nixeagle 19:44, 16 April 2008 (UTC)Reply
Well, you have one more person. If I step on toes, or mess up with the various templates etc, please poke me or I won't learn. – Mike.lifeguard | @en.wb 04:33, 17 April 2008 (UTC)Reply
I've re-transcluded the list, as Nakon did some damage to it. That, along with the removal of the older items, did the trick. However, we need folks to continue to watch this, or it's just going to happen again. If the bot-reported section ever gets larger than about 25-35 items, we have a backlog. (It should generate about 20-30 a day). —— nixeagle 04:34, 17 April 2008 (UTC)Reply

(same as above :-) ): I want to add: we also need people on IRC watching our bots. The output of the bots we are running there does show when accounts are actually busy spamming cross-wiki, and much work and damage can be avoided by reacting promptly there. Yesterday I added two before they were reported here (and closed the reports this morning). Also, when you hit them while they are busy, they notice that what they do is a problem; if you add them the next day, they may never know what happened. --Dirk Beetstra T C (en: U, T) 10:34, 17 April 2008 (UTC)Reply

Not really related, but to ping people's watchlists maybe, and avoid spamming talk pages. I've suggested new styles for Template:LinkSummary and Template:UserSummary on their respective talk pages. Don't want to make rash changes as these are rather often used. My purpose is to make them readable, which makes them useful. Currently, I find it very difficult to find what I'm looking for in there, so I redid them. – Mike.lifeguard | @en.wb 15:35, 18 April 2008 (UTC)Reply

Looking ahead

"Not dealing with a crisis that can be foreseen is bad management"

The Spam blacklist is now hitting 120K & rising quite fast. The log page started playing up at about 150K. What are our options looking ahead, I wonder? Obviously someone with dev knowledge or connections would be good to hear from. Thanks --Herby talk thyme 10:46, 20 April 2008 (UTC)Reply

I believe that the extension is capable of taking a blacklist from any page (that is, the location is configurable, and multiple locations are possible). We could perhaps split the blacklist itself into several smaller lists. I'm not sure there's any similarly easy suggestion for the log though. If we split it up into a log for each of several blacklist pages, we wouldn't have a single, central place to look for that information. I suppose a search tool could be written to find the log entries for a particular entry. – Mike.lifeguard | @en.wb 12:24, 20 April 2008 (UTC)Reply
What exactly are the problems with having a large blacklist? --Erwin(85) 12:34, 20 April 2008 (UTC)Reply
Just the sheer size of it at a certain moment; it takes long to load, to search, etc. The above suggestion may make sense: smaller blacklists per month, transcluded into the top level? --Dirk Beetstra T C (en: U, T) 13:16, 20 April 2008 (UTC)Reply
Not a technical person but the log page became very difficult to use at 150K. Equally the page is getting slower to load. As I say - not a techy - but my ideal would probably be "current BL" (6 months say) & before that? --Herby talk thyme 13:37, 20 April 2008 (UTC)Reply
I don't know how smart attempting to transclude them is... The spam blacklist is technically "experimental" (which sounds more scary than it really is) so it may not work properly. I meant we can have several pages, all of which are spam blacklists. You can have as many as you want, and they can technically be any page on the wiki (actually, anywhere on the web that is accessible) provided the page follows the correct format. So we can have one for each year, and just request that it be added to the configuration file every year, which will make the sysadmins ecstatic, I'm sure :P OTOH, if someone gives us the go-ahead for transclusion, then that'd be ok too. – Mike.lifeguard | @en.wb 22:12, 20 April 2008 (UTC)Reply
A much better idea: bugzilla:13805! – Mike.lifeguard | @en.wb 01:43, 21 April 2008 (UTC)Reply

Well, Mike.lifeguard, I don't know where to answer you about what you did by putting all my 'weblog.ro' links from pages like Simone de Beauvoir, Houellebecq etc. on the blacklist, so I'll do it here, where I see your name.

What if all this is not true and you, and all your friends here, committed an abuse? What if that site you're talking about is a simple blog, and has no advertising and never will have any, and all the videos it has there are just cultural, with cultural themes, and no one who wants to get clicks will ever do it by posting cultural things about writers? What if all those are really just writers that I love or believe in, and I want everybody to see those interviews, and that's all? What if you, just you, are a plain, simple, pure, full-time idiot after all, and you've just offended a guy who did nothing wrong, and doesn't even know how to do that? hum?

Never mind, have a nice life with your friends. You must be happy persons. I know you will remain many.

Date/time in bot reports

In the bot reports there's a date and time given for each diff. However the given time isn't actually the time of the revision. Both the hour and the minutes differ so it's not simply another timezone. Does anyone know what the given date/time mean?--Erwin(85) 18:12, 20 April 2008 (UTC)Reply

It is the time on the machine the bots are running on. For me (I am in Wales, UK) it looks like the box is 5 hours and 2 minutes off. We could correct for that, but I guess it is more an indication of the spam speed than something that is really necessary; the diffs give the correct times. Hope this explains it. --Dirk Beetstra T C (en: U, T) 19:51, 20 April 2008 (UTC)Reply
Thanks. --Erwin(85) 12:29, 21 April 2008 (UTC)Reply
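If anyone ever did want to correct the reported times, the adjustment is trivial; a sketch, assuming the 5 hour 2 minute figure quoted above and that the box runs ahead of UTC (the direction of the skew would need checking):

```python
from datetime import datetime, timedelta

# Estimated offset of the bot machine's clock, per the discussion above.
OFFSET = timedelta(hours=5, minutes=2)

reported = datetime(2008, 4, 20, 23, 14)   # hypothetical time shown in a report
corrected = reported - OFFSET              # approximate UTC time of the edit
print(corrected)                           # 2008-04-20 18:12:00
```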

Letras Libres

Letraslibres(dot)com is neither spam nor can it be used to bypass the blacklist; it is just a magazine in Spanish.

It is not the site itself that is defined as spam. It seems to be a blog, and the link was inappropriately added to many Wikipedias; see User:SpamReportBot/cw/letraslibres.com. --Dirk Beetstra T C (en: U, T) 16:35, 23 April 2008 (UTC)Reply