Toolserver/New accounts

Shortcut:
TS/A

This page is deprecated. No new account requests will be accepted here.

If you want a Toolserver account, please use the new account request process.

If your request is still listed on this page, it will be processed at some point. However, it might speed things up if you re-file it with the new system. If you do so, please add (jira) to the headline of your request on this page, and refer to your old request when you file the new one.

Old requests remaining to be processed

I am working on OpenStreetMap in Kosovo and would like to host photo annotations and maps on Wikipedia.

I'm interested in building a tool to find when specific text was added (who added that? in which revision?). When the text was added recently, it's not much trouble, but if you try to find who added a recently deleted image several months ago... you can go mad ;) I have little experience with the MW code (they rejected bugzilla:5763; I may do it when I feel free enough). However, I have played a bit with extracting information [1] [2] [3] and with JS [4]. Ah, and I run a bot when I'm in the mood, and then revise its changes. Platonides 17:11, 12 June 2006 (UTC)

That sounds nice, but a lack of simple access to page text might hamper your ability to do this. How would you work around that? robchurch | talk 01:43, 19 June 2006 (UTC)

Oh, I hadn't realized that Daniel Kinzler == Duesentrieb. It's really a problem, as we're stuck with WikiProxy because of the external storage problem. However, the Toolserver is still the better place to do it. Would a time limit between queries need to be enforced? (It seems robust enough, but it's better to have things secured.) Platonides 22:11, 19 June 2006 (UTC)

You don't seem to have thought out the implementation with respect to the problems. robchurch | talk 16:32, 23 June 2006 (UTC)

The algorithm is pretty clear. That the get-revision-text step can't be done by asking the DB, but needs to ask WikiProxy, is not a big change. It's more the Toolserver's problem than the user's. Moreover, WikiProxy first looks into the text table, and only asks over HTTP if the text is not locally available. I don't understand what you mean. This could be achieved through other methods, like JavaScript, but in no way better. Platonides 16:51, 24 June 2006 (UTC)
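Since each probe needs a full text fetch, a binary search over a page's revision list keeps the number of WikiProxy requests at O(log n). A minimal sketch in Python, assuming hypothetical helpers get_revision_ids() and fetch_text() backed by WikiProxy:

    # Sketch: find the first revision of a page containing a given string.
    # get_revision_ids() and fetch_text() are hypothetical helpers that
    # would be backed by WikiProxy.
    def first_revision_containing(page, snippet):
        revs = get_revision_ids(page)        # ordered oldest to newest
        found = None
        lo, hi = 0, len(revs) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if snippet in fetch_text(page, revs[mid]):
                found = revs[mid]            # present here; look earlier
                hi = mid - 1
            else:
                lo = mid + 1                 # not yet added; look later
        return found                         # None if the text never appears

Note the binary search assumes the text, once added, was never removed and re-added in between; runs where it came and went would need a linear scan over the narrowed range.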

My point is that WikiProxy is fine for requesting a few pages every so often. For applications which depend upon continuously fetching and comparing page text, it's usually better to use a database dump. Incidentally, that it is "a toolserver problem" is fairly germane; yes, it is a problem, and yes, we are trying to work out how to fix it, but we also expect some co-operation from our users. Excessive resource usage leads to processes and queries being killed; it's the same on any multi-user system. robchurch | talk 16:08, 2 July 2006 (UTC)

A database dump is fine if you want statistics, not if you want data about the live Wikipedia. So you would still need to fetch the last week or so. I'm not sure whether WikiProxy/DB page retrieval already handles this, or whether it would need to be added at the application layer. However, these dumps would only be available for large wikis (define 'large' as you want: by dump size, number of revisions, number of tool queries...). Platonides

It seems the same feature was requested years ago at bugzilla:639. Can it be tried, instead of having theoretical arguments? Platonides 22:27, 15 July 2006 (UTC)

I run the German Wikipedia MP3 podcast and an automatically generated OGG RSS feed for spoken articles at de:Wikipedia:WikiProjekt_Gesprochene_Wikipedia/RSS-Feed. I host all of the services on my own server, which costs me 6 GB of monthly traffic and needs approx. 2 GB of disk space (for the German podcast alone). All together I have approx. 50,000 monthly hits and about 150 listeners to the German MP3 podcast. I'd like to expand my services to the international Wikipedia. Besides hosting the necessary scripts, causing 1 GB of traffic and consuming about 0.5 GB of disk space (for a very short-lived audio conversion cache, OGG->WAV->MP3), I would like to host MP3 versions of the original OGG spoken articles somewhere (not necessarily on the Toolserver). You will find more information in German on my discussion page at de:Benutzer_Diskussion:Jokannes#Toolserveraccount. de:User:Jokannes 19:19, 19 January 2007 (UTC)

What about the MP3 patents? Within the United States, royalties are required to create and publish an MP3 file. I personally advocate for greater adoption of Ogg/Vorbis/Theora and other FLOSS formats. Thanks, GChriss 20:03, 10 April 2007 (UTC)

Hi, I'm an admin at the German Wikipedia, and I've been a core PyWikipediaBot developer for several years. I'm running de:Benutzer:Zwobot (and, for interwiki placement, his brothers in > 100 other languages).

I'm the author of PyWikipediaBot's weblinkchecker.py. This is a tool to detect and report broken external links. See weblinkchecker.py for details. The problem is that this script uses a LOT of bandwidth. It will download any HTTP or HTTPS URLs linked to from all articles, about 50 simultaneously, including large files like images and PDFs. I don't know exactly how much traffic it causes, but if I run it on my home DSL line, I get some false positives (socket errors) because the line is congested. CPU and RAM usage are minor, and I only need a few MB to store the dead links it finds.

Also, it would be interesting to improve PyWikipediaBot's MySQL features. At the moment, most scripts directly parse an XML dump, and they could be sped up considerably by accessing a MySQL database. This is of course limited by the lack of full-text access on the toolserver, but maybe some scripts (e.g. redirect.py) could make use of that limited MySQL access anyway. --Head 23:37, 12 September 2007 (UTC)
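For illustration, this is roughly the query a script like redirect.py could run against a replica instead of parsing a dump. A sketch assuming the MySQLdb module, the dewiki_p replica naming, and the standard MediaWiki page/redirect tables:

    # Sketch: find double redirects straight from a database replica
    # instead of an XML dump. Assumes MySQLdb and the dewiki_p replica.
    import os
    import MySQLdb

    conn = MySQLdb.connect(db='dewiki_p',
                           read_default_file=os.path.expanduser('~/.my.cnf'))
    cur = conn.cursor()
    cur.execute("""
        SELECT p1.page_title, r1.rd_title, r2.rd_title
        FROM page AS p1
        JOIN redirect AS r1 ON r1.rd_from = p1.page_id      -- first hop
        JOIN page AS p2 ON p2.page_namespace = r1.rd_namespace
                       AND p2.page_title = r1.rd_title
        JOIN redirect AS r2 ON r2.rd_from = p2.page_id      -- second hop
        WHERE p1.page_namespace = 0
    """)
    for row in cur.fetchall():
        print('%s -> %s -> %s' % row)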

Why is it necessary to download the whole file? Wouldn't it be enough to just read the HEAD information from the server? --DaB. 15:33, 27 September 2007 (UTC)
Is it not possible to automatically remove links with 4XX status from the article? --Luxo 18:16, 11 October 2007 (UTC)
Luxo: That is not possible. First, sometimes there are obvious typos in the URLs, which are easy for a human user to fix, but not for a bot. Second, some external links are embedded in the context, and you cannot just rip them out. You might destroy the meaning of the sentence, or you could remove an important reference. --Head 02:05, 20 January 2008 (UTC)
DaB.: The bot first tries to fetch the header only. If there are no problems with this, the bot considers the link alive. This is the case for most of the checked external links. But if something goes wrong with the HTTP HEAD method, the bot retries with the GET method. Some servers seem to be misconfigured and report errors with HEAD, but work fine with GET (and thus work fine in web browsers).
Still, the sheer number of fetched headers, together with a few large files, adds up to huge bandwidth consumption. --Head 02:05, 20 January 2008 (UTC)
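The fallback logic, as a minimal Python sketch using the standard library of the period (httplib/urlparse); timeouts, redirects and URL quoting are left out for brevity:

    # Sketch of the HEAD-then-GET fallback described above, using plain
    # httplib; a misconfigured server that rejects HEAD gets a GET retry.
    import httplib
    import urlparse

    def link_alive(url):
        parts = urlparse.urlsplit(url)
        for method in ('HEAD', 'GET'):           # cheap method first
            conn = httplib.HTTPConnection(parts.netloc)
            conn.request(method, parts.path or '/')
            status = conn.getresponse().status
            conn.close()
            if status < 400:                     # 2xx/3xx: treat as alive
                return True
        return False                             # dead with both methods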

I am a bot operator on the Arabic and other Wikipedias. I want to run the bot 24/7 on the Arabic Wikipedia for warnings, etc. I would like to have a toolserver account to run the bot (and interwiki bots on other wikis). --Ordinaterr 05:06, 11 November 2007 (UTC)

Your bot is only allowed to work on interwiki links and to create new pages. What do you mean by warnings? --Alnokta 11:46, 22 November 2007 (UTC)
So that it runs while I am offline. --23:24, 3 January 2008 (UTC)

Hi, I am an administrator on the Simple English Wiktionary, and I currently run an interwiki bot, simple:wikt:User:WenliBot. I would like to run this bot on the toolserver. Thanks. Wenli 04:28, 12 January 2008 (UTC)

We have had some problems with interwiki.py in the past (heavy use of RAM). I will first see whether the developers have fixed it. --DaB. 23:05, 9 February 2008 (UTC)
There is still the problem that the interwiki bot needs too much RAM. So do you have another idea of what you could do with an account? --DaB. 16:49, 3 March 2008 (UTC)

I'd like to run some real-time analysis tools, yet to be developed, like:

  • requested articles in a category (and all subcats). (code).
  • relevancy and relatedness of subjects. (no code written yet)

See also Zeusmode, a JavaScript tool in reasonably wide use on the Dutch Wikipedia. Zanaq 15:03, 10 February 2008 (UTC)

Mm, I guess live data is not a good idea. The Wikimedia server people didn't deactivate this data just for fun, I guess :). What about cached data? --DaB. 17:07, 3 March 2008 (UTC)
Me mm too :-) I don't really understand the problem. I am not aware of any server person disabling anything. The first tool is simply meant to find out which non-existent articles are linked to from a certain category. This is handy for finding incorrect links and relevant missing articles. It is not a statistical analysis tool. It has to be (reasonably) realtime so you don't have to manually filter out the articles that have recently been created. Zanaq 09:50, 6 March 2008 (UTC)
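The core of that first tool fits in a single replica query. A sketch assuming the MySQLdb module and the nlwiki_p replica; the category name is only an example, and subcategory recursion would still have to be added on top:

    # Sketch: list red-linked (non-existent) articles linked from pages in
    # a category, via the categorylinks/pagelinks/page replica tables.
    import os
    import MySQLdb

    conn = MySQLdb.connect(db='nlwiki_p',
                           read_default_file=os.path.expanduser('~/.my.cnf'))
    cur = conn.cursor()
    cur.execute("""
        SELECT DISTINCT pl.pl_title
        FROM categorylinks AS cl
        JOIN pagelinks AS pl ON pl.pl_from = cl.cl_from
        LEFT JOIN page AS p ON p.page_namespace = pl.pl_namespace
                           AND p.page_title = pl.pl_title
        WHERE cl.cl_to = %s
          AND pl.pl_namespace = 0
          AND p.page_id IS NULL    -- link target does not exist: red link
    """, ('Geschiedenis',))        # example category name
    for (title,) in cur.fetchall():
        print(title)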

TheDaveRoss on en.wiktionary

  • User:TheDaveRoss
    My main interest is figuring out how people use Wiktionary, and efforts to improve the way we do things based on that information. I would be using the toolserver for things similar to the polling project currently underway at en.wikt, because Connel shouldn't have to do _all_ the work.
    Of secondary interest is support for my bot's work on Spanish entries, using the toolserver for analysis of existing/non-existing entries.
Wasn't logged in, but it was me :) - TheDaveRoss 06:43, 17 February 2008 (UTC)

Hi, I'm an admin on the Chinese Wikimedia projects (Wikipedia, Wiktionary, Wikinews...) and a steward. I currently run a pywikipediabot, and I would like to run my bots and store their source code on the toolserver. My bots are zh:User:Sz-iwbot (interwiki bot), User:Welcomebot (welcomes new users) and User:Talkindexbot (talk page indexing, self-made; see User:Talkindexbot/source). And I develop other code. --Shizhao 13:25, 5 March 2008 (UTC)

I am user Walter, the one who does Wikizine. I provide a CGI:IRC gateway service to give people easy access to the Wikimedia-related IRC channels. I would like to have an account on the toolserver to run such a gateway; see http://cgiirc.org. My current host cannot handle all the requests for it very well, and it often fails. I would keep my current hoster, but list the gateway hosted on the toolserver as only one of the options, so that not all traffic goes to it. Currently I have about 300 MB of traffic a month for this service, and 20-30 MB of webspace is enough. --Walter 11:37, 9 March 2008 (UTC)

It could be a legal problem to run such a service. Is the request still valid? --DaB. 01:25, 12 March 2009 (UTC)

Hello. I'm from Holland, I like to work with bots, and I love working with Wikipedia. I am working on an anti-vandalism bot for the Dutch Wikipedia. But it has to work 24/7, so I would like to use the toolserver. 80.61.62.123 13:49, 21 March 2008 (UTC)

And how far along is the project? --DaB. 01:26, 12 March 2009 (UTC)

I'd like a toolserver account so I can run perlwikibots to do repetitive work for me, like rolling back serial vandals or doing some wikimeetup spam on request... --Cream 01:04, 29 March 2008 (UTC)

This user is a reformed blocked user on the English Wikipedia, as well as a former banned user from the Wikimedia IRC channels, due to his bot-aided disruption of said channels. While the user has expressed a strong will to reform, and has started in the right direction, it may be too soon, in my opinion, to grant him access to such a powerful tool as the toolserver. IIRC he has already expressed willingness to stop his disruptive behavior in the past, while then failing to do so. I do strongly believe that this user has changed for good, but maybe a little waiting can do no harm. Nothing personal, Cream, you know that :) Snowolf How can I help? 21:19, 3 April 2008 (UTC)
So this request is postponed to the beginning of November. Please report any NEW problems with Cream here (and of course good things too). --DaB. 14:43, 14 July 2008 (UTC)
Am I right that the user is not active anymore? --DaB. 01:29, 12 March 2009 (UTC)
What? Did you say i was inactive? Huh? What? Right! Huh! --Mixwell 22:21, 12 March 2009 (UTC)

Hi there! My name is John and I am from the English Wikipedia and Wikimedia Commons. I currently operate 4 running and planned bots. The third and fourth bots will need to run continuously to be useful. en:user:John Bot III will be the first bot to use the toolserver, and will tag policy-violating images on Wikipedia. The other will do the same on Commons. The ability to use the toolserver will allow me to employ bots that are effective 24/7. Thank you for your time and consideration of this request. Best wishes, Compwhizii 00:12, 3 April 2008 (UTC)

I hate to rush anyone, but running the bots on my computer really isn't working for me. I don't have the biggest connection on the planet, so the bots can eat up my bandwidth pretty easily. And since I cannot keep my computer running 24/7, the bots are not working to my full expectations. So I would be very pleased if I could have an account sooner. Thank you, Compwhizii 19:43, 29 April 2008 (UTC)
Both jobs are already done: by Filbot (my bot) on Commons and by OrphanBot on en.wiki. The latter even does it better than what JohnBot would do (or mine, which is the same, because we use the same script, written by me). So, imho, there is no need for a toolserver account. --Filnik 11:47, 2 May 2008 (UTC)
It does not look urgent if such bots already exist. --DaB. 15:12, 12 May 2008 (UTC)
Yeah, I guess not. Compwhizii 19:48, 14 May 2008 (UTC)
Update: I'm now CWii CWii 14:25, 1 June 2008 (UTC)

I am a software engineer interested in open-source technologies and community contribution. I have contributed to the wiki monetarily and in some maintenance activities like categorisation. I would like to use my experience with computers to contribute more to wiki maintenance, so please give me a role in wiki toolbox server maintenance or any other work you decide is appropriate.

http://en.wikipedia.org/wiki/User:Ramkumaran7

06:57, 30 May 2008 (UTC)

Hi, I am a scientific worker at the Fraunhofer Institute FOKUS, and we want to contribute to the work of user Kolossos. We would like to create more access interfaces, like a web service interface and a servlet interface. Please create a developer account for us! FhG_FOKUS 09:27, 14 July 2008 (UTC)

I am in contact with him and believe that we should use the chance to get professional support for Wikipedia-World. --Kolossos 13:25, 14 July 2008 (UTC)
I need to know whether the user is still active; if so, there should be no problem with an account. --DaB. 02:17, 12 August 2008 (UTC)
The user wrote you an email. --Kolossos 19:26, 29 September 2008 (UTC)

I'm a Brazilian programmer, and I would like to use the toolserver to build and test an administration robot in Python that will check random wiki pages for inappropriate words and wrong content. Thanks. --Zanst 17:49, 14 July 2008 (UTC)

In which Portuguese-language project do you work? Alex Pereira falaê 14:23, 15 July 2008 (UTC)
Translation: "In what Portuguese project do you work?" —translated by Google :D Cbrown1023 talk 02:42, 12 August 2008 (UTC).[reply]
Currently I am not working on any project. I started thinking about the idea of creating the robot a few days ago. I know there are already several initiatives, but I would like to contribute in some more advanced way. --Zanst 14:25, 16 July 2008 (UTC)
Translation: "Currently I am not working on any project. I started thinking about the idea of creating the robot a few days. I know that there are already several initiatives, but would like to contribute in some way more advanced." —translated by Google :D Cbrown1023 talk 02:42, 12 August 2008 (UTC).[reply]
Look for Leonardo.stabile on the Portuguese Wikipedia; he can explain it to you better. First, create an account (it can be under the username you used above) so you can be contacted. For further clarification, you can find me on my discussion page here. Alex Pereira falaê 16:31, 16 July 2008 (UTC)
Translation: "Look for Leonardo.stabile in Wikipedia in Portuguese, he could explain it better. First, create an account, it can be with this username that you used above, to contact them. Further clarification, I can look at my page of discussion here." —translated by Google :D Cbrown1023 talk 02:42, 12 August 2008 (UTC).[reply]
Sorry, I don't speak pt, so please write a short translation of your discussion here in English or German, thanks :). --DaB. 01:59, 12 August 2008 (UTC)
See ---^^ Cbrown1023 talk 02:42, 12 August 2008 (UTC)

Hi. I'm an editor over on en.wiki, and I'm involved in attempting to combat vandalism by a fairly prolific sockpuppeteer, WJH1992. This user generally makes use of IP socks, which are temporarily blocked on sight, but obviously can become active again later. I've developed a script that uses the MW API to check for recent activity from sock-tagged IPs (with possible expansion to check rangeblocks associated with this user for recent activity). The script currently runs on my private server, but I think it would have room for expansion if it could be made available on the Toolserver. ~~ [Jam][talk] 03:36, 16 July 2008 (UTC)
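For illustration, the heart of such a check against the MediaWiki API; list=usercontribs is the standard module, while the IP addresses and helper below are made up for the example:

    # Sketch: check whether sock-tagged IPs have edited recently, via the
    # MediaWiki API. The IPs below are reserved example addresses.
    import json      # Python 2.6+ standard library
    import urllib

    API = 'http://en.wikipedia.org/w/api.php'

    def recent_contribs(ip, limit=5):
        query = urllib.urlencode({
            'action': 'query', 'list': 'usercontribs', 'ucuser': ip,
            'uclimit': limit, 'format': 'json',
        })
        data = json.load(urllib.urlopen(API + '?' + query))
        return data['query']['usercontribs']

    for ip in ['192.0.2.1', '192.0.2.2']:
        edits = recent_contribs(ip)
        if edits:
            print('%s last edited at %s' % (ip, edits[0]['timestamp']))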

This is Chris from the English Wikipedia. I have a (relatively) small XMPP bot that I would like to run on the toolserver. I have been using my own server at home for this, but since power and Internet access can be unreliable, I'd like to move it to the toolserver. I am also planning a few features that would benefit greatly from direct database access. Crazycomputers 14:56, 16 July 2008 (UTC)

Hello everyone, I am a bot operator on the English Wikipedia. I'd like to move my bots (En:User:MifterBot and Commons:User:MifterBot) to the toolserver; they currently run on my desktop computer, which doesn't always have a reliable internet connection :( (and because of that I cannot set my bots to run continuously as I would like). I'd also like the ability to run, in the future, bots that require a more persistent connection than I can provide from my current connection (e.g. maybe an archive bot or something like that). Thanks and all the best, --Mifter 18:52, 21 July 2008 (UTC)

Hi there. I would like to request an account on the toolserver for the purpose of developing, and subsequently running, a bot to detect edits which appear to be simple tests. This bot would periodically poll the recent changes list for small additions only, as anything which causes a large page-size change will already be under review by either a bot or a user on counter-vandalism patrol. I intend to write it in Python and have already taken the liberty of making sure that it meets the version requirements for me to begin.

You can view my latest project in Python here (SVN trunk), if you feel it is necessary to assess that I am an experienced programmer.

I thank you for your time. Tj9991 15:52, 25 July 2008 (UTC)
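A minimal sketch of the polling loop described above; list=recentchanges and rcprop=sizes are standard API features, while the 100-byte threshold and 30-second interval are illustrative choices:

    # Sketch: poll recent changes and keep only small additions, which the
    # big anti-vandalism bots tend to ignore. Deduplication between polls
    # (rcstart/continue) is omitted for brevity.
    import json
    import time
    import urllib

    API = 'http://en.wikipedia.org/w/api.php'

    def small_additions(threshold=100):
        query = urllib.urlencode({
            'action': 'query', 'list': 'recentchanges', 'rctype': 'edit',
            'rcprop': 'title|ids|sizes', 'rclimit': 50, 'format': 'json',
        })
        data = json.load(urllib.urlopen(API + '?' + query))
        for rc in data['query']['recentchanges']:
            delta = rc['newlen'] - rc['oldlen']
            if 0 < delta <= threshold:     # small addition: possible test
                yield rc['title'], rc['revid'], delta

    while True:
        for title, revid, delta in small_additions():
            print('%s (rev %d): +%d bytes' % (title, revid, delta))
        time.sleep(30)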

Update: I've finished my bot and it has been approved for a trial. It is ready to run on the toolserver as soon as I receive an account. tj9991 (talk | contribs) 19:18, 27 July 2008 (UTC)
Update 2: The bot has been running with a flag for the past day on my home computer. It's able to keep up during the slow times of the day, but without a better setup it can't follow the traffic. --tj9991 (talk | contribs) 14:38, 29 July 2008 (UTC)
Sorry for the OT, but... have you seen this bot? :-) --Filnik 08:56, 7 August 2008 (UTC)

I'd like to request an account on the toolserver to create part of a bot which queries WikiProject categories with backlogs and generates a bar graph in wiki markup showing the number of backlogged pages by date against the total. I plan to do it in C, which is the language I have the most experience with (as my university teaches CS in it). Falcon Kirtaran 19:38, 14 August 2008 (UTC)

Toolserver User:Example

I have experience in wiki-related things (markup) (on the openSUSE wiki). What I most want is to build two things:

  • A user/talk page designer (may use RICO), mostly in JS... (I have started making it!)

I may not need DB access. After success I will try out:

  • A statistics program (open source) that produces charts/graphs, in PHP

I am signing this from an IP because I don't have any accounts on Wikipedia. Please reply on https://wiki.toolserver.org/view/User_talk:Example

Hi! I think you have to register on a Wikimedia project before requesting toolserver access. --Anon. 09:45, 1 December 2008 (UTC)

I would like to make a translation tool available to the Afrikaans Wikipedia community. The tool is based on the pywikipedia framework. It will typically read a page from a foreign-language Wikipedia (initially only from NL) and use a dictionary and interwiki links to translate the article into Afrikaans. The translated text can then be further customized before it is uploaded to the Afrikaans Wikipedia site. Best regards. Naudefj 13:22, 21 August 2008 (UTC)

Hello,

I would like a toolserver account so that I can work on (and share) an alternative to the current use of the querycache, which, due to the LIMIT in the queries that populate it, is constraining the usefulness of some "Special" reports for larger projects like the English Wikipedia. This project would not require page-text fetching.

I proposed a patch earlier this year, which would add an indexed id to the cache, and thus allow returning paged results by index range scans (which are cheap regardless of the size of the cache) rather than the use of limit and offset queries (which become linearly more expensive with a larger cache). At least part of the reason for restricting the result size of the querycache-populating queries seems to have been that a larger cache was more expensive to query.
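In miniature, and with hypothetical table and column names (querycache_indexed, qc_id), the two paging strategies compare like this:

    # Sketch contrasting OFFSET paging with the indexed range scan the
    # patch proposes. Table and column names here are hypothetical.

    # LIMIT/OFFSET: the server still reads and discards the skipped rows,
    # so the cost of fetching page N grows linearly with N.
    page_by_offset = ("SELECT qc_title FROM querycache_indexed "
                      "ORDER BY qc_id LIMIT %(offset)s, 50")

    # Keyset paging: resume from the last qc_id of the previous page; a
    # primary-key range scan makes every page equally cheap.
    page_by_key = ("SELECT qc_title FROM querycache_indexed "
                   "WHERE qc_id > %(last_id)s ORDER BY qc_id LIMIT 50")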

JeLuF (in bug 3419) expressed some concern that the inserts which populate the cache could also be expensive, and that the work should therefore be carried out on the toolserver.

I would like the opportunity to bring this work forward, building a working model of the solution proposed in the patch which would provide comparative metrics as well as an alternate series of reports to the "Special" pages on Wikipedia which are broken due to the current implementation of caching.

Please feel free to email me with any questions or whatever.

Thanks for your consideration, Shunpiker 14:52, 1 September 2007 (UTC)

AFAIU, Jeluf said the toolserver should be used when someone needs the data. I'm unsure whether we have the capacity to offer unlimited special pages in real time for projects as big as enwp. Perhaps we/you should start with a refresh every day. Would you agree? --DaB. 15:56, 27 September 2007 (UTC)
Hi DaB, Thanks for the response! I think a daily refresh, or even a weekly refresh with the enlarged cache, would be a good step forward for a report like Lonelypages on enwp. The cost of the additional inserts could be offset by running the refresh less frequently than it is currently run. Even a monthly refresh of that report would be an improvement on the current situation where the vast majority (guessing ~85%) of relevant pages are always excluded from the report altogether due to the cache size limit. In deciding how often to run the report, I think it would be worthwhile to measure and compare the cost of:
  • The cache-populating query, with the limit
  • The cache-populating query without the limit
  • The cache-insert with the limit
  • The cache-insert without the limit
I would also like to measure:
  • The cache-reading query using limit/offset
  • The cache-reading query using indexed range scans
Here's another avenue I am interested in exploring: the point of the querycache is to take heat off the database, but if the database resources consumed by queries on the cache itself (let alone infrequent inserts) have become a concern, what are some other options for implementing the cache? It seems like the data in the cache is neither highly structured nor typed, so Berkeley databases or memcached or something else might serve just as well as the current MySQL implementation, and better insofar as an alternate implementation would not be in contention for the precious DB resources. -- Shunpiker 07:04, 16 November 2007 (UTC)
I would like to renew this account request, first placed in September 2007 but never concluded. There has been recent concern raised on "Wikipedia_talk:Special:LonelyPages" about how the cache size limit has broken the orphan-tagging bot on the English Wikipedia. I remain convinced that the problem is solvable, and admits a number of solutions. -- Shunpiker 01:04, 1 September 2008 (UTC)

Hello, I would like to run my interwiki bot (ChenzwBot) on the toolserver. It uses the pywikipedia bot framework. Putting it on the toolserver would be more productive, as I can only run the bot on my computer for about 4 hours a day. Thanks. Chenzw 03:10, 27 September 2008 (UTC)

Sorry for the long delay. Do you still want the account?
There's no problem in principle; however, the bot server has a pretty high load at the moment. We are looking into getting more hardware. -- Duesentrieb 17:19, 19 March 2009 (UTC)

Hi DaB, the survey http://de.wikipedia.org/wiki/Benutzer:Ralf_Roletschek/Fragebogen_Berlin08 (also http://de.wikipedia.org/wiki/Benutzer:Philipendula/Fragebogen_Berlin08) is meant to run not only at Berlin08 but for a few months. I could easily put it on private webspace, but it should make an official impression. It will not generate any noticeable server load; I am doing it in HTML with ASCII files. At least 300 surveys are planned, and optimistic estimates assume 1000. Surveying active Wikipedians is explicitly not intended, except perhaps on special occasions (members' meeting, Academy); that is being discussed right now. The evaluation will be done offline. You know why I have the nerve not to write in English here ;) You wouldn't understand me anyway... --Marcela 19:31, 5 June 2008 (UTC)

Do you still need the account? --DaB. 15:12, 14 July 2008 (UTC)
Actually, yes; but without FTP I can't do anything with it, since not even copy & paste works. --Ralf Roletschek 15:35, 11 October 2008 (UTC)

Hello, I am active in several Wikimedia projects, but the tool I would like to code would be mostly useful for Wikisource. It would do automatic OCR of DjVu files, and maybe also offer the possibility to convert any file (PDF, list of images, ...) to a DjVu file. In fact, it would mostly be a clone of Any2djvu.

The OCR engine would be tesseract (from Google), and the supported languages, for a start, those provided with tesseract: English, French, German, German Fraktur, Spanish, Dutch, Italian, Vietnamese and Portuguese. It is possible to add other languages if we find data for training them.

The result of the OCR could be saved directly in the DjVu file. It might also be possible to put it directly on the Wikisource projects where the pagemode extension is activated (English, French and German Wikisource, as far as I know).

I talked about the idea with several contributors at Wikisource, who said it would be useful.

My main concern is about the requirements, both in terms of disk space (a DjVu file can be up to 100 MB, since this is the new limit for uploads at Commons; typical files, however, are 10-20 MB) and processing power (the OCR of one page typically takes 1-5 seconds, so a 100-page book would need roughly 2-8 minutes). Also, would we have to restrict the use of this tool to Wikimedia project users in some way? I can imagine a way of doing this, for example by accepting only DjVu files which have been uploaded to Commons. Also, we could (should?) restrict it to one user at a time, in order to reduce the server load. Thanks --Kipmaster 12:44, 25 November 2008 (UTC)
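As a rough outline of the per-page loop, assuming the ddjvu tool from djvulibre and the tesseract command-line program; treat it as a sketch rather than a tested pipeline:

    # Sketch: render each DjVu page to an image and OCR it with tesseract.
    # Assumes the ddjvu (djvulibre) and tesseract command-line tools.
    import subprocess

    def ocr_djvu(djvu_path, lang='eng', pages=1):
        texts = []
        for page in range(1, pages + 1):
            tif = 'page%d.tif' % page
            # render one page of the DjVu file to a TIFF image
            subprocess.check_call(['ddjvu', '-format=tiff',
                                   '-page=%d' % page, djvu_path, tif])
            # tesseract writes its result to page<N>.txt
            subprocess.check_call(['tesseract', tif, 'page%d' % page,
                                   '-l', lang])
            texts.append(open('page%d.txt' % page).read())
        return texts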

I support Kipmaster's request. It would be a very useful tool. I don't think the technical requirements are a problem, because there are already 2 bots doing OCR on the toolserver: ThomasBot (account thomasv) and YannBot (account yann). So it would simply be the same service as these 2 bots, made available to all users. Yann 12:51, 25 November 2008 (UTC)
I support Kipmaster's request too: he has a good understanding of how the Wikisourcians work and how their needs can be met. --Zyephyrus 22:12, 27 November 2008 (UTC)

Sounds like a cool project, but I am worried about the resource requirements, especially because we already have bots doing OCR. Basically, it depends on how many documents are to be processed, and who can schedule them for processing. If this thing were to run at full throttle all the time, perhaps multiple instances at once, it would bog down the server for sure. Please tell us how many pages you think will be processed per day, how documents would be supplied, and how access would be managed. -- Duesentrieb 16:56, 19 March 2009 (UTC)

Hello. I am developing a system for the generation (and also the recognition) of natural-language text. The system works on the principle of learning from examples. For proper training it needs reasonably correct texts on various topics. As a source of such texts, I took texts from Wikipedia. At the moment I obtain the text by parsing the content of Wikipedia pages. That is an extra load for me and for you. I am interested in a more convenient and less resource-intensive way to access the text and the structure of Wikipedia's content. I understand that this will not bring anything new to Wikipedia, but I will try to help the further development of Wikipedia by writing and improving articles. 94.50.13.51 17:48, 6 January 2009 (UTC)


Download the dumps from http://download.wikimedia.org/ and use them directly on your own computer. The Toolserver is not needed for that. ~ putnik 15:28, 18 February 2009 (UTC)
The toolserver has no direct access to full article text. And even if it did, you would still have to parse it. Any bulk processing of page content is best done on xml dumps.
Providing efficient access to full page text is a long term goal of the toolserver, but we do not have the resources to do it yet.
Because of this, I do not think you could benefit from a toolserver account. -- Duesentrieb 16:48, 19 March 2009 (UTC)
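For what it's worth, bulk extraction from a dump needs no Toolserver at all. A sketch using only the Python standard library; the export schema version in the namespace varies by dump:

    # Sketch: stream <title>/<text> pairs out of a pages-articles XML dump
    # with an incremental parser, so memory use stays flat.
    import xml.etree.cElementTree as etree

    NS = '{http://www.mediawiki.org/xml/export-0.3/}'   # version varies

    def iter_pages(dump_path):
        title = None
        for event, elem in etree.iterparse(dump_path):
            if elem.tag == NS + 'title':
                title = elem.text
            elif elem.tag == NS + 'text':
                yield title, elem.text or ''
                elem.clear()                 # free the parsed subtree

    for title, text in iter_pages('ruwiki-latest-pages-articles.xml'):
        print('%s: %d characters of wikitext' % (title, len(text)))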

Hello, I'm an arbitrator on the English Wikipedia, and I would like to write two tools to help with the detection of sockpuppets.

  1. A tool to process checkuser output, running DNS checks and collating the output in a way that helps CUs identify overlaps (this wouldn't require database access at all, but I think toolserver would be a convenient place to host it).
  2. An improved report for comparing sock accounts. Existing tools show overlapping articles, but no report highlights possible violations of en:WP:SOCK, such as possible double votes or editing that could be a violation of en:WP:3RR. I would write a script to highlight such interactions, and also generate other suggestive statistics such as interleaving (back and forth) editing (see the sketch below) and delineated timestamp data dumps suitable for easy plotting, as in en:File:Mantanmoreland date-time.png.

I mostly write ugly scripts in perl, but I can be persuaded to deal with other languages. Thanks. Cool Hand Luke 21:56, 21 January 2009 (UTC)
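A sketch of the interleaving statistic from item 2, over per-account edit lists of (timestamp, page) pairs however obtained; the data layout is illustrative:

    # Sketch: count "interleavings" between two accounts' edits to flag
    # back-and-forth editing. Input lists of (timestamp, page) pairs could
    # come from the revision table or the API; the layout is illustrative.
    def interleavings(edits_a, edits_b):
        merged = [(t, 'A') for t, _ in edits_a] + \
                 [(t, 'B') for t, _ in edits_b]
        merged.sort()
        switches = 0
        last = None
        for _, who in merged:
            if last is not None and who != last:
                switches += 1      # the other account edited next
            last = who
        return switches

Many switches inside a short window is the suggestive pattern; the sorted merged list itself is what would feed the timestamp plots.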

Can you code well enough that no data from the checkuser output will show up anywhere? --DaB. 23:43, 11 March 2009 (UTC)
The user would have to provide the checkuser output by cut-and-paste, so they would only receive what they put in; it wouldn't really be useful unless it showed the locations of the different IPs. At any rate, I'm more interested in making tools that use editing history. Cool Hand Luke 18:14, 29 March 2009 (UTC)
I would really still like an account. Cool Hand Luke 17:02, 5 July 2009 (UTC)

Hello, I'd like to request an account for the WikipediaViz project, developed by INRIA, in order to generate diffs and per-revision metadata incrementally instead of working with large XML dumps. Thanks in advance. --Proppy 15:00, 20 February 2009 (UTC)

There is no direct access to revision text on the toolserver. Any bulk operations on full text should indeed use dumps.
Providing fast access to full revision text is a long term goal, but currently we do not have the resources to implement it.
So, it does not seem like a toolserver account would help you with your project. Perhaps have a look at the Wikimedia update feed service as an alternative. -- Duesentrieb 16:28, 19 March 2009 (UTC)
WikipediaViz uses a dump for the initial metadata generation; a toolserver account would help to sync the data incrementally each time a new edit is made, instead of flooding the API with constant-interval polling.
I thought WikiProxy allowed toolserver scripts to access full revision text; is there a misunderstanding?
We are willing to offer resources (hardware, human) to implement fast access to full revision text; how should we proceed?
Thanks for pointing out the Wikimedia update feed service alternative, though. --Proppy 10:28, 24 April 2009 (UTC)
  • I am a senior at en:Florida Institute of Technology studying mechanical engineering with a particular interest in robotics. I have been an English Wikipedia user since 2006 and have almost 20,000 edits [5]. I have been very active in vandal-fighting, and before that NPP. I'm trying to find a new niche to contribute to. As I attempt to develop code for new purposes, I'd like space on the toolserver to run some of it, since my own domain does not allow me access to some crucial directories.
  • I'd like an account to begin developing tools for the English Wikipedia, as well as for my MediaWiki wiki on robotics [6], which I started last week. I'd like to run a couple of scripts to improve coordination in #wikipedia-en-robotics. I'm hoping that in the future I may expand to bots for other WikiProjects, expanding as needed.

Thanks! Jamesontai 16:48, 28 March 2009 (UTC)

I am interested in creating and hosting statistics tools for Wikipedia. I don't particularly like the current stats tools available, and I feel I can improve on them. I'm experienced in PHP, Python, C# .NET, and a handful of other languages that probably won't be as useful. Though I realize the programming is completely different, I've written a few user scripts for Wikipedia (links on my user page). Here's an example of other statistics gathering I've done: http://img12.imageshack.us/img12/3365/habbon.png And my programming blog, where I put snippets and tools I've written for myself, is available here: http://odie5533.com/ --Odie5533 01:15, 15 April 2009 (UTC)

I am an administrator on the Russian Wikipedia and Wikimedia Commons. I need an account on the Toolserver for the Connectivity project. Until now, user:Lvova ([7]) worked on the project. In the near future user:Lvova will probably no longer have the opportunity to work on this project, and I will replace her.

The Connectivity project is important for the Russian Wikipedia. It improves Wikipedia's search engine position and readers' convenience. For successful work, it involves complex database queries and therefore uses the Toolserver intensively. The technical part was mainly developed by ru:User:Mashiah Davidson ([8]).

In the future, I would also like to write my own tools to improve Wikipedia. --Butko 21:13, 9 May 2009 (UTC)