Toolserver/Reports


This page is for discussion and development of a collection of scripts for creating reports from database dumps. The collection is distributed from http://tools.wikimedia.de/~beland/

Infrastructure status

  • We would like to get a CVS or Subversion repository set up on the toolserver, to facilitate contributions from a number of people. The toolserver admins would like to synchronize ssh and Subversion passwords, but ssh logins currently use only ssh keys, not passwords. The toolserver admins have recently been unresponsive to requests, so I am giving up on waiting for this to happen. -- Beland 17:35, 2 April 2006 (UTC)
  • I would like a version of the scripts to be deployed on the Toolserver so that they can be run as a cron job, and so that Toolserver users can expand and improve them as desired. -- Beland 17:46, 2 April 2006 (UTC)
  • If you would like to make a contribution, you can post an update to the web, or e-mail me using the "e-mail this user" link from one of my user pages. I will try to post updates to the toolserver page in a timely fashion. -- Beland 17:46, 2 April 2006 (UTC)

Script status

  • I no longer have enough hard drive space on my laptop to run these scripts, and my desktop machine does not have enough RAM. (I would recommend at least 10 GB of storage and 700 MB of RAM.) Reducing both of those requirements might be useful in general, as would faster execution. -- Beland 23:00, 2 April 2006 (UTC)
  • Some scripts are currently broken. There are some notes in auto-run.sh, but what I've been doing is running the scripts one at a time, in the sequence in which they appear in that file, and checking that each one's output is valid and non-empty. -- Beland 23:00, 2 April 2006 (UTC)
  • It would be nice if the scripts had a better dependency mechanism. Right now, a single central script, auto-run.sh, runs the scripts in a fixed order that builds each dependency before it is needed. It would be nice if one could simply run the script that outputs the desired report and have the scripts that build its input files run automatically. On the other hand, if the toolserver has enough RAM and hard drive space, and the scripts are tidied up, they could be run as a cron job, and there would be less need to worry about dependencies (unless contributors want to generate certain reports for their own purposes).
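
As a starting point for discussion, the dependency mechanism described above could look something like the following make-style sketch. It is written in Python only for illustration (the existing scripts are Perl and shell), and the names `Step`, `is_stale` and `ensure` are hypothetical, not part of the current collection. Each step declares the file it produces and the files it reads; asking for a report first builds any inputs that are missing, empty, or out of date, and then applies the same "non-empty output" check that is currently done by hand.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Step:
    """One script in the pipeline (hypothetical structure, for illustration)."""
    output: str                    # file this step produces
    build: Callable[[], None]      # function that writes `output`
    inputs: List[str] = field(default_factory=list)  # files it reads


def is_stale(step: Step) -> bool:
    """True if the output is missing, empty, or older than any input."""
    if not os.path.exists(step.output) or os.path.getsize(step.output) == 0:
        return True
    out_mtime = os.path.getmtime(step.output)
    return any(os.path.getmtime(path) > out_mtime for path in step.inputs)


def ensure(target: str, steps: Dict[str, Step],
           _active: Optional[Set[str]] = None) -> List[str]:
    """Build `target` after building whatever it depends on.

    Returns the list of outputs that were actually rebuilt, in build order.
    """
    active = set() if _active is None else _active
    if target not in steps:
        # A raw input such as the database dump: it must already exist.
        if not os.path.exists(target):
            raise FileNotFoundError(target)
        return []
    if target in active:
        raise ValueError(f"dependency cycle at {target}")

    step = steps[target]
    rebuilt: List[str] = []
    active.add(target)
    for dep in step.inputs:
        rebuilt += ensure(dep, steps, active)
    active.discard(target)

    if is_stale(step):
        step.build()
        # Same sanity check that is currently done by hand after each script.
        if not os.path.exists(step.output) or os.path.getsize(step.output) == 0:
            raise RuntimeError(f"{target}: output is missing or empty")
        rebuilt.append(target)
    return rebuilt
```

With a table of steps keyed by output file, calling `ensure()` on a single report would run only the steps that report needs, and a second call with nothing changed would run nothing. A cron job could simply call `ensure()` on every report in turn.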

Volunteers

Update requests

The last update was in 2011; we could use a new report. --Melody Lavender 08:18, 10 November 2014 (UTC)

Known bugs

  • auto-categorize4.pl: A link from Columbus, Michigan to "Columbus_Township, St._Clair_County, Michigan" was interpreted as being a link to a county with the name "Columbus Township, St. Clair", so Columbus, Michigan was added to "Category:Columbus Township, St. Clair County". We don't create categories for townships, only counties. -- Beland
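
One possible way to avoid this class of bug is sketched below. It is not the code in auto-categorize4.pl (which is Perl); it is a Python illustration, and the function name `parse_county_link` is hypothetical. The assumption is that a link target counts as a county only when the whole title has the form "<Name> County, <State>", with no comma inside the county name. A township title, which has an extra comma-separated part in front, is then rejected rather than being misread as a county.

```python
import re
from typing import Optional, Tuple

# Matches exactly "<Name> County, <State>". No comma is allowed inside the
# county name, so "Columbus Township, St. Clair County, Michigan" is rejected.
COUNTY_TITLE = re.compile(r"^([^,]+ County), ([^,]+)$")


def parse_county_link(link_target: str) -> Optional[Tuple[str, str]]:
    """Return (county, state) if the link points at a county article, else None."""
    title = link_target.replace("_", " ").strip()
    match = COUNTY_TITLE.match(title)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With this check, the township link from the bug report returns `None`, while "St._Clair_County, Michigan" is recognized as a county. Whether an article that links only to a township should still be placed in the enclosing county's category is a separate decision that this sketch does not make.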

Replacement tools

See also