Wikistats/archive
Information on this page is outdated. For more information on Wikistats, please see mediawiki.org.
Wikistats is a set of Perl scripts used to generate detailed statistics for Wikimedia projects. These statistics are available at stats.wikimedia.org. Erik Zachte is the author of the scripts, and he is also responsible for running them and posting the results. All statistics are produced by analyzing the database dumps, which are usually created monthly.
See Wikistats csv for information on accessing statistics in comma-separated values (CSV) format.
Documentation
Detailed explanation of some statistics. There is no real documentation, only some articles on specific topics here and there.
- Wikistats reports on article revert trends (August 2010)
Source code
The scripts are stored at GitHub in wikimedia/analytics-wikistats.
Running Wikistats on your own MediaWiki site
The scripts have not yet been packaged for general consumption, but they can be made to work on any MediaWiki site without too much trouble.
You will need:
- MediaWiki 1.5 or later (at least for the dumpBackup.php script)
- Perl 5.8 or later (avoid 5.6, which has memory leaks)
- Ploticus
Here are the (admittedly hacky) steps to generate the statistics. This is known to work on FreeBSD and Windows XP at least.
- Create a new directory and unzip the scripts there.
- Note that the script files are in DOS text format. If you are on Unix, you should convert them to Unix format. You might also need to make WikiCounts.pl and WikiReports.pl executable (see the preparation sketch after this list).
- You may need to update the contact / website information in the file WikiReportsNoWikimedia.pl.
- Obtain a full XML dump of your MediaWiki data using the dumpBackup.php script, as described at MediaWiki#Database_dump (also covered in the preparation sketch after this list).
- In the directory with the scripts, create these subdirectories: counts, dumps, reports.
- Rename your XML dump like this: en-latest-pages-meta-history.xml
- Copy your dump into the dumps directory: dumps/en-latest-pages-meta-history.xml
- The script supports XML compression (gz, bz2, 7z), so all of these dumps are supported:
  dumps/en-latest-pages-meta-history.xml
  dumps/en-latest-pages-meta-history.xml.gz
  dumps/en-latest-pages-meta-history.xml.bz2
  dumps/en-latest-pages-meta-history.xml.7z
- Run this command, where YYYYMMDD is the date the XML dump was taken:
  WikiCounts.pl -x -i dumps -o counts -l en -d YYYYMMDD
  This should create a bunch of CSV files in counts.
- The WikiReportsOutputPlots.pl script is hardcoded to run pl to invoke Ploticus. On some systems (such as Unix) the Ploticus executable is named ploticus. If that is the case on your system, edit the script to change the two occurrences of "pl -" to "ploticus -" (a sed one-liner after this list shows one way to do this).
- Adapt WikiReportsNoWikimedia.pl so that site-specific details are used, such as your site name, admin name, and mail address.
- Run this command, using the same YYYYMMDD as above:
  WikiReports.pl -x -i counts -o reports -l en -d YYYYMMDD
  This should create a bunch of HTML, PNG, and SVG files in reports/EN.
- Into the reports directory, download these additional files, which are referred to by the HTML in the reports/EN directory using a relative ../ path (a small download loop after this list fetches them all):
  - http://stats.wikimedia.org/background1.gif
  - http://stats.wikimedia.org/black.gif
  - http://stats.wikimedia.org/blanco.gif
  - http://stats.wikimedia.org/bluebar.gif
  - http://stats.wikimedia.org/greenbar.gif
  - http://stats.wikimedia.org/grey.gif
  - http://stats.wikimedia.org/greybar.gif
  - http://stats.wikimedia.org/redbar.gif
  - http://stats.wikimedia.org/yellowbar.gif
  - http://stats.wikimedia.org/WikipediaStatistics11.js
  - http://stats.wikimedia.org/WikipediaStatistics12.js
  - http://stats.wikimedia.org/WikipediaStatistics13.js
- Now you should be able to load reports/EN/index.html in a web browser and see the statistics.
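The preparation sketch referred to above ties the early steps together for a Unix-like shell. The paths (the archive of the scripts, the location of your wiki installation) and the target directory name are assumptions for illustration only, and dos2unix stands in for any line-ending conversion tool you have available; adjust everything for your setup.
  # Unpack the scripts, fix DOS line endings, and make the main scripts executable
  mkdir wikistats && cd wikistats
  unzip /path/to/wikistats-scripts.zip
  dos2unix *.pl
  chmod +x WikiCounts.pl WikiReports.pl
  # Working directories expected by the scripts
  mkdir counts dumps reports
  # Full-history XML dump from your wiki, written straight into the dumps directory
  php /path/to/wiki/maintenance/dumpBackup.php --full > dumps/en-latest-pages-meta-history.xml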
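The sed one-liner mentioned above is one possible way to switch the Ploticus invocation; it assumes a sed that accepts -i with a backup suffix (GNU or BSD sed). Review the result afterwards, since a blind replace could also match other strings that happen to end in "pl -".
  # Replace the hardcoded "pl -" with "ploticus -", keeping a .bak copy of the original
  sed -i.bak 's/pl -/ploticus -/g' WikiReportsOutputPlots.pl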
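The download loop mentioned above fetches the extra static files in one go. It assumes wget is available and that you run it from the directory containing the reports subdirectory.
  # Fetch the images and scripts referenced by the generated HTML
  cd reports
  for f in background1.gif black.gif blanco.gif bluebar.gif greenbar.gif grey.gif \
           greybar.gif redbar.gif yellowbar.gif \
           WikipediaStatistics11.js WikipediaStatistics12.js WikipediaStatistics13.js
  do
    wget "http://stats.wikimedia.org/$f"
  done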
Notes for Windows XP
The same instructions apply on Windows, but you will need to install the following:
- Perl from ActiveState.
- Bzip2 for Windows from here; unzip it and put the file bin\bzip2.exe in your Windows directory (only needed if the dump is compressed with bz2).
- Ploticus from here; unzip it to your Windows directory. You can also install Ploticus from Cygwin (which, unlike the generic Windows binary, has built-in PNG support).
- Recent scripts can make calls to du (disk space used), df (disk space free), and top (process list) to monitor system resources; Cygwin provides these programs. Resource monitoring is probably not useful on any wiki dump that contains fewer than 100,000 articles or fewer than a couple of million revisions. From WikiCounts.pl 2.1 onwards, resources are not traced unless you specify the -r option.
Alternate Method
Alternatively, you can run the commands below, which accept an uncompressed dump called pages_full_en.xml.
WikiCounts.pl -x -i dumps -o counts -l en -d YYYYMMDD -t -m wp
WikiReports.pl -x -i counts -o reports -l en -d YYYYMMDD -t -m wp
Quality Survey
During Wikimania 2006, Jimbo gave a keynote speech in which he asked the community to focus less on counts and more on quality. People interested in discussing how Wikistats could contribute should see Wikistats/Measuring Article Quality.
Serving multiple languages
Wikistats supports multiple languages. However, users don't always spot the language links at the top right. To serve the "right" one for users automatically from a common domain on Apache, the following approach is suggested:
- Create an index.var file in the root directory with sections for each language in the following format:
  URI: index; vary="language"

  URI: index.CS
  Content-language: cs
  Content-type: text/html

  URI: index.DE
  Content-language: de
  Content-type: text/html

  ...
- Specify the following Apache directives in the apache.conf or local .htaccess file:
  # default; adjust as appropriate
  LanguagePriority en
  Options +Indexes
  DirectoryIndex index.var Sitemap.htm
  RewriteEngine On
  RewriteRule index\.(..)$ /$1/Sitemap.htm [R=302,L]
The rewriting/redirection ensures the user is in the right base directory. Specifying "XX/Sitemap.htm" as the URI looks like it works at first, but the links on that page will not work, as they will point to the root directory. Regular Redirect directives do not appear to work in combination with content negotiation.