Statistics/Consultation

From Meta, a Wikimedia project coordination wiki

Tools providing consultation statistics

Raw data

Raw consultation data is made available by Domas Mituzas. Files can be found here. One compressed file is created every hour and contains the request count for every page requested from Wikipedia and other related projects. The data is not processed: it includes requests for pages that do not actually exist, and redirects are counted separately from their targets, as are pages accessed using different encodings.
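Because the same page can appear under several encodings, anyone who wants per-page totals has to fold those variants together. The following Python sketch shows one possible way to do so. The normalization rule (decode percent-escapes, then use underscores for spaces) is an assumption and not something the raw files prescribe, and the function names are hypothetical. Redirects cannot be resolved this way, since that requires data not present in the files.

```python
from collections import Counter
from urllib.parse import unquote


def normalize_title(title: str) -> str:
    """Decode percent-escapes and use underscores (an assumed canonical form)."""
    return unquote(title, errors="replace").replace(" ", "_")


def merge_variants(records):
    """Sum hits for (project, title, hits) records whose titles normalize alike."""
    totals = Counter()
    for project, title, hits in records:
        totals[(project, normalize_title(title))] += hits
    return totals
```

With this sketch, a record for White%20lie and a record for White_lie in the same project end up under a single key.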

Data format

The data is stored in files such as pagecounts-20100412-170000.gz, where the name indicates the date (12 April 2010) and hour (17:00). Each line in the file has the following format:

en White_lead 9 122588
en White_lie 2 138038
en White_lies 3 18907
en White_light 2 45042
en White_light_scanner 1 7881
en White_lion 9 152551

where the four fields correspond to

  • the project code
  • the article name
  • the number of hits
  • the total number of bytes transferred
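The file name and line format described above can be read with a few lines of Python. This is a minimal sketch, not an official parser: the names PageCount, parse_filename, parse_line and read_pagecounts are hypothetical, and it assumes the fields are separated by single spaces and that malformed lines can simply be skipped.

```python
import gzip
import os
from datetime import datetime
from typing import Iterator, NamedTuple


class PageCount(NamedTuple):
    project: str
    title: str
    hits: int
    bytes_transferred: int


def parse_filename(name: str) -> datetime:
    """Extract the date and hour from a name like pagecounts-20100412-170000.gz."""
    stem = os.path.basename(name)
    if stem.endswith(".gz"):
        stem = stem[:-3]
    _prefix, date_part, time_part = stem.split("-")
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")


def parse_line(line: str) -> PageCount:
    """Split one record into its four space-separated fields."""
    project, title, hits, size = line.rstrip("\n").split(" ")
    return PageCount(project, title, int(hits), int(size))


def read_pagecounts(path: str) -> Iterator[PageCount]:
    """Yield records from a gzip-compressed hourly file, skipping malformed lines."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            try:
                yield parse_line(line)
            except ValueError:
                continue  # wrong field count or non-numeric counters
```

For example, parse_line("en White_lead 9 122588") yields a record with project "en", title "White_lead", 9 hits and 122588 bytes transferred.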

Tools

The Kiwix project provides a few tools to download and merge these usage stats. They are available here:

To get merged, cumulative, and consequently smaller log files, simply call these three scripts periodically, passing the directory where you want to store the logs as the first and only argument.
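To illustrate what merging hourly files into a cumulative log can look like, here is a rough Python sketch. It is not the Kiwix scripts and makes no claim about how they work. The function names, the pagecounts-*.gz glob pattern, and the three-field output format are assumptions made for the example.

```python
import glob
import gzip
import os
from collections import Counter


def merge_hourly_files(directory: str) -> Counter:
    """Sum hits per (project, title) over every pagecounts-*.gz file in a directory."""
    totals = Counter()
    pattern = os.path.join(directory, "pagecounts-*.gz")
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                fields = line.split()
                if len(fields) != 4:
                    continue  # skip records without exactly four fields
                project, title, hits, _size = fields
                try:
                    totals[(project, title)] += int(hits)
                except ValueError:
                    continue  # skip records with a non-numeric hit count
    return totals


def write_merged(totals: Counter, out_path: str) -> None:
    """Write one 'project title hits' line per page, sorted by project and title."""
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for (project, title), hits in sorted(totals.items()):
            out.write(f"{project} {title} {hits}\n")
```

Choose an output name that does not match the pagecounts-*.gz pattern, so that a later run does not read the merged file back in as if it were an hourly file.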

Archives

The server http://dammit.lt/wikistats/ contains only the most recent files (up to about 6 months old). The entire backlog of files is available at dumps.wikimedia.org.

The following people archive this data on a "best effort" basis:

  • User:Schutz: data starting from December 2007 (complete except for a few corrupted or missing files), thanks to Mathias Schindler who provided the oldest files. I am happy to help if you need a copy of the data; please email me.