Statistics/Consultation
Tools providing consultation statistics
- Wikipedia article traffic stats – counts hits (views) for individual articles (since December 2007)
- n:template:Popular articles provides the number of hits for the top 20 articles of the hour on Wikinews.
Raw data
Raw consultation data is made available by Domas Mituzas. Files can be found here. One compressed file is created every hour and contains a count of every request for a page from Wikipedia and other related projects. The data is not processed: it includes requests for pages that do not actually exist, redirected pages are counted separately from their targets, and the same page accessed using different encodings is counted once per encoding.
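Because the same page can appear under several encodings, anyone aggregating the counts will probably want to normalise titles first. The following is a minimal sketch, assuming the variants differ only in UTF-8 percent-encoding and in the use of spaces versus underscores; the function name normalize_title is hypothetical, and the sketch does not resolve redirects or detect non-existent pages.

```python
from urllib.parse import unquote


def normalize_title(raw_title: str) -> str:
    """Return a canonical form of a requested page title.

    Assumption: variants differ only by UTF-8 percent-encoding and by
    spaces versus underscores. Other encodings are not handled.
    """
    # Decode escapes such as %20 or %C3%A9; undecodable bytes are replaced.
    decoded = unquote(raw_title, encoding="utf-8", errors="replace")
    # Use underscores for spaces, as in the sample records.
    return decoded.replace(" ", "_")
```

With this, "White%20lie" and "White_lie" map to the same key, so their counts can be added together.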
Data format
The data is stored in files with names such as pagecounts-20100412-170000.gz, where the name indicates the date (12 April 2010) and the hour (17:00). Each line in the file has the following format:
en White_lead 9 122588
en White_lie 2 138038
en White_lies 3 18907
en White_light 2 45042
en White_light_scanner 1 7881
en White_lion 9 152551
where the four fields correspond to
- the project code
- the article name
- the number of hits
- the total number of bytes transferred
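The file name and record layout described above can be read with a few lines of Python. This is only a sketch: the names PageCount, parse_filename and parse_line are hypothetical, and it assumes every record has exactly four space-separated fields (a malformed line raises ValueError).

```python
from datetime import datetime
from typing import NamedTuple


class PageCount(NamedTuple):
    project: str            # project code, e.g. "en"
    title: str              # article name as it appears in the request
    hits: int               # number of hits during the hour
    bytes_transferred: int  # total number of bytes transferred


def parse_filename(name: str) -> datetime:
    """Extract date and hour from a name like pagecounts-20100412-170000.gz."""
    stem = name.split(".")[0]  # drop the .gz suffix
    _prefix, date_part, time_part = stem.split("-")
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")


def parse_line(line: str) -> PageCount:
    """Split one record into its four fields (ValueError if malformed)."""
    project, title, hits, size = line.split()
    return PageCount(project, title, int(hits), int(size))
```

For example, parse_line("en White_lead 9 122588") yields a record with project "en", title "White_lead", 9 hits and 122588 bytes transferred.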
Tools
The Kiwix project provides a few tools to download and merge these usage stats. They are available here:
- get_hourly_charts.sh updates a local directory with the new remote usage stats.
- get_daily_charts.sh merges and accumulates the stats over a day. It removes the hourly files.
- get_monthly_charts.sh merges and accumulates the stats over a month. It removes the daily files.
To get merged, accumulated and consequently smaller log files, simply call these three scripts periodically, passing the directory where you want to store the logs as the first and only argument.
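To illustrate what merging and accumulating hourly files into a daily total involves, here is a rough Python sketch. It is not the Kiwix scripts' implementation: the function name merge_hourly_files is hypothetical, it assumes the pagecounts-YYYYMMDD-HHMMSS.gz naming and four-field records described under Data format, it sums hits only, and unlike the scripts it does not delete the hourly files.

```python
import gzip
from collections import Counter
from pathlib import Path


def merge_hourly_files(directory: str, day: str) -> Counter:
    """Sum hit counts over all hourly files of one day (day as YYYYMMDD)."""
    totals = Counter()
    for path in sorted(Path(directory).glob(f"pagecounts-{day}-*.gz")):
        # The raw data is unprocessed, so tolerate undecodable bytes.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                fields = line.split()
                # Skip records that do not match the four-field layout.
                if len(fields) != 4 or not fields[2].isdigit():
                    continue
                project, title, hits, _size = fields
                totals[(project, title)] += int(hits)
    return totals
```

Keying the counter on the (project, title) pair keeps articles with the same name in different projects separate. Calling merge_hourly_files("logs", "20100412") would return the daily totals for 12 April 2010, where "logs" stands for whatever directory holds the downloaded files.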
Archives
The server http://dammit.lt/wikistats/ contains only the most recent files (up to about 6 months old). The entire backlog of files is available at dumps.wikimedia.org.
The following people archive this data on a "best effort" basis:
- User:Schutz: data starting from December 2007 (complete except for a few corrupted or missing files), thanks to Mathias Schindler who provided the oldest files. I am happy to help if you need a copy of the data; please email me.