From Meta, a Wikimedia project coordination wiki

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Key Personnel

  • Antonio J. Reinoso (ajreinoso_at_libresoft_dot_es)
  • Jesús M. González-Barahona (jgb_at_libresoft_dot_es)
  • Felipe Ortega (jortega_at_libresoft_dot_es)
  • Israel Herraiz (israel_dot_herraiz_at_upm_dot_es)
  • Rocío Muñoz-Mansilla (rmunoz_at_dia_dot_uned_dot_es)

Project Summary

The main goal of this project is to analyze how Wikipedia is used by its users. In particular, we aim to determine the temporal and behavioral patterns that arise from the different types of interaction between the Encyclopedia and its users.


Our methodology is based on the analysis of the requests that users submit to Wikipedia. These requests are made available to us as the log lines that the Squid servers register once each request has been served. The analysis consists of a parsing step, which extracts the relevant fields from each log line, followed by a filtering step, which selects and stores the requests considered of interest. The result is a populated database ready for statistical examination. The WikiSquilter tool has been developed to perform all these tasks. It is released under a free license and is available at http://sourceforge.net/projects/squilter/
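The parse, filter, and store steps described above can be illustrated with a minimal Python sketch. It is not WikiSquilter's code, and it does not reproduce the real Squid log format: the three-field line layout (timestamp, HTTP method, URL), the filtering rule, the `requests` table, and every function name are assumptions made only for illustration.

```python
import sqlite3
from urllib.parse import urlsplit


def parse_line(line):
    """Extract fields from one log line.

    Assumed (hypothetical) layout: "<timestamp> <method> <url>".
    Returns None when the line does not match that layout.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    timestamp, method, url = parts
    split = urlsplit(url)
    return {
        "timestamp": timestamp,
        "method": method,
        "host": split.hostname or "",
        "path": split.path,
    }


def is_of_interest(record):
    """Assumed filter: keep only article requests to a Wikipedia edition."""
    return (record["host"].endswith(".wikipedia.org")
            and record["path"].startswith("/wiki/"))


def process(lines, conn):
    """Parse and filter the lines, store the kept ones, return their count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(ts TEXT, method TEXT, edition TEXT, path TEXT)"
    )
    kept = []
    for line in lines:
        record = parse_line(line)
        if record is not None and is_of_interest(record):
            # The language edition is taken from the first host label,
            # e.g. "en" for en.wikipedia.org.
            edition = record["host"].split(".")[0]
            kept.append((record["timestamp"], record["method"],
                         edition, record["path"]))
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", kept)
    conn.commit()
    return len(kept)
```

Calling `process(lines, sqlite3.connect(":memory:"))` on an iterable of such lines fills the table, which can then be queried with ordinary SQL, for example to count requests per language edition.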


  • Scientific publications
  • Free access to files containing the received log lines (due to the huge amount of data, we cannot keep all the received log information but only the most recent; we currently keep the records of the last two years)

Wikimedia Policies, Ethics, and Human Subjects Protection

Benefits for the Wikimedia community

  • Detailed characterization of the traffic directed to Wikipedia.
  • Possibility of traffic forecasting based on the temporal patterns found.
  • Determination of the different behaviors of users when browsing the Encyclopedia.
  • Identification of the most requested resources and services in each language edition.
  • Possibility of geolocating users' requests.

Time Line



Antonio J. Reinoso's doctoral thesis is based entirely on this study. Several publications that are also based on it can be found at http://gsyc.es/~ajreinoso/papers

External links