Research:Measuring overall contribution of editors

From Meta, a Wikimedia project coordination wiki
This project page documents a research project currently in progress.
Information may be incomplete and can change rapidly as science advances.
Research project: Measuring overall contribution of editors
Main contact: Fabian Kaelin
Co-investigators: Diederik van Liere
Start: 2011-06
End: 2011-06
Status: in progress
Open data: This project has published open-licensed data
Open access: This project has open access publications
WMF support
Wikimedia research projects


Topic

The goal of this sprint is to define a new metric for measuring which editors add the most content to Wikipedia. So far, the main ways of determining an individual editor's contribution have been edit count, number of articles created, and whether articles have passed through assessment processes such as Good Article (GA) and Featured Article (FA) review. In this sprint we aim to measure overall contribution by the amount of text (in kilobytes) that editors add to pages, in order to better identify and recognize the Wikipedians who are active authors of the encyclopedia.
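One possible way to turn "text added in kilobytes" into a number is sketched below. It assumes that the bytes a revision adds can be approximated by the increase in page length over the previous revision of the same page (MediaWiki stores each revision's page length in bytes as `rev_len`), that removals count as zero, and that 1 KB = 1024 bytes. The `Revision` class and the function names are hypothetical, for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Revision:
    page_id: int
    rev_id: int  # assumption: increases over time within a page
    user: str
    size: int    # page length in bytes after this revision


def bytes_added_per_editor(revisions):
    """Sum each revision's positive size change relative to the previous
    revision of the same page, grouped by editor (in bytes)."""
    totals = defaultdict(int)
    last_size = {}  # page_id -> length of the previous revision
    for rev in sorted(revisions, key=lambda r: (r.page_id, r.rev_id)):
        previous = last_size.get(rev.page_id, 0)  # a new page starts at 0 bytes
        delta = rev.size - previous
        if delta > 0:  # removals are not counted as contribution
            totals[rev.user] += delta
        last_size[rev.page_id] = rev.size
    return dict(totals)


def to_kilobytes(totals):
    """Convert byte totals to kilobytes (assuming 1 KB = 1024 bytes)."""
    return {user: n / 1024 for user, n in totals.items()}
```

Net size change is only a proxy: a revision that rewrites 1 KB of text into 1 KB of different text shows no change at all, so a more faithful measure would diff the revision texts themselves.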

The main challenge will likely be filtering out the noise in the revision data (template additions, bot edits, page moves, and script-assisted editing). If this noise can be separated reliably, the measure could serve as an alternative, more objective way to determine the contributions of editors.
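A first noise filter could look like the sketch below. It assumes a known set of bot accounts (for example, members of the bot user group) and flags page moves and script-assisted edits by their edit summaries. The regular expression is a hypothetical example only (AWB refers to the AutoWikiBrowser tool); real summaries vary and the patterns would need tuning. Template additions are not handled here, since detecting them requires inspecting the added text.

```python
import re
from dataclasses import dataclass


@dataclass
class Revision:
    user: str
    comment: str      # edit summary
    bytes_added: int  # positive size change of the revision


# Hypothetical example patterns: page moves and AutoWikiBrowser (AWB) edits.
NOISE_PATTERNS = re.compile(r"moved page|\bAWB\b", re.IGNORECASE)


def is_noise(rev, bot_accounts):
    """Flag edits made by known bot accounts or whose summary matches a noise pattern."""
    if rev.user in bot_accounts:
        return True
    return NOISE_PATTERNS.search(rev.comment) is not None


def filter_noise(revisions, bot_accounts):
    """Keep only the revisions that are not flagged as noise."""
    return [rev for rev in revisions if not is_noise(rev, bot_accounts)]
```

Of these checks, only the bot-account lookup is fairly reliable; summary matching is a heuristic that will both miss some noise and drop some genuine edits.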

Process

First, we will create a list of the top contributors on Wikipedia by year and month. Depending on how cleanly we can separate the noise, we can then investigate how the distribution of contributions has changed over time, for example:

  • What does an editor's life cycle look like in terms of kilobytes contributed? Do editors contribute more at the beginning or towards the end of their editing careers?
  • Has the group of editors who have contributed most of the content become smaller over the years?
  • Have the dynamics of the top contributors changed over time?
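The monthly top-contributor lists and the question of whether a shrinking group adds most of the content could both be computed from per-period totals, as in the sketch below. It assumes the noise has already been filtered and that each edit is available as a hypothetical `(timestamp, user, bytes_added)` tuple; the 50% threshold in `editors_for_share` is an arbitrary illustrative choice, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def totals_by_period(edits, period):
    """Sum positive bytes added per user for each period key.
    edits: iterable of (timestamp, user, bytes_added) tuples;
    period: function mapping a datetime to a grouping key."""
    per_period = defaultdict(Counter)
    for ts, user, added in edits:
        if added > 0:  # removals are ignored
            per_period[period(ts)][user] += added
    return per_period


def top_contributors_by_month(edits, n=10):
    """Return {(year, month): [(user, bytes), ...]} with the n largest contributors."""
    monthly = totals_by_period(edits, lambda ts: (ts.year, ts.month))
    return {key: counts.most_common(n) for key, counts in sorted(monthly.items())}


def editors_for_share(counts, share=0.5):
    """Smallest number of editors whose combined bytes reach `share` of the total."""
    amounts = sorted(counts.values(), reverse=True)
    target = share * sum(amounts)
    running = 0
    for rank, amount in enumerate(amounts, start=1):
        running += amount
        if running >= target:
            return rank
    return 0  # only reached when there are no contributions


def concentration_by_year(edits, share=0.5):
    """Return {year: number of editors who together added `share` of that year's bytes}."""
    yearly = totals_by_period(edits, lambda ts: ts.year)
    return {year: editors_for_share(counts, share) for year, counts in sorted(yearly.items())}


if __name__ == "__main__":
    sample = [
        (datetime(2005, 3, 1), "Alice", 600),
        (datetime(2005, 3, 9), "Bob", 300),
        (datetime(2010, 1, 5), "Alice", 250),
        (datetime(2010, 2, 8), "Carol", 400),
    ]
    print(top_contributors_by_month(sample, n=1))
    print(concentration_by_year(sample))
```

A count from `concentration_by_year` that falls over the years would be consistent with a shrinking core group of authors. Because the total number of active editors also changes, it may be more informative to divide the count by the number of editors who contributed that year.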

Please add any interesting suggestions you might have.

Results and discussion

Future work