Research:Contribution Inequality
Inequality of contribution  

Main contact 
Giovanni Luca Ciampaglia

Start  August 2011 
End  August 2011 
Status  completed 
Fields  computer science, social computing, human–computer interaction 
Topic
Has contributing to Wikipedia increasingly become an elite activity? Do contributions come only from a restricted circle of editors, or does everybody contribute more or less the same amount? The research community has already explored these questions to some extent^{[1]}. The inequality of contributions was first studied by Ortega et al.^{[2]}. Extending their work, we use the Gini coefficient to measure the inequality of contributions and to see how it changes over time and across namespaces. This lets us understand whether certain activities are more or less open to everybody.
Process
For each year, we count how many edits each user made in each namespace and rank users in descending order of contributions. We then quantify inequality with the Gini coefficient, a measure widely used in economics and the social sciences. It ranges from 0 (everybody contributes the same amount) to 1 (a single user makes all the contributions).
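The counting and measuring steps above can be sketched in Python. This is a minimal illustration, not the code used for the study: the input format (one `(user, year, namespace)` tuple per edit) and the function names `gini` and `gini_by_year_and_namespace` are assumptions made for the example. The sketch sorts counts in ascending order, which is the usual convention for the closed-form Gini formula, and it only includes users with at least one edit in a given year and namespace.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini coefficient of a list of non-negative edit counts.

    Uses the closed form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= ... <= x_n are the counts sorted in ascending order.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no users or no edits: treat as perfectly equal
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def gini_by_year_and_namespace(edits):
    """Map (year, namespace) to the Gini coefficient of per-user edit counts.

    `edits` is an iterable of (user, year, namespace) tuples, one per edit
    (a hypothetical input format chosen for this sketch).
    """
    per_user = Counter(edits)  # (user, year, namespace) -> number of edits
    groups = defaultdict(list)
    for (_user, year, namespace), n_edits in per_user.items():
        groups[(year, namespace)].append(n_edits)
    return {key: gini(values) for key, values in groups.items()}
```

For instance, four users with one edit each give a coefficient of 0, while four users of whom only one edits give 0.75, the maximum attainable with four users under this formula.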
Results and discussion
The plots below report the increasing inequality in the distribution of editor contributions for different namespaces. The distribution of contributions is known to be heavy-tailed, perhaps a power law^{[3]}, so we expect high values of the Gini coefficient. One interesting observation is that, while the main namespace is more or less stable at a Gini coefficient of around 90%, contributions have become increasingly skewed in the two project namespaces (NS 4, Wikipedia, and NS 5, Wikipedia Talk), rising from values around 60% in 2001 to more than 90% (and thus higher than the main article namespace) around 2009. If we focus only on users with at least 10 edits (the minimum to be considered a Wikipedian by the community), or on users with at least 100 or 1000 total edits, we of course get lower values of the Gini coefficient, since we are only considering the tail of the distribution, but we still see the effect of increasing inequality for the project namespaces.
We can also ask whether the elite forms a more or less stable group. To quantify the amount of churn among the top contributors, we compute the set similarity, or Jaccard coefficient, of the top 100 (and top 1000) contributors between one year and the next. The following plots show that the one-year similarity has increased from about 20% (roughly a fifth of the combined top contributors of two consecutive years appear in both) to about 45%.
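The churn measurement described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original analysis code: the input is assumed to be a dictionary mapping each year to a dictionary of per-user edit counts, the function names are hypothetical, and ties in edit count are broken alphabetically by user name, a choice the original text does not specify.

```python
def top_contributors(edit_counts, k):
    """Return the set of the k users with the most edits.

    `edit_counts` maps user -> number of edits. Ties are broken by user
    name so that the result is deterministic (an assumption of this sketch).
    """
    ranked = sorted(edit_counts.items(), key=lambda item: (-item[1], item[0]))
    return {user for user, _count in ranked[:k]}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets: define similarity as 0
    return len(a & b) / len(union)


def yearly_similarity(counts_by_year, k):
    """Map (year, next_year) to the Jaccard coefficient of the top-k sets.

    `counts_by_year` maps year -> {user: edit count} (hypothetical format).
    """
    years = sorted(counts_by_year)
    return {
        (y0, y1): jaccard(
            top_contributors(counts_by_year[y0], k),
            top_contributors(counts_by_year[y1], k),
        )
        for y0, y1 in zip(years, years[1:])
    }
```

Note that for two top-k sets of equal size k sharing s users, the coefficient equals s / (2k - s), so it is always somewhat lower than the plain fraction s / k of retained contributors.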