
Research:Wikipedia Gender Inequality Index

From Meta, a Wikimedia project coordination wiki
Piotr Konieczny
This page documents a completed research project.

Key Personnel

Project Summary

We already know that there is a significant disparity between Wikipedia biographies of men and women, both by year and by ethnicity/nationality (Klein 2013, 2014): of English Wikipedia's 1,150,000 biographies, only ~15% are about women. However, little systematic research has attempted to quantify this disparity beyond the cited exploratory analysis by Max Klein. We intend to investigate gender inequality trends through time and space (country/nationality/language) using Wikipedia biographies.

Simply put, we want to create:

a) an academic index allowing comparative study of gender inequality through time and space,
b) freely licensed graphs (which we could host at Wikimedia Commons and which could improve numerous Wikipedia articles, up to and including the country-specific series of 100+ articles on gender inequality in country x such as Gender inequality in Mexico to name just one) and
c) a freely licensed dataset that could be used by other researchers, with the potential for its indicators to be used in existing gender inequality indices (GGGI, SIGI, etc.).

Research questions (summary version):

  • RQ1: taking the year-of-birth parameter, we can compare the number of Wikipedia biographies by gender by year (decade, century, millennium). What is the pattern/trend? Can we predict when full equality will be reached?
  • RQ2: what will be the variations by region/country/nationality/ethnicity/religion/language?
  • RQ3: are there interesting variations in easy-to-calculate variables such as subject longevity and article quality?
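
To make RQ1 concrete, here is a minimal Python sketch of one way the comparison could be computed. It is an illustration under stated assumptions, not the project's actual method: the function names, the `(birth_year, gender)` record layout and the choice of a straight-line trend are all hypothetical, and a linear extrapolation is only one of several plausible ways to estimate when parity might be reached.

```python
from collections import Counter


def female_share_by_decade(records):
    """Return {decade: share of female biographies} for (birth_year, gender) pairs.

    The record layout is an assumption made for this sketch.
    """
    totals = Counter()
    females = Counter()
    for birth_year, gender in records:
        decade = (birth_year // 10) * 10  # e.g. 1907 -> 1900
        totals[decade] += 1
        if gender == "female":
            females[decade] += 1
    return {decade: females[decade] / totals[decade] for decade in sorted(totals)}


def linear_trend(shares):
    """Least-squares fit of share = slope * decade + intercept."""
    xs = list(shares.keys())
    ys = list(shares.values())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def parity_year(slope, intercept, target=0.5):
    """Year at which the fitted line reaches the target share, or None if it never does."""
    if slope <= 0:
        return None
    return (target - intercept) / slope
```

With real data, the fit would likely need to be restricted to recent cohorts, since living people's biographies are still being added and the trend over millennia is unlikely to be linear.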

Literature review (summary version):

a) A number of studies have been published in the fields of sociology, gender studies and Wikipedia studies, though they usually focus on why women form only about 20% of Wikipedia contributors. Classic studies in this vein include Lam et al. 2011, Hill and Shaw 2013, and Eom et al. 2014; there was also mainstream media coverage (e.g. Cohen 2011, New York Times). Wikipedia's coverage of women's biographies was also discussed by Reagle and Rhue 2011.
b) We refrain from a thorough methodological analysis of the pros and cons of gender gap indices, sex-disaggregated measures, gender-sensitive aggregate measures and related topics and refer the interested reader to Klasen (2007), Mills (2010) and Hawken and Munck (2011).


A Wikipedia data dump will be analyzed to populate a spreadsheet file, which will in turn be analyzed further in statistical software (SPSS).
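
As a rough illustration of the dump-to-spreadsheet step, the Python sketch below aggregates already-parsed biography records into one row per country and century and writes them as CSV, a format that spreadsheet and statistics packages can generally import. Everything here is an assumption made for the example: the field names (`country`, `birth_year`, `gender`), the column layout and the helper names are hypothetical, and parsing the dump itself is not shown.

```python
import csv
import io
from collections import defaultdict

FIELDS = ["country", "century_start", "female", "male", "other", "female_share"]


def build_rows(biographies):
    """Aggregate biography dicts into one row per (country, century)."""
    counts = defaultdict(lambda: {"female": 0, "male": 0, "other": 0})
    for bio in biographies:
        birth_year = bio.get("birth_year")
        if birth_year is None:
            continue  # skip records without a usable year of birth
        country = bio.get("country") or "unknown"
        century_start = (birth_year // 100) * 100  # e.g. 1907 -> 1900
        gender = bio.get("gender")
        column = gender if gender in ("female", "male") else "other"
        counts[(country, century_start)][column] += 1

    rows = []
    for (country, century_start), c in sorted(counts.items()):
        total = c["female"] + c["male"] + c["other"]
        rows.append({
            "country": country,
            "century_start": century_start,
            "female": c["female"],
            "male": c["male"],
            "other": c["other"],
            "female_share": round(c["female"] / total, 3),
        })
    return rows


def write_csv(rows, fileobj):
    """Write the aggregated rows, with a header line, to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = [
        {"country": "Mexico", "birth_year": 1907, "gender": "female"},
        {"country": "Mexico", "birth_year": 1950, "gender": "male"},
    ]
    buffer = io.StringIO()
    write_csv(build_rows(sample), buffer)
    print(buffer.getvalue())
```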

I have already designed a working spreadsheet to illustrate what can be done.


Results will be presented at an academic conference and in journals. I will also try to make them available to interested community members, including by presenting the results here if the paper is accepted for publication in a non-open-access venue.

Wikimedia Policies, Ethics, and Human Subjects Protection

This research adheres to the American Sociological Association Code of Ethics and the w:Wikipedia:Ethically researching Wikipedia guidelines.

Benefits for the Wikimedia community

This research should produce media coverage benefiting the Wikipedia project, and create media (images) directly usable in the "gender inequality in country x" series of articles.


  • summer 2014: data collection
  • fall 2014: data analysis
  • winter 2014/2015: estimated project end time



External links