Grants:IEG/The Missing Masses: Investigating the Absence of Women and Non-Cis People among Wikipedia Editors

status: not selected
The Missing Masses: Investigating the Absence of Women and Non-Cisgender People among Wikipedia Editors
summary: Surveying women and non-cis people who are active or inactive editors to identify pain points and determine strategies to increase the diversity of Wikipedia in the US and Southeast Asia.
target: Wikipedia Demographics
strategic priority: increased participation by women and non-cis editors.
created on: 07:56, 11 April 2016 (UTC)
round 2 2015

Project idea

What is the problem you're trying to solve?

Wikipedia is the world's largest open repository and knowledge commons. It is used by people all over the world to learn about various topics. Yet more than 90% of the editors are men [1]. This creates an imbalance in the kinds of knowledge that are available and valued, and in how that knowledge is presented. A neutral point of view is a fundamental principle of Wikipedia [2], but true neutrality is impossible when only one demographic is represented among editors. Understanding why the demographics of editors skew so severely toward white and/or cisgender men is the first step in creating an editorship that is truly representative. Previous Wikipedia editor surveys focused on the number of editors and their usage patterns; there is no qualitative study of users' experiences, and in particular none focused on women and non-cisgender editors. Additionally, those surveys are usually aimed at active editors (5+ edits per month), so less active editors are rarely taken into consideration. We believe this set of users, while difficult to access, can provide more insight into the pain points of Wikipedia for under-represented groups. This is especially important for increasing the diversity of Wikipedia editors.

What is your solution?

We propose to generate qualitative data to increase understanding of the on-boarding, usage, and attrition experiences of women and non-cis Wikipedia editors in the US and South/Southeast Asia (including India, the Philippines, and Indonesia). We will investigate the experience of learning to become an editor, and the perception and experience of community culture, from the perspective of women and non-cisgender users. In other words, what is it like for women and non-cisgender editors to join the Wikipedia community and participate, and what causes them to leave? Answering this question provides much-needed insight for building effective solutions to the editorship gender gap. Furthermore, our study covers a varied national and racial demographic, which lets us compare and contrast responses for emergent trends common to women and non-cis editors across the platform.

From this data gathering initiative, we will segment, analyze and present our findings and recommendations for increasing diversity and retention amongst the target survey population. This report will be presented for peer review by the stakeholder communities on Wikipedia and in the gender activism space, and all data (with PII [3] scrubbed) and findings will be released under Creative Commons for the public's benefit.

Sample Population

Our sample population is composed of inactive/low-activity editors who are women or non-cisgender individuals. Women and non-cisgender people together make up a majority of the population in many places, yet are drastically underrepresented in the Wikipedia editor community. We include both women and non-cisgender editors because both groups are often subject to similar types of disenfranchisement in many societies, but also have differences in their experiences. Focusing only on the experiences of women as editors may not sufficiently identify those of non-cisgender people, and vice versa. We use “non-cisgender” (rather than “trans” or “queer”) because it is a more inclusive framing, for two reasons. First, the language and culture around gender are not universal; non-binary, non-cisgender people in South Asia may not identify as simply “queer” as they might in the United States. The existence in other cultures of gender categories that do not easily align with the typical Western binary highlights the importance of moving away from binary modes of gender identification in this work. Second, even in the US not all non-cis people identify as trans or male/female; some may be genderqueer, gender-fluid/flexible, or ungendered. We are not investigating sexual preferences/orientation among editors.

Why these places?

It is of course true that established and relatively mature Wiki communities have a lot to offer those that are comparatively younger. However, Wiki is a global organization and it should not be assumed that what makes for effective engagement and retention of editors for one part of the community will be directly applicable to other groups. As is true in the management of any commons, an understanding of the experience of local cultural institutions is necessary. Further, it is vital that problems within one community, such as the exclusion (intentional or not) of certain groups of people, not be disseminated to younger, emerging organizations.

The US represents a large portion of the overall Wiki community, has a well-developed user and editor base and numerous local councils and editor groups, and is important for understanding the current state of the community. It may also represent a model of distributed organization that Wiki communities around the world can use to build their own institutions as they grow and develop. Developing a deeper understanding of why the US Wiki community diverges so strongly from broader society in terms of editor makeup is essential to providing a healthy model for others to learn from.

South/Southeast Asia, with more than a quarter of the Earth’s population, represents an enormous, diverse pool of possible contributors to the Wiki community. With the rapid development of the global technology industry, the US and South/Southeast Asia have become inextricably linked, both economically and culturally. As the Wikipedia community in the US grows and develops, the effects of that growth find their way into the commons of the countries and cultures we are deeply tied to. Like any commons, a knowledge commons like Wikipedia requires the development of healthy, inclusive local institutional norms for its management[1]. As this part of the world continues to expand access to electronic knowledge commons, and Wikipedia becomes more ubiquitous, institutional norms around who “belongs” as part of the community and what perspectives and contributions are valued may shape the growth of those communities in powerful ways. Understanding what underlies the gender gap issues of this community while it is relatively young and growing may allow it to develop into a more diverse, inclusive organization as it reaches maturity. In addition, there are many possibilities for US communities to learn useful lessons from the intense work being performed in India around the concept of an egalitarian commons[2].

Project goals


  1. Produce a network of contacts within Wiki-community organizations that allow for access to and communication with marginalized groups within the larger Wikipedia community.
  2. Develop a robust survey instrument that investigates:
    • What motivates women and non-cisgender editors to join Wikipedia as editors.
    • What their experience as editors has been like.
    • What causes them to become inactive/low-activity editors.

Initial Analyses:

  1. Perform initial analyses to look for consistent trends in the experiences of users.
  2. Determine how data and contact network can inform future work.


  1. Present initial findings to the larger community through summary statistics and a short written report.
  2. The anonymized data will be made available to the larger community.

Project plan


Survey Methods

We will use a structured questionnaire instrument, administered via the web, composed of questions investigating the experiences of editors using Likert-type[4] responses and a small number (one to three) open-ended response questions (to be analyzed using axial coding [5]), along with basic demographic information. The instrument will be composed of questions appropriate for all editors, and will attempt to elucidate the experience of formal and informal institutions (rules and norms) within the Wikipedia community. The preliminary outline of the instrument will be composed of three sections, plus demographic data. The first section will investigate what led the person to become involved as an editor, the second their experiences as an editor, and the third their reasons for a low level of participation or nonparticipation. While the intention is to keep the instrument as concise as possible, a number of “duplicate” items investigating key responses in each section will be included to allow for testing of reliability via internal consistency methods.
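The internal-consistency check mentioned above could be sketched as follows. The proposal does not name a specific statistic; Cronbach's alpha is assumed here as one common choice, and the function name and input layout (one list of scores per item, in the same respondent order) are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of Likert-type items.

    `items` is a list of columns: one list of numeric scores per item,
    all in the same respondent order and with no missing values.
    (Layout and name are assumptions made for this sketch.)
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same number (>= 2) of responses")

    # Sum of the variances of the individual items.
    item_variance_sum = sum(pvariance(col) for col in items)

    # Variance of each respondent's total score across the items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

Values near 1 suggest that the duplicate items are answered consistently. Which threshold counts as acceptable is a judgment call that the proposal does not specify.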

The data collected via this instrument will be used to search for trends and divergences in the experiences of editors, and to identify problem areas or "pain points" that need to be addressed.

Sampling Strategy

Generating a representative sample of the population in question is vital to a valid, useful outcome. Our target population is composed of women and non-cisgender Wikipedia editors with low levels of activity. Our sampling frame will be editors in that category on whom some basic identifying information exists and is accessible. Thus, we will require a list of inactive/low-activity editors and associated demographic information in order to construct our sample. To generate this list, we will reach out to local wiki councils, editor communities, and other related organizations, in turn asking them for assistance in reaching other organizations that may be interested in participating (this is, in essence, a snowballing method at the level of organizations rather than individuals). These groups will be asked to distribute an introductory letter to their members, describing our study and providing a link to the instrument. The construction of this list and the associated community connections will in itself be a highly valuable product of this study. Respondents in the sample will be sent an initial contact email explaining the study and two follow-up messages; waves of sampling will be performed until the goal response rate is reached or the time available for sampling has run out. The goal is to achieve similar response rates in the two populations, and to have a final sample proportionate to population make-up.
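As a rough illustration of the sampling logic described above, the sketch below allocates a target sample across populations in proportion to their size and decides whether another sampling wave is needed. All names, the frame sizes, and the use of a maximum wave count as a stand-in for "time to sample" are assumptions made for this example, not part of the proposal.

```python
def proportional_allocation(frame_sizes, total_sample):
    """Split `total_sample` across groups in proportion to frame size.

    Uses largest-remainder rounding so the allocations sum exactly
    to `total_sample`.
    """
    total_frame = sum(frame_sizes.values())
    if total_frame <= 0:
        raise ValueError("sampling frame is empty")
    quotas = {g: total_sample * n / total_frame for g, n in frame_sizes.items()}
    allocation = {g: int(q) for g, q in quotas.items()}
    leftover = total_sample - sum(allocation.values())
    # Hand the remaining units to the groups with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - allocation[g], reverse=True)
    for group in by_remainder[:leftover]:
        allocation[group] += 1
    return allocation


def response_rate(completed, invited):
    """Share of invited editors who completed the instrument."""
    if invited <= 0:
        raise ValueError("no one has been invited yet")
    return completed / invited


def needs_another_wave(completed, invited, goal_rate, waves_done, max_waves):
    """True while the goal rate is unmet and sampling time remains."""
    return response_rate(completed, invited) < goal_rate and waves_done < max_waves


# Hypothetical frame sizes; the real figures would come from the contact list.
example = proportional_allocation({"US": 600, "South/Southeast Asia": 400}, 1000)
```

Tracking the response rate separately for each population would make it easier to see whether the two groups are responding at similar rates, which is the stated goal.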

Sources of Error

A number of possible sources of error exist, some of which may be minimized, and others of which are unavoidable.

Coverage Errors: Because our sample frame will be dependent on the honest and open responses of the target population (many of whom may have little incentive to disclose sensitive information about themselves), it is likely that the frame will under-represent the population. It is also possible that those who are more open about their status as women/non-cisgender people in the editor community may have different experiences than those who are not. Further, as we must rely on localized communities to generate our sampling frame, the geographic coverage will necessarily be patchy, and will miss any individuals who are not connected to such communities.

Sample and Non-response Errors: Similarly, even if we were to employ a completely random sampling method, those who choose to respond to the survey may have had experiences which differ from those who choose not to respond, and response rates may differ strongly between groups. Non-response at the item level, due to dissatisfaction with the choices or an unwillingness to respond to a particular item, may also prove problematic. Bias in respondents can be watched for by looking for a strong preponderance of extreme responses (either positive or negative), but is difficult to minimize if it exists. The language of the contact letter will be carefully crafted so that it does not predispose or deter those who have had a particular experience as editors (positive, negative, or neutral).
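The two checks described above, watching for a preponderance of extreme responses and tracking item-level non-response, could be sketched as below. The 5-point scale, the use of `None` for a skipped item, and the function names are assumptions; the proposal only specifies "Likert-type" responses.

```python
def extreme_response_share(responses, scale_min=1, scale_max=5):
    """Share of answered responses that sit at either end of the scale.

    `responses` holds one item's answers; `None` marks a skipped item.
    The default 1-5 scale is an assumption for this sketch.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered responses for this item")
    extreme = sum(1 for r in answered if r in (scale_min, scale_max))
    return extreme / len(answered)


def item_nonresponse_rate(responses):
    """Share of respondents who skipped this item."""
    if not responses:
        raise ValueError("no responses recorded for this item")
    skipped = sum(1 for r in responses if r is None)
    return skipped / len(responses)
```

A high extreme-response share does not by itself demonstrate bias, since strongly positive or negative experiences may be genuine; it only flags items worth a closer look.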

Future Research

The proposed work represents the first phase of a possible larger research effort. The future phases would involve a complete, in-depth analysis of the generated data, followed by additional intensive qualitative work. This work would consist of interviews conducted with interested respondents (a question will be included at the end of the survey instrument asking if the respondent would be interested in participating in further research, with a link to a separate form where they can enter their contact information). Following the collection and analysis of interview data, a comprehensive report would be produced, interpreting and reporting the responses from the survey work, enriched by qualitative data from the interview subjects.

Relation to Previous Research

Our proposed work exists within the context of previously conducted work, and seeks to address and extend that work in a productive manner. More details on the relationship between our proposal and previous research on the Wikipedia Gender Gap can be found at The_Missing_Masses:_Investigating_the_Absence_of_Women_and_Non-Cis_People_among_Wikipedia_Editors/Relation_to_Previous_Research.


| Section | Item | Description | Cost (USD) |
| Preparation (40%) | Community Engagement | Identify and develop partnerships with organizations for local deployment. | 12,000 |
| | List development | Work with Wikimedia and partnering orgs to develop survey list. | |
| | Survey development | Differentiated surveys written for various cultures and countries. | |
| Deployment (30%) | Survey Deployment | Deployment through various partner channels. | 9,000 |
| | Results Gathering | Compilation of data. | |
| Analysis (20%) | Data Preparation | Coding and entry of open responses, error cleaning, scrubbing of any PII collected. | 6,000 |
| | Preliminary Data Analysis | Initial summary analyses of responses for total sample, and sub-populations of interest. | |
| | Preliminary Interpretation | Develop initial assessment and interpretation, as well as preliminary recommendations to address the gender gap for women and non-cis editors. | |
| Reporting (10%) | Data Visualization | Presentation of summary data in visual format for clarity of understanding. | 3,000 |
| | Report Creation | Writing, editing, proofing, etc. | |
| | Graphic Design | Summary report layout, illustration, and design elements. | |
| | Publishing | Published through partner channels under Creative Commons license, housing of data in appropriate repository. | |
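As a quick arithmetic check of the budget figures above, the short sketch below sums the four section costs and recomputes each section's share. The total is derived here rather than stated in the proposal, and the variable names are illustrative.

```python
# Section costs in USD, as listed in the budget.
budget = {
    "Preparation": 12_000,
    "Deployment": 9_000,
    "Analysis": 6_000,
    "Reporting": 3_000,
}

# Derived total; the proposal lists only the per-section costs.
total = sum(budget.values())

# Each section's share of the derived total.
shares = {section: cost / total for section, cost in budget.items()}

for section, share in shares.items():
    print(f"{section}: {share:.0%} of {total:,} USD")
```

The recomputed shares match the 40/30/20/10 percentages given next to the section names.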

Community engagement

We will be working closely with Wikimedia communities that operate in the field of gender equity in both the US and Southeast Asia. In the US, we will identify and collaborate with organizations focused on women's, non-cisgender, and queer issues, while in Southeast Asia we will work with organizations that address issues of first access and technology for women. As the project develops, we will share our process and findings with these organizations to ensure equitable representation and consideration of their interests as stakeholders. This will give these communities fair and equitable access to conduct self-sustaining research in the future, and enable them to use the results to implement positive change in their communities based on the data. During the process of community engagement, we will take the utmost care to provide a safe space for interviewees to express their opinions in a secure, private way, and to avoid burdening them with unrefined or unnecessary questions.


We are deeply committed to increasing the editorship of under-represented groups on Wikipedia. In order to rectify this imbalance of demographics, we must ascertain why the imbalance exists in the first place. Gathering this data is the first step towards creating a sustainable ecosystem where marginalized people feel welcome, comfortable, and able to thrive. Once the factors that are keeping women and non-cisgender people from being active editors are identified, we can develop data-driven strategies to increase retention, activity, and recruitment. Building on the analysis and recommendations we report to the community, we plan to expand the findings into future projects for improving the experience of women and non-cisgender editors on the Wikipedia platform. These may take the form of recruitment initiatives, tweaks to existing infrastructure, or programs for experienced editors to mentor more junior women and non-cisgender editors as they learn how to navigate the culture and community of Wikipedia. This data not only presents a wealth of opportunity for the immediate Wikipedia community, but is also a major resource for stakeholders in the wider gender equity community, helping them build sustainable and fair community engagement efforts in the future.

Measures of success


  1. Create a list of organizations and individuals relevant to gender-gap work.
  2. Generate a representative data sample of women and non-cis editors.
  3. Produce previously non-existent data regarding the experiences of women and non-cis editors as Wikipedia editors.
  4. Survey approximately 1000 members of the target population with a representative mix of all demographics listed above.
  5. Produce an initial summary report of findings.


  1. A rich pool of open data that can be released to the community for future works on gender equity.
  2. Initial data analysis that provides clear indications of attrition factors for women and non-cis editors, with rich future analysis possible.
  3. Initial recommendations for the larger community that can be translated into future projects, engagement strategies, and efforts to recruit and retain women and non-cis editors for the purpose of a more representative, equitable, and welcoming Wikipedia community.

External Support

Our research team has already begun to establish a network of support for the proposed work, which is detailed below.

Organizational and Institutional Support

  • The Center for Social Science Computation and Research, University of Washington
  • Center for Internet and Society, Bangalore (possible collaboration)
  • Breakthrough
  • Bytes4all

Communities we are connected with

  • Wikimedia India Community (includes several language communities)
  • Wikimedia Philippines

Communities we intend on connecting to

  • Wikimedia New York
  • Other Wikimedia Chapters in the United States
  • Wikimedia Indonesia


  • Jordan Bunker, Data Analysis & Visualization Engineer
  • James Burke, Senior Graphic Designer


These are funding sources we are pursuing to further support this project, both financial and in-kind grants.

  • Bill and Melinda Gates Foundation (funding and report dissemination)
  • The Tableau Foundation (funding and software)
  • Softlayer (server space)
  • TISS (volunteers)

Get involved


Chinmayi SK

Chinmayi SK is a gender and technology activist who works on issues of gender-based violence in both online and offline spaces through her organisation The Bachchao Project. She also advises various organisations on technology and gender initiatives. Chinmayi is a long-time editor of English Wikipedia, especially in the area of natural history. She has been a member of the Wikimedia India community for the past two years, has been involved in their gender gap initiatives, and has been actively helping them make the community more friendly and welcoming to people of various genders. Chinmayi has also worked on personal safety technologies as a polyglot developer (Python/Java/C/PHP/JS), project manager, and organizational lead.

Lindsay Oliver

Lindsay is an activist, operations consultant, and writer who facilitates diversity, equitable systems, and open culture in the tech industry. With a background in education, digital humanitarian technology, and social justice initiatives, she evaluates and creates the tools, teaching methods, and cultural norms necessary to ensure marginalized populations have access to the wealth of experience and opportunity that technology has to offer. Lindsay has worked on a variety of gender-based projects including overseeing Everyone Hacks (a safe space hackathon series for marginalized populations to engage with civic technology), and is a contributing member of the Online Harassment Task Force. She has a BA in English and Gender/Women's Studies from Loyola University Chicago and a Master of Arts in Teaching in Secondary English Education & ESL/ELL from National Louis University.

M.S. Patterson, MPA/MSES

M.S. Patterson is a graduate of the School of Public and Environmental Affairs at Indiana University, a former member of the Center for the Study of Institutions, Populations, and Environmental Change and the Ostrom Workshop in Political Theory and Policy Analysis, and is currently a PhD student at the University of Washington. As a researcher he is focused on the dynamics of human-driven systems, particularly how both formal and informal institutions impact communities and the physical world. He has previously collected and analyzed large sets of survey data investigating the values and practices of members of the public with regard to the management of common pool resources. He has studied and taught statistical methods, qualitative and quantitative data collection methods, and effective data representation.

Community Notification


Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. (Other constructive feedback is welcome on the talk page of this proposal).

  • People who are usually under-represented become more committed contributors if the system integrates them well. In that context and based on the excellent track record of the applicants, I would like to see this project go live. Please do note the suggestions and concerns listed in the talk page. Ravi (talk) 11:48, 18 April 2016 (UTC)
  • I too am concerned about the lack of gender diversity among Wikipedia editors and the potential editorial bias that represents in an important global information source. I feel the proposed is vital to Wikipedia's continuing usefulness. Maryannecollier (talk) 02:46, 21 April 2016 (UTC)
  • This is a hugely important issue. This problem has been long standing and the applicants are exceptionally well suited to conducting this research and analysis. It's crucial to understand how biases work and how to make Wikipedia more representative overall. User: SL Chemaly 02:46, 21 April 2016 (UTC)
  • Support: There is already a large body of academic research on Wikipedia's gender gap. However, there is a lack of reliable and quality research on the topic focussed on south-east Asia. Similarly, little is known about the participation of transgenders and gender minorities. The findings of this research would be extremely useful in guiding policy and outreach in SE Asia. -- Rohini (talk) 10:07, 23 April 2016 (UTC)
  • Support: all too sorely needed. Just adding to the case, my research indicates that across all Wikipedias there are just 152 articles about non-cis people. Stop the editor discrimination and its interwoven biography discrimination. Maximilianklein (talk) 14:52, 25 April 2016 (UTC)
  • Support. Mssemantics (talk)


  1. Hess, C. & Ostrom, E., 2005. A Framework for Analyzing the Knowledge Commons. In Understanding Knowledge as a Commons: From Theory to Practice. Cambridge, MA: MIT Press, pp. 1–54. Available at:
  2. Hess, C., 2012. The Unfolding of the Knowledge Commons. St Antony’s International Review, 8(1), pp.13–24. Available at: