Research:Voice and exit in a voluntary work environment
What you read below is at the proposal level so long as this sentence is here. :) Also, please keep in mind that at the moment we are focusing on the gender gap, but in general we are interested in any gap in contributor demographics.
The Wikipedia gender/minority gap issue
The Wikipedia gender gap, i.e., the fact that only 10 to 15% of Wikipedia contributors are reported to be female, has long been acknowledged within the Wikipedia community. It implies that there is a large and untapped pool of potential contributors. While Wikipedia aspires to represent "the sum of all human knowledge", this gap causes important flaws and biases in its encyclopedic content, because many potential contributors choose not to make their voices heard. The underlying drivers of the gender gap in Wikipedia participation remain largely unknown. Sue Gardner and others have proposed that it may be due to overly confrontational behavior by some (predominantly male) experienced contributors, who tend to strongly assert their views and opinions.
Previous research suggests that there are important gender differences in the taste for confrontation and competition, as well as in self-confidence, implying that women may "shy away" from Wikipedia. This explanation is contested by some community members, who argue that Wikipedia is open for anyone to contribute to, so that if women decide not to participate, the reason probably has more to do with intrinsic preferences or a lack of free time. Other researchers stress women’s lack of technological skills as an important driver of the Wikipedia gender gap. More recently, it has been argued that the Wikipedia minority gap (i.e., the underrepresentation of minorities in Wikipedia’s base of contributors) could be driven by similar factors.
State of the scientific literature on the gender gap
Over the past decade, the experimental literature has established that women and men of the same ability differ significantly in self-confidence and in taste for competition. Through simple experimental protocols, this literature demonstrates that there are important and systematic differences in those dimensions between men and women on average. The field consequences of those differences are important. They predict orientation choices among high school students (controlling for academic results, women tend to “shy away” from the most prestigious, but also relatively more “competitive”, academic tracks), industry choices among young business professionals, and, ultimately, part of the earnings differential between men and women. However, there is increasing experimental and field evidence that those gender differences tend to disappear when women are given a chance to work in teams.
In the context of Wikipedia, evidence gathered from survey and observational data suggests that individual differences in aversion to conflict and self-confidence may be major drivers of the Wikipedia gender gap. 
Intervening to alleviate the Wikipedia gender gap
Based on existing scientific knowledge, we anticipate that a techno-social solution can help alleviate the Wikipedia gender gap and increase overall retention rates. State-of-the-art recommendation engines are typically based on behavior profiles constructed from the user's previous interactions with the site. When a new user joins Wikipedia, however, no previous interactions are available to the recommendation system. We aim to solve this "cold start" problem with a system that asks a newly registered user a maximally informative sequence of questions (i.e., a "20 questions" approach) in order to rapidly construct a profile of the user's interests that can be used as a basis for making relevant editing recommendations. The first step of this project is to experimentally assess the effectiveness of this system in terms of retention rates.
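To make the "20 questions" idea concrete, here is a minimal sketch of one common way to pick a "maximally informative" next question: choose the question with the largest expected reduction in entropy over candidate interest profiles. This is an illustration under strong assumptions, not the proposal's actual design: the profile names, the question names, the uniform prior and the assumption that each profile answers each question deterministically are all hypothetical.

```python
import math
from collections import defaultdict


def entropy(weights):
    """Shannon entropy (in bits) of a list of non-negative weights."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def best_question(profiles, answers, prior):
    """Pick the question with the highest expected information gain.

    profiles: list of candidate interest profiles (hypothetical labels)
    answers:  dict question -> dict profile -> answer that profile would give
    prior:    dict profile -> current probability weight
    """
    total = sum(prior[p] for p in profiles)
    h_before = entropy([prior[p] for p in profiles])
    best, best_gain = None, -1.0
    for question, table in answers.items():
        # Group the profiles by the answer they would give to this question.
        groups = defaultdict(list)
        for p in profiles:
            groups[table[p]].append(prior[p])
        # Expected entropy that remains after hearing the answer.
        h_after = sum(sum(g) / total * entropy(g) for g in groups.values())
        gain = h_before - h_after
        if gain > best_gain:
            best, best_gain = question, gain
    return best, best_gain


# Hypothetical toy setup: four interest profiles, two yes/no questions.
profiles = ["history", "science", "sports", "arts"]
prior = {p: 0.25 for p in profiles}
answers = {
    "likes_numbers": {"history": False, "science": True, "sports": False, "arts": False},
    "prefers_past_events": {"history": True, "science": False, "sports": False, "arts": True},
}
question, gain = best_question(profiles, answers, prior)
```

In this toy setup the second question splits the profiles evenly and is therefore preferred over the first, which only isolates one profile. A real system would have to handle noisy answers and a much larger profile space.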
Building upon the above technology, the second step is to identify new editors who share similar interests and connect them with each other in order to form meaningful contribution teams. We will evaluate the effectiveness of this "matched teams" intervention, with a special focus on the retention of women, by comparing it to a control treatment in which only individual contribution recommendations are made.
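As a rough illustration of what matching on shared interests could look like, the sketch below represents each new editor by an interest vector and greedily groups each editor with the most similar unassigned editors, using cosine similarity. The vector representation, the greedy strategy, the `form_teams` helper and the editor names are all assumptions made for the example; the proposal does not specify a matching algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length interest vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def form_teams(interests, team_size=2):
    """Greedily group editors with similar interests (hypothetical helper).

    interests: dict editor -> interest vector, e.g. weights per topic.
    Takes the first unassigned editor as a seed, then adds the most similar
    unassigned editors until the team reaches team_size.
    """
    unassigned = list(interests)
    teams = []
    while unassigned:
        seed = unassigned.pop(0)
        ranked = sorted(
            unassigned,
            key=lambda e: cosine(interests[seed], interests[e]),
            reverse=True,
        )
        mates = ranked[: team_size - 1]
        for mate in mates:
            unassigned.remove(mate)
        teams.append([seed] + mates)
    return teams


# Hypothetical editors with weights over three made-up topics.
interests = {
    "editor_a": [1.0, 0.0, 0.0],
    "editor_b": [0.0, 1.0, 0.0],
    "editor_c": [0.9, 0.1, 0.0],
    "editor_d": [0.0, 0.8, 0.2],
}
teams = form_teams(interests, team_size=2)
```

Greedy matching is simple but order-dependent; if team quality matters, a global method such as clustering or optimal matching may be a better fit.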
Irrespective of the experimental treatment, we will conduct a socio-demographic survey of new users (asking for, e.g., gender, age and highest degree completed) to get a sense of the gender gap at the registration phase, and be able to follow its evolution over time. Men and women tend to interact in very different ways. As a result, it would be interesting to randomize the gender-mix of teams within the treatment group to further differentiate the impact of forming men-only teams, women-only teams and mixed-gender teams.
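One possible way to randomize the gender mix of teams is stratified assignment: shuffle users within each self-reported gender and deal them round-robin into a control arm, a same-gender team arm and a mixed-gender team arm, so that every arm receives a balanced share of each gender. The sketch below is only an assumption about how this might be done; the arm names, the `assign_arms` helper and the seed handling are hypothetical and not part of the proposal.

```python
import random
from collections import Counter

# Hypothetical arm names: individual recommendations only (control),
# teams drawn from a single gender, and teams drawn from several genders.
ARMS = ("control", "same_gender_team", "mixed_gender_team")


def assign_arms(users, seed=0):
    """Stratified random assignment of users to experimental arms.

    users: dict user_id -> self-reported gender (from the survey)
    Returns dict user_id -> arm. Within each gender, users are shuffled
    with a seeded generator and assigned round-robin, so arm sizes within
    a gender differ by at most one.
    """
    rng = random.Random(seed)
    by_gender = {}
    for user_id, gender in users.items():
        by_gender.setdefault(gender, []).append(user_id)
    assignment = {}
    for gender in sorted(by_gender):
        ids = sorted(by_gender[gender])  # fixed order before shuffling
        rng.shuffle(ids)
        for i, user_id in enumerate(ids):
            assignment[user_id] = ARMS[i % len(ARMS)]
    return assignment


# Hypothetical sample: 6 users per gender.
users = {f"u{i}": ("woman" if i % 2 else "man") for i in range(12)}
assignment = assign_arms(users, seed=1)
arm_counts = Counter((users[u], arm) for u, arm in assignment.items())
```

Users in the same-gender arm would then be grouped into men-only or women-only teams, and users in the mixed arm into mixed-gender teams. Fixing the seed makes the assignment reproducible for later auditing.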
Finally, we can also use the survey to estimate the additional impact of a number of basic factors on editor retention – and test for their potentially differential effects depending on gender. For instance:
- Aversion to risk
- Technological skills
- Available free time
- Confidence in their ability to contribute content
- Taste for competition
- Preexisting conceptions about the community of Wikipedia editors
Scalability of the intervention
There have been previous attempts at increasing retention rates and addressing the lack of editor diversity through socialization techniques. The vast majority of those attempts involved newbie mentoring strategies. Such strategies can be difficult to implement efficiently, and even when they do seem to work relatively well, they remain difficult to scale because of the cost they impose on the existing community of experienced editors. Some researchers have tried to address this issue by implementing institutionalized socialization tactics, where newbies go through a set of formal "training steps" before joining the community. Such depersonalized strategies seem largely inefficient, however.
The idea here is to retain the individualized socialization strategy, but implement it in a way that (i) doesn't impose any substantial cost on the existing community, and (ii) focuses on forming meaningful teams of newbies, so as to mitigate the effect of gender differences in taste for competition and self-confidence. Wikipedia used to be a space where editors would simply go ahead and figure things out together as they were trying to build up the resource. Another way to think of this proposal is as an attempt to restore this initial atmosphere in a different context.
Because the gender/minority gap might already be significant at the registration stage, we're also contemplating using a targeted SiteNotice to invite randomly selected readers to register accounts, so that they can go through the above experimental treatments.
The following milestones are considered for this project. Note that the milestones are not necessarily sequential and some may get done earlier than the others.
- Phase 0 Quantifying diversity at different stages: We are interested in learning at which stages we start observing reductions in diversity. We want to determine the diversity at the reader level, at the account creation level, and so on. This information can help us and others interested in this space to determine the relevant population to target with intervention(s): readers, newly registered users, ...
- Phase 1 Test if the recommended framework works at all through a smaller experiment.
- Phase 2 Solving the cold-start problem: We are interested in learning about the interests of a newly registered account almost immediately after registration. One of the ways we can handle this cold-start problem is by asking the editor a series of questions that can help us learn about the editor and their interests. The design of such a questionnaire is a challenge. One way to approach it is through observational studies of the webrequest logs associated with newly created accounts, to see whether specific characteristics describe a user who turns from a reader into an editor. Answering this question could also be of interest to others outside of this research.
- Phase 3 Develop a recommendation engine able to infer new users' interests and match them with relevant articles and tasks -- Randomized deployment & evaluation
- Phase 4 Match new users in virtual "teams" of contributors based on their inferred interests -- Randomized deployment & evaluation
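As an illustration of the Phase 0 analysis, the sketch below computes the share of a given group at each stage of the contributor funnel and reports the transition with the largest drop. The stage names and all counts are made-up placeholders chosen only to show the arithmetic; they are not measurements, and the helper names are hypothetical.

```python
def group_share_by_stage(counts, group):
    """Share of `group` at each stage.

    counts: ordered list of (stage_name, {group_name: headcount}) pairs
    Returns a list of (stage_name, share) pairs in the same order.
    """
    shares = []
    for stage, by_group in counts:
        total = sum(by_group.values())
        share = by_group.get(group, 0) / total if total else 0.0
        shares.append((stage, share))
    return shares


def largest_drop(shares):
    """Return (from_stage, to_stage, drop) for the biggest decrease in share
    between two consecutive stages, or None if there are fewer than two."""
    drops = [(a[0], b[0], a[1] - b[1]) for a, b in zip(shares, shares[1:])]
    return max(drops, key=lambda d: d[2]) if drops else None


# Placeholder counts, NOT real data: they only illustrate the computation.
counts = [
    ("readers", {"women": 40, "men": 60}),
    ("registered", {"women": 30, "men": 70}),
    ("active_editors", {"women": 10, "men": 90}),
]
shares = group_share_by_stage(counts, "women")
drop = largest_drop(shares)
```

With these placeholder counts the largest drop falls between registration and active editing, which in a real analysis would point to newly registered users as the population to target.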