Grants:IdeaLab/Examination of gender in biographies

From Meta, a Wikimedia project coordination wiki
status: not selected
Examination of gender in biographies
How challenges to Wikipedia contributions differ depending on the gender of the article to which the contributions are made
amount: 28,625
contact email: _(_AT_)thomaslevine.com
grantee: tlevine
idea creator: Bluerasberry
researcher: Tlevine
volunteer: A li cor
this project needs: volunteer, developer, designer, project manager, community organizer, advisor
created on: 16:30, 11 February 2015 (UTC)


Project idea

What is the problem you're trying to solve?

Sue Gardner published nine reasons why women do not edit Wikipedia. Several of the reasons involve women being conflict-averse. One of the reasons is specifically that women's edits are likely to be reverted. This project seeks to examine the circumstances under which reviewers on Wikipedia challenge contributions from female contributors.

What is your solution?

Contributing content, having it reviewed, and having it reverted is a complicated multi-party social process on Wikipedia, and there is only limited information about how it happens. This process happens continually, every day, in at least hundreds of thousands of articles in many languages. This project seeks to develop a process for examining these exchanges and to produce data that gives insight into them.

We originally wanted to test whether a contribution to a Wikimedia project is received differently depending on the gender of the contributor. Establishing the gender of an editor is quite difficult, because editors usually do not indicate their genders and because women sometimes indicate that they are male in order to avoid harassment. We instead assess whether the response has to do with the gender of the article, that is, the gender of the person whom the biography is about.

Our null hypothesis is that the gender of a biography is not a factor in how contributions to the biography are received. An interesting result of this research would be evidence that contributions receive different evaluations depending on the gender of the article that is being edited. In addition, if the genders of articles tend to match the genders of their contributors (and we have no idea whether that is true), the present research could provide evidence that contributors of different genders receive different evaluations of their contributions to Wikimedia projects.

Methods

Hypotheses

According to Sue Gardner, women don't edit because their edits are likely to get reverted. Implicit in this assertion is that men's edits are less likely to be reverted or that women are more affected by reversions to their edits. Earlier, we said that there is one hypothesis, but we now break that into two.

  1. An edit to a biography of a male is less likely to be reverted than an edit to a biography of any other gender.
  2. An editor is more likely to keep contributing after having their edit reverted if the edit was made to a biography about a male person than to a biography of a person of any other gender.

The latter of these takes a bit longer to explain and involves a bit more work, so we are primarily focused on the former.

Data collection

As a proxy for all editing of Wikipedia, this project considers biographies in any language of Wikipedia, and does the following with them:

  1. List all biographies on any language Wikipedia
  2. Use information in Wikidata to determine the gender of the person featured in the biography
  3. Examine the differences in treatment between those contributing to biographies of people of different genders. At minimum,
    1. Count the number of edits to all biographies, and calculate an average
    2. Count the number of reversions of contributions to all biographies, and calculate an average
    3. Count the number of talk page posts to all biographies, and calculate an average
  4. In the end, report trends in the counts of interactions which happen in Wikipedia biographies

All of the above information can be derived from the database download.
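The counting steps above can be sketched as follows. This is only an illustration under stated assumptions: the records are made up, the field names are hypothetical, and the gender labels are assumed to have been looked up beforehand from Wikidata's "sex or gender" property (P21). Extracting such records from the database download is not shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-biography counts, as they might look after processing a dump.
# The "gender" value is assumed to come from Wikidata's "sex or gender" property (P21).
biographies = [
    {"title": "A", "gender": "male", "edits": 40, "reverts": 4, "talk_posts": 2},
    {"title": "B", "gender": "female", "edits": 10, "reverts": 2, "talk_posts": 1},
    {"title": "C", "gender": "male", "edits": 20, "reverts": 0, "talk_posts": 0},
    {"title": "D", "gender": None, "edits": 5, "reverts": 1, "talk_posts": 0},
]


def average_counts(rows):
    """Group biographies by gender and average each count within the group."""
    groups = defaultdict(list)
    for row in rows:
        # Biographies without a gender statement are kept in their own group.
        groups[row["gender"] or "unknown"].append(row)
    return {
        gender: {
            key: mean(member[key] for member in members)
            for key in ("edits", "reverts", "talk_posts")
        }
        for gender, members in groups.items()
    }


averages = average_counts(biographies)
for gender, stats in sorted(averages.items()):
    print(gender, stats)
```

Keeping biographies with no recorded gender in a separate group, rather than dropping them, is a design choice of this sketch and not something the proposal specifies.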

To test our first hypothesis, we will compare the rate of reversions between biographies about men and biographies about people of any other gender. For each biography, we will count how many edits were made in total and how many of those edits were reversions. This gives us the percentage of edits that are reversions. We will compare this percentage between the two groups of biographies.
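One way the per-biography percentage might be computed is sketched below. The proposal does not say how reversions will be identified; this sketch assumes an "identity revert" heuristic, in which an edit counts as a reversion if it restores content identical to some earlier revision of the same page, as judged by a content checksum. The checksums and group labels are made up.

```python
from statistics import mean


def count_reverts(checksums):
    """Count revisions that restore the content of an earlier revision.

    `checksums` is a chronological list of content checksums for one page.
    A revision identical to its immediate predecessor (a null edit) is not
    counted as a reversion.
    """
    seen = set()
    previous = None
    reverts = 0
    for checksum in checksums:
        if checksum in seen and checksum != previous:
            reverts += 1
        seen.add(checksum)
        previous = checksum
    return reverts


def revert_percentage(checksums):
    """Percentage of a page's edits that are reversions (0.0 for an empty history)."""
    total = len(checksums)
    return 100.0 * count_reverts(checksums) / total if total else 0.0


# Hypothetical histories: each inner list is one biography's revision checksums.
histories = {
    "male": [["a", "b", "a", "c", "d", "c"], ["a", "b", "c", "d"]],
    "other": [["a", "b", "a", "b"]],
}
group_rates = {
    group: mean(revert_percentage(page) for page in pages)
    for group, pages in histories.items()
}
```

Whether the observed difference between the two groups is larger than chance would still need a statistical test, which this sketch leaves out.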

To test our second hypothesis, we will look a bit deeper into the data. We will look at each user who edited a biography and had the edit reverted, and check whether the user continued editing the biography after the edit was reverted. We are interested in whether the user continued editing the article because it indicates whether they have the confidence to continue editing even after their edit was reverted.
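A minimal sketch of this check for a single biography follows. It assumes that revisions are available as chronological (user, checksum) pairs and that a reversion is an edit restoring the content of an earlier revision, so that every edit in between counts as reverted. Both are assumptions of the sketch, and the user names are invented.

```python
def continued_after_revert(revisions):
    """Report whether each reverted user edited the page again afterwards.

    `revisions` is a chronological list of (user, checksum) pairs for one page.
    Returns {user: bool} for every user who had at least one edit reverted by
    somebody else; True means the user edited the page after the reversion.
    """
    last_index = {}  # checksum -> index of the latest revision with that content
    outcome = {}
    for i, (user, checksum) in enumerate(revisions):
        j = last_index.get(checksum)
        if j is not None and j < i - 1:
            # Revision i restores revision j, so revisions j+1 .. i-1 were reverted.
            later_users = {name for name, _ in revisions[i + 1:]}
            for reverted_user, _ in revisions[j + 1:i]:
                if reverted_user == user:
                    continue  # ignore self-reverts
                came_back = reverted_user in later_users
                outcome[reverted_user] = outcome.get(reverted_user, False) or came_back
        last_index[checksum] = i
    return outcome


# Hypothetical page history.
history = [
    ("ann", "a"), ("bob", "b"), ("ann", "a"),  # ann reverts bob
    ("bob", "c"), ("cat", "d"), ("ann", "c"),  # ann reverts cat
]
result = continued_after_revert(history)
share_continuing = sum(result.values()) / len(result)
```

Here bob returns after being reverted and cat does not, so half of the reverted users kept editing. Running this per biography and pooling the results by the gender of the article would give the figures needed for the comparison.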

(Add Tom's other notes and the comments from the talk page.)


Statistics that we should be able to compute

Another way of thinking about the research is to discuss the statistics that we will compute during the research. Here are the statistics that we seek to compute at minimum.

  • Median edit count per article for biographies of men
  • Median edit count per article for biographies of women
  • Median edit count per article for biographies of other genders
  • Probability that an edit will be reverted for biographies of men (for example, average rate of reversions per edit in an article)
  • Probability that an edit will be reverted for biographies of women
  • Probability that an edit will be reverted for biographies of other genders
  • Probability that an editor whose edit was reverted will edit the same article again, for biographies of men
  • Probability that an editor whose edit was reverted will edit the same article again, for biographies of women
  • Probability that an editor whose edit was reverted will edit the same article again, for biographies of other genders

As indicated above, this is the bare minimum, and we expect to produce far more. For example, we may produce means in addition to medians if we determine that averages are more informative. Also, the term "probability" leaves much open to interpretation; a simple version would be based on an average, but other approaches might be more useful.
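To illustrate why these choices matter, the sketch below computes a median and a mean edit count, and two simple readings of "probability that an edit will be reverted", on made-up numbers. The tuples are hypothetical (edits, reversions) counts per article, not real data.

```python
from statistics import mean, median

# Hypothetical (total edits, reversions) per biography in one gender group.
articles = [(10, 1), (200, 10), (4, 2)]

edit_counts = [edits for edits, _ in articles]
median_edits = median(edit_counts)  # robust to one heavily edited article
mean_edits = mean(edit_counts)      # pulled upward by the 200-edit article

# Reading 1: pool all edits in the group, so heavily edited articles dominate.
pooled_rate = sum(rev for _, rev in articles) / sum(edit_counts)

# Reading 2: average the per-article rates, so every article counts equally.
per_article_rate = mean(rev / edits for edits, rev in articles)

print(median_edits, round(mean_edits, 2))
print(round(pooled_rate, 3), round(per_article_rate, 3))
```

On these numbers the pooled rate is about 0.06 while the per-article average is about 0.22, so the report would need to state which reading it uses.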

Here are other sorts of statistics that we will consider.

  • Probability that an edit is vandalism, for biographies of men
  • Probability that an edit is vandalism, for biographies of women
  • Probability that an edit is vandalism, for biographies of other genders
  • Probability that an edit will be reverted, based on its size
  • Probability that an editor will make any more edits at all if her/his/zir edit is reverted on a biography of a man
  • Probability that an editor will make any more edits at all if her/his/zir edit is reverted on a biography of a woman
  • Probability that an editor will make any more edits at all if her/his/zir edit is reverted on a biography of a person of another gender
  • Count of biographies of men that an average editor has made
  • Count of biographies of women that an average editor has made
  • Count of biographies of other genders that an average editor has made

Example reports

An example report which may come from this research is as follows:

"Among biographical articles about people whom the Wikipedia article identified as male, articles which were developed with between 1 and 50 edits and which had existed for at least 1 year before the start of the research had their talk pages edited an average of 1.5 times, usually only to set up WikiProject templates. In the history of these articles, on average at least 1 contribution would be reverted in the life of the article until the date of study examination, with 20% of these articles never having had a reversion and articles with at least 1 reversion being more likely to have subsequent reversions than all of the articles considered as a whole. Among articles which had at least one reversion, 10% of those subsequently had a post to the talk page, and among articles which had more than 5 reversions, 50% of those subsequently had a post to the talk page.

For those biographical articles about males which had more than 50 edits in their history, their talk pages were edited an average of 2.0 times. These articles had an average of 2 reversions among all of their edits, with 5% of the articles never having had a reversion. Among articles with at least one reversion, 35% of those subsequently had a post to a talk page, and among those which had more than 5 reversions, 80% of those subsequently had a post to the talk page."

Again, the null hypothesis is that articles about women would show trends of this sort identical to those of articles about men. However, if there is more wiki-pressure on female-oriented topics, this research may report increased instances of conflict towards female-oriented contributions, as follows:

"Among biographical articles about people whom the Wikipedia article identified as female, articles which were developed with between 1 and 50 edits and which had existed for at least 1 year before the start of the research had their talk pages edited an average of (fewer than male may indicate tension) times, usually only to set up WikiProject templates. In the history of these articles, on average at least (more than male number may indicate tension) contribution would be reverted in the life of the article until the date of study examination, with (fewer than male percentage) of these articles never having had a reversion and articles with at least 1 reversion being more likely to have subsequent reversions than all of the articles considered as a whole. Among articles which had at least one reversion, (percentage could be higher or lower than male - different implications) of those subsequently had a post to the talk page, and among articles which had more than 5 reversions, (percentage could be higher or lower than male - different implications) of those subsequently had a post to the talk page.

For those biographical articles about females which had more than 50 edits in their history, their talk pages were edited an average of (number could be higher or lower than male - different implications) times. These articles had an average of (number could be higher or lower than male - different implications) reversions among all of their edits, with (more than male percentage may indicate tension) of the articles never having had a reversion. Among articles with at least one reversion, (percentage could be higher or lower than male - different implications) of those subsequently had a post to a talk page, and among those which had more than 5 reversions, 80% of those subsequently had a post to the talk page."

Having confirmation that there can be gender-based tension in the contribution and review cycle of Wikipedia, in articles which feature gender, will help further conversation and the development of interventions to accommodate female contributors.
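The figures in such a report could be produced by splitting articles into edit-count bands and summarizing each band. The sketch below uses the two bands from the example reports (1 to 50 edits, and more than 50 edits) on invented per-article summaries; the tuple layout is an assumption made for illustration.

```python
from statistics import mean

# Hypothetical per-article summaries: (total edits, talk page edits, reversions).
articles = [(12, 1, 0), (45, 2, 1), (30, 0, 2), (80, 3, 0), (300, 1, 6)]


def summarize(rows):
    """Summarize one band of articles in the style of the example report."""
    return {
        "articles": len(rows),
        "mean_talk_edits": mean(talk for _, talk, _ in rows),
        "mean_reversions": mean(rev for _, _, rev in rows),
        "share_never_reverted": sum(1 for _, _, rev in rows if rev == 0) / len(rows),
    }


small = summarize([a for a in articles if 1 <= a[0] <= 50])
large = summarize([a for a in articles if a[0] > 50])
```

Running the same summary separately for each gender group would yield the numbers that the two example reports compare.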

Scope and budgeting

This could be a bigger project, if I included all of the things I want to do and all of the things on the talk pages, or a smaller project, if I stripped the present proposal down to something very small and less interesting. I tried to come up with something that would match the scale of prior grants.

Relatedly, how much is one usually able to request from each grant type?

(And are there other particular places we should consider?)

Project goals

  1. Quantify differences in treatment of contributors to biographical articles which feature males versus those which feature females
  2. Qualify the extent to which these numbers create an environment in which contributing to female-oriented topics may be more or less challenging than contributing to male-oriented topics

Get involved

Participants

  • Researcher I would like to do the research. Tlevine (talk) 03:11, 21 February 2015 (UTC)
  • Volunteer I can help with the research, and especially in analyzing/visualizing data for the report and distribute it to media outlets for awareness A li cor (talk) 09:34, 5 March 2015 (UTC)

Endorsements

  • It sounds like a good start. No doubt when the researchers look at particular page histories they will come up with some thoughts about how to handle vandalism reversion as compared to other reverts. I'll add some feedback to point to a particular type of vandalism that I think is part of the problem. Sminthopsis84 (talk) 04:09, 5 March 2015 (UTC)
  • Because the low percentage of women writers on Wikipedia shows something that has to be taken seriously and examined. Nadezhda Bravo Cladera Nadezhda Bravo Cladera (talk) 08:04, 5 March 2015 (UTC)
  • Why not? Just make sure that the selections of M and F bios are broadly compatible as a test base. Retired electrician (talk) 11:05, 5 March 2015 (UTC)
  • This research will not allow us to measure the gender gap, but it is still useful for measuring the impact of the gender gap on our content. I have left some concerns on the talk page, but if done correctly this research will confirm (or reject) one of our main gender gap assumptions: that the gender gap impacts the content -- NickK (talk) 17:10, 9 March 2015 (UTC)
  • This data would be incredibly useful! Keilana|Parlez ici 00:33, 10 March 2015 (UTC)
  • Sounds useful, but also very similar to Grants:IdeaLab/WIGI: Wikipedia Gender Index. Jane023 (talk) 21:02, 31 March 2015 (UTC)
  • I'd suggest merging it with WIGI, with invitations to the team here to join the WIGI team (which I am a part of) in collaboration. --Piotrus (talk) 09:26, 22 April 2015 (UTC)

Project plan

Activities

Conduct the research described in the idea, and disseminate it through writing and speaking.

Budget

  • Researcher: 20,625 USD (55 USD per hour, 15 hours per week, 25 weeks)
  • Cloud computing resources: 4,000 USD (to filter the database dump and process data about 703,190 living people)
  • Travel expenses for two conferences: 4,000
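As a quick check of the arithmetic, the researcher line and the overall total can be recomputed from the figures given above. The variable names are only for illustration, and the travel line is assumed to be in the same currency as the other two.

```python
# Figures taken from the budget above.
hourly_rate = 55      # USD per hour
hours_per_week = 15
weeks = 25

researcher = hourly_rate * hours_per_week * weeks  # 20,625
cloud_computing = 4_000
travel = 4_000  # assumed to be USD, like the other lines

total = researcher + cloud_computing + travel  # 28,625, the amount requested
print(f"Researcher: {researcher:,}  Total: {total:,}")
```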

Community engagement

I'll blog about my results every two weeks. The first few posts will be about setting up data processing infrastructure, and the rest will be about actual results. (This is how I typically do research.)

Sustainability

I will package my data processing software to allow for reuse, which will facilitate continued research that uses the paradigm of comparing biographies. When presenting the research, I will discuss the methods, not just the findings, and encourage others to take advantage of the data processing infrastructure that I set up.

Measures of success

The idea includes specific figures that we should be able to compute, such as the chance that an edit on male biographies is reverted and the chance that an edit on female biographies is reverted. If I manage to produce these figures, I have succeeded; if not, I have failed.

Project team

Bluerasberry

Community notification

Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.