Grants:IEG/WIGI: Wikipedia Gender Index

From Meta, a Wikimedia project coordination wiki
status: selected
WIGI: Wikipedia Gender Index
summary: An ongoing statistical presentation of gender in articles by date of birth, place of birth, citizenship, ethnicity, occupation, and Wikipedia language.
target: Wikipedia
strategic priority: improving quality
theme: research
amount: 22,500 USD
contact: Maximilianklein (talk), max(_AT_)notconfusing.com
volunteer: A li cor, Acagastya, Komchi
project manager: Maximilianklein
community organizer: Masssly
researcher: Piotrus
designer: Frances Soong
developer: Maximilianklein, Hargup
created on: 07:01, 23 January 2015 (UTC)
2015 round 1


This grant concerns the project that was ultimately published at http://wigi.wmflabs.org/

Project idea

What is the problem you're trying to solve?

While statistics on the editor gender gap rely on self-reporting and are incomplete, the biography gender gap has rich statistics but is less studied. Furthermore, longitudinal studies (long-term and time-oriented) have been undertaken less often to show the rate and movement of the biography gender gap.

What is your solution?

Instead of observing the trend in editorship, we will observe the trend of gender in biography articles. We admit that editor gender and article gender may not be related, and that assuming they are is essentialist; still, we claim biographies are worth investigating for their own sake. We have already prototyped this research and found preliminary results that analyse the biography gender gap by date of birth, citizenship, and language. However, these data represent only a single point in time; it would be more useful to sample them many times and view the trends. Therefore we would automate the production and graphing of these statistics on a publicly viewable website with open-data downloads, and at the end of a year provide a final report on the observed trends.

Project goals

Figure: The female ratio of biographies by date of birth, and by citizenship aggregated by world culture. How will these trendlines, especially for recent years, evolve over the coming years as editing demographics change?

The ultimate goal of the project is to raise awareness of the gender gap using statistical and quantitative means. The purpose of doing so is to frame the gender gap in a way that makes its thesis accessible to a demographic that tends to be more convinced by quantitative methods. We hope that a larger group of people than at present will talk about the gender gap as a serious issue. We also hope that researchers will download and re-use this dataset, bringing even more insight into Wikipedia's gender gap. As a result, we hope more conversations will shift, as they have been doing, from whether the gap exists or is important to what should be done in light of it.

Another, more specific goal of our project is to see whether the different solutions to the gender gap that are currently being enacted are having an effect. Two large caveats come with this goal. The first is that we cannot run our data collection as a controlled experiment on any specific remedy to the gender gap problem. Rather, we can only see the aggregate effect of all gender-gap projects at once, and without any control data on what would happen if no efforts were being made. The second caveat is that it is essentialist to argue that having more women-identified editors active on Wikipedia would necessarily increase the representation of biographies about women. Still, despite these two caveats, we think that more data gives a better view of the current trends on different Wikipedias.

Get involved

Participants

  • Project manager WGII. I like this proposal. :) QEDK (talk) 03:57, 5 March 2015 (UTC)
  • Researcher I have been involved with this idea since the inception. Count me in on developing this further! Piotrus (talk) 06:13, 5 March 2015 (UTC)
  • Volunteer I can help researching..and then writing up and sharing finding in media outlets to raise awareness on whatever emerges! A li cor (talk) 09:31, 5 March 2015 (UTC)
  • Volunteer I can help in data collection, analysis and presentation of results.→‎Masssly 01:13, 6 March 2015 (UTC)
  • Developer I can help in collecting data and providing an analysis of the results. Hargup (talk) 16:41, 6 March 2015 (UTC)
  • Volunteer I am an editor in many blogs. I can help in researching, writing up, ads for inspiring or other graphic designing! Komchi (talk) 18:00, 6 March 2015 (UTC)
  • Designer I can help design or art direct. "Help make it pretty" tagline concerns me, but I like the empirical nature of this solution. Frances Soong (talk) 18:34, 9 March 2015 (UTC)
  • Developer I can help with website design and development, data analysis and visualization. Vivek Rai (talk) 20:02, 25 March 2015 (UTC)
  • Volunteer I wish to join by volunteering help, providing Ideas, and trying to reduce this gap. Acagastya (talk) 10:26, 30 April 2015 (UTC)

Endorsements

  • Count me in! --Piotrus (talk) 15:22, 9 February 2015 (UTC)
  • Great idea. Hmlarson (talk) 00:05, 7 March 2015 (UTC)
  • I hope essential gender equality in all aspect of society in global level. LilyKitty (talk) 12:38, 7 March 2015 (UTC)
  • This is a great way to clearly understand specifics about the problem. CarnivorousBunny (talk) 19:42, 7 March 2015 (UTC)
  • Cool idea! Ocaasi (talk) 00:54, 20 March 2015 (UTC)
  • Has my vote. --Magnus Manske (talk) 15:46, 23 March 2015 (UTC)
  • I like this. Great idea! /Haxpett (talk) 19:20, 23 March 2015 (UTC)
  • --LydiaPintscher (talk) 21:53, 23 March 2015 (UTC)
  • "What gets measured gets improved". I like this idea. Hargup (talk) 19:40, 25 March 2015 (UTC)
  • Measures something that matters. + 1 to Hargup. Jodi.a.schneider (talk) 19:57, 26 March 2015 (UTC)
  • Dfko (talk) 23:59, 27 March 2015 (UTC) Good questions to ask & strategy for approaching them.
  • I think this could be useful but I suggest changing the name to "Wikipedia Biography Gender Index Tools". Jane023 (talk) 21:01, 31 March 2015 (UTC) My reason is because only biographies are in this proposal, not editor gender (gendergap), or gender-specification (close to impossible) of Wikipedia topics (fashion objects: "Handbag" vs "Briefcase", professions: "Nursing" vs "Road construction", home furniture: "Vanity table" vs "Workbench").
  • Wicked cool! Jtmorgan (talk) 16:33, 22 April 2015 (UTC)
  • Endorse, but the project needs to be clear that it will mostly be the world's gender-gap that is being measured, rather than Wikipedia's. Johnbod (talk) 14:25, 30 June 2015 (UTC)

Project plan

Scope

Activities

Figure: Gender by date of birth and date of death in two different time frames. How will these proportions change?
Figure: An example of data re-use: a heat map of the "celebrity" ratio of biographies by gender, language, and decade.

Our project is a longitudinal study of measures of the biography gender gap. To accomplish this we will create a weekly updated dataset and webpage, akin to stats.wikimedia.org and datavis.wmflabs.org, with views and highlights on how the composition of articles that have a gender recorded in Wikidata is changing. We will break down the collected data by several other variables:

  • Date of Birth/Death
  • Citizenship/Place of Birth
  • Ethnicity
  • Occupation/Profession
  • Inclusion in Wikipedia Language
  • Creation date in each Wikipedia (experimental).
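
As a rough illustration of the breakdown described above, the following Python sketch computes the share of biographies recorded as female within each value of one variable. The record layout (dictionaries with `gender` and `citizenship` keys), the function name `female_ratio_by`, and the sample values are all hypothetical; this is not the project's actual collection script or data.

```python
from collections import defaultdict


def female_ratio_by(records, variable):
    """Return {value: share of records with gender == 'female'} for one variable.

    Records that lack either the variable or a gender are skipped, since the
    index only covers items that carry a gender statement.
    """
    totals = defaultdict(int)
    females = defaultdict(int)
    for record in records:
        value = record.get(variable)
        gender = record.get("gender")
        if value is None or gender is None:
            continue
        totals[value] += 1
        if gender == "female":
            females[value] += 1
    return {value: females[value] / totals[value] for value in totals}


# Made-up sample records, for illustration only.
sample = [
    {"gender": "female", "citizenship": "A"},
    {"gender": "male", "citizenship": "A"},
    {"gender": "male", "citizenship": "A"},
    {"gender": "female", "citizenship": "B"},
    {"gender": "male"},  # no citizenship recorded, so it is skipped
]
ratios = female_ratio_by(sample, "citizenship")
```

The same function could be called once per variable (for example occupation or language) on each weekly snapshot, assuming the snapshot is available in this shape.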

We will consult the community on the types of graph they think are most valuable, and suggest the following two as a starting point. One view will highlight the current state of the distribution of gender, along with the direction and magnitude of change in each of the above variables. A second view will show "hot" variables, those which have had the most movement in the past week.
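
One possible way to pick out the "hot" variables is to compare two consecutive weekly snapshots and rank the entries by the size of their change. The sketch below assumes each snapshot is a dictionary mapping a (variable, value) pair to a female ratio; that layout, the name `hottest_changes`, and the numbers are hypothetical.

```python
def hottest_changes(previous, current, top_n=3):
    """Rank keys by absolute change in female ratio between two snapshots.

    Only keys present in both snapshots are compared. Returns a list of
    (key, change) tuples, largest absolute change first.
    """
    changes = [
        (key, current[key] - previous[key])
        for key in current
        if key in previous
    ]
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:top_n]


# Made-up weekly snapshots, for illustration only.
last_week = {
    ("occupation", "X"): 0.20,
    ("occupation", "Y"): 0.40,
    ("language", "Z"): 0.15,
}
this_week = {
    ("occupation", "X"): 0.21,
    ("occupation", "Y"): 0.35,
    ("language", "Z"): 0.15,
}
hot = hottest_changes(last_week, this_week, top_n=2)
```

Ranking by absolute change keeps both rises and falls visible; the sign of each change then gives the direction.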

After one year we will write a report based on statistical tests of whether these measures have changed significantly.
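
The proposal does not name a specific test. As one illustrative possibility, a two-proportion z-test could compare the female share at the start and end of the year. The sketch below uses only the Python standard library; the function name and the counts are hypothetical. It also treats the two snapshots as independent samples, which is a simplification, because most biographies would appear in both.

```python
import math


def two_proportion_z_test(females_a, total_a, females_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). The standard error uses the pooled proportion.
    """
    p_a = females_a / total_a
    p_b = females_b / total_b
    pooled = (females_a + females_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts, for illustration only: 150 of 1,000 biographies recorded as
# female in the first snapshot, 180 of 1,000 in the second.
z, p = two_proportion_z_test(150, 1000, 180, 1000)
```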

Budget

Total amount requested

Budget breakdown

Phase | Date Range | Description | Estimated Hours | Team
A(utomate) | Within 1 Month | Modify Wikidata Collection Script | 80 | Developers
A(utomate) | Within 1 Month | Automate Collection Service on WMF Labs | 80 | Developers
A(utomate) | Within 1 Month | Get Community Input on Visualisations | 80 | Research and Community
B(e seen) | Within 2 Months | Make Weekly Visualisation Code | 80 | Developers
B(e seen) | Within 2 Months | Make Longitudinal Visualisation Code | 80 | Developers
B(e seen) | Within 2 Months | Design Visualisation Style | 40 | Designers
C(onnect) | Within 3 Months | Make Portal: Front End to Viz's | 80 | Developers
C(onnect) | Within 3 Months | Make Portal: Usage Counters | 40 | Developers
C(onnect) | Within 3 Months | Portal Design | 80 | Designers
C(onnect) | Within 3 Months After Portal Complete | Promote viewing | 20 | Research and Community
D(escribe) | 9 Months After Phase A | Write Statistical Paper | 240 | Research and Community

The budget is split into three buckets, one for each team. Each team will take responsibility for splitting the work hours among its members, with a 'bottomliner' for each position who assumes responsibility for the team's completion of its work.
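
To make the three buckets concrete, the estimated hours in the breakdown above can be totalled per team. The following Python sketch does only that arithmetic; the hours and team names are copied from the table, while the variable names are illustrative.

```python
from collections import defaultdict

# (phase, description, estimated hours, team), copied from the budget breakdown.
tasks = [
    ("A", "Modify Wikidata Collection Script", 80, "Developers"),
    ("A", "Automate Collection Service on WMF Labs", 80, "Developers"),
    ("A", "Get Community Input on Visualisations", 80, "Research and Community"),
    ("B", "Make Weekly Visualisation Code", 80, "Developers"),
    ("B", "Make Longitudinal Visualisation Code", 80, "Developers"),
    ("B", "Design Visualisation Style", 40, "Designers"),
    ("C", "Make Portal: Front End to Viz's", 80, "Developers"),
    ("C", "Make Portal: Usage Counters", 40, "Developers"),
    ("C", "Portal Design", 80, "Designers"),
    ("C", "Promote viewing", 20, "Research and Community"),
    ("D", "Write Statistical Paper", 240, "Research and Community"),
]

# Sum the estimated hours for each team, then across all teams.
hours_by_team = defaultdict(int)
for _phase, _description, hours, team in tasks:
    hours_by_team[team] += hours

total_hours = sum(hours_by_team.values())
```

This gives 440 hours for Developers, 340 for Research and Community, and 120 for Designers, or 900 hours in total. Dividing the 22,500 USD requested by 900 hours yields 25 USD per hour; that figure is derived arithmetic, not a rate stated in the proposal.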


Intended impact

Target audience

Our target audience is a mix of editors, academics, and journalists: anyone who is interested in the various attempts to quantify the gender gap.

  • In terms of editors, we seek to inform all Wikiprojects which focus on biographies, regardless of whether they are gender-focused Wikiprojects or not.
  • In terms of academics, there has been previous research on the gender gap in Wikipedia; see, for example, (Lam 2010, Reagle 2010, Eom 2014, Wagner 2015). Notifying that research community about newly available data could prompt more research on the problem.
  • In terms of journalists, much has been made of the Gamergate controversy and, perhaps unfairly, of the editorship of Wikipedia. This project can send a message to the media about Wikipedia's self-awareness of the gender gap, in the easy-to-understand statistical terms that are usually potent for journalists.

Community engagement

During several different stages of our project we will involve the community in different ways.

  • At the beginning we will invite our community stakeholders for input on the best ways to collect and format our dataset. This will serve as a guide for user-stories in software development.
  • While the data are being collected and shown, we will ask the community for advice on how to present the data most convincingly on the website, and ask them to re-use the data dumps.
  • At the end of the collection period we will ask for peer-review of our final report as we write in an "open science notebook" fashion.

Fit with strategy

As the saying goes, "what gets measured gets fixed." As we will be providing metrics on the biography gender gap, we believe they will be an impetus for expanding biographical coverage and improving quality. This will be accomplished by nudging current editors or by involving new editors.

Sustainability

Because the project and its statistics are designed to be automated, the website and dataset will continue to be generated after the grant ends. Of course, some small amount of maintenance is required to keep an "automated" process running, yet the website and machinery will be hosted on Tool Labs, and thus be open-source and more easily supported by the community.

Another way we expect the project could grow after the grant is by including more measurements. For instance, Grants:IdeaLab/Examination_of_gender_in_biographies is already a similarly derived measure that could easily fall under the umbrella of WIGI, and be generated and displayed automatically alongside it.

Measures of success

We are essentially producing statistics, so our main interest is in the viewership and usage of those statistics. Some baseline measures of success, at the end of the yearlong grant, are:

  • A website with self-updating graphs and data downloads.
  • 1,000 pageviews per month on our statistics website.
  • 100 data downloads per month of our dataset.
  • 5 data re-use cases of our dataset.
  • 1 report on the statistical significance of our data in a multi-scholar paper.

In fact, WIGI is in some ways a measure of success for other projects, so if other Inspire grants were to point to us as a way of measuring their work, that would be a very positive data re-use case.

Community Notification

Please paste a link below to where the relevant communities have been notified of this proposal, and to any other relevant community discussions.