Grants talk:IdeaLab/Research gender affinity for different subjects on Wikipedia

From Meta, a Wikimedia project coordination wiki

Privacy considerations?

"Take samples of Wikimedia editors who have disclosed their gender (eg. through language preferences) and analyse their contribution histories to see if there is a correlation between the editor's gender and the articles they edit (in terms of gender affinity for that article's subject matter). If we had a numeric "gender affinity" measure for each article this would be a simple correlation to run."

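The "simple correlation" in the quoted proposal could be sketched roughly as below. This is only an illustration under stated assumptions: the per-article `affinity` scores, the 0/1 gender coding, the sample editors, and the helper names are all hypothetical, and nothing in the proposal says how the affinity measure would actually be built. The sketch averages the affinity scores of the articles each disclosed-gender editor has touched, then computes a Pearson correlation between gender and that average.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_affinity(articles, affinity):
    """Average affinity over the articles that have a score, or None if none do."""
    scores = [affinity[a] for a in articles if a in affinity]
    return sum(scores) / len(scores) if scores else None


def gender_affinity_correlation(editors, affinity):
    """Correlate disclosed gender (coded 0/1) with mean affinity of edited articles."""
    genders, means = [], []
    for _name, gender, articles in editors:
        m = mean_affinity(articles, affinity)
        if m is not None:  # skip editors with no scored articles
            genders.append(gender)
            means.append(m)
    return pearson(genders, means)


# Hypothetical data: affinity in [-1, 1]; gender coded 1 = disclosed female, 0 = disclosed male.
affinity = {"Article A": 0.8, "Article B": -0.5, "Article C": 0.1}
editors = [
    ("editor1", 1, ["Article A", "Article C"]),
    ("editor2", 0, ["Article B"]),
    ("editor3", 1, ["Article A"]),
    ("editor4", 0, ["Article B", "Article C"]),
]

r = gender_affinity_correlation(editors, affinity)
print(f"correlation between gender and mean article affinity: {r:.2f}")
```

Only editors who have already disclosed their gender appear in `editors`, which matches the scope of the proposal; the sketch does not attempt to infer anyone's gender.
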
I have no doubt that you could do this, but do you really want "Wikipedia editors know you're a girl, even if you don't say so: what this means for free content online" on a news site? Or "Creepy Wikipedia nerds use algorithms to find girl editors to stalk" on Jezebel?

Thanks, this is a good catch and worth raising. You're right that there are privacy implications to this, if used incorrectly, and associated PR risks if people choose to focus on that aspect of it. It reminds me somewhat of tools like this which purport to guess gender based on writing style (usually word frequency).
Just to be clear, though: the research I suggested was to study people who have already willingly disclosed their gender and see what their editing patterns are like. While I haven't proposed anything that would guess the genders of people who haven't disclosed gender, I do recognise that the underlying data could be used to build a tool that could do that.
I guess some of the questions I'd throw back are: would research into gender affinity of topics give greater insight into undisclosed gender than, e.g., writing style, and thus be a really juicy target for misuse? If this gender affinity data (by article/subject) is based on public information which we make more easily accessible, what level of responsibility do we have for potential misuse of it by third parties in future? Where does this fit into the bigger picture of, e.g., ad networks, social media, and search companies collecting gender/demographic data from online behaviour? Has the Wikimedia research community previously dealt with data that might be used to disclose gender and, if so, how did they deal with it? --Skud (WMF) (talk) 11:31, 11 March 2015 (UTC)
Good point re: style analysis. Didn't consider that. I'd prefer to ask: how accurate does the algorithm have to be before it's a problem? Because if you said 60% or higher (given base rate of 50%; I know the trivial algorithm is already ~80% accurate), the fact that you can get 75% from somebody's tweets means the cat's out of the bag. I scrape your edit history and run it. Boom.
Also, good point bringing up the ad networks. Ironically, it raises the question of whether it wouldn't be better to just buy this data from the privacy violation industry directly. It'd probably allow you to determine article-gender affinity pretty well. I'm not proposing this, I just think it helps you see the issue in perspective. Dingsuntil (talk) 11:01, 12 March 2015 (UTC)

How do you then productively use this research?

Here are the two extremes of what you may discover:

  1. having "more women on Wikipedia [who act like the women who are currently on Wikipedia] would mean more coverage of certain areas", or
  2. you discover that more women on Wikipedia would likely mean the same coverage as now.

Given scenario 1: how do you then get to a state where there are "more women on Wikipedia"? Or given scenario 2, same question. Pengo (talk) 02:53, 12 March 2015 (UTC)

Good questions! I don't think this research will directly increase the number of women editing; instead, I think it will help provide measures for the cost to Wikipedia of not having enough women editors, and for the success of other efforts to increase women's participation. Let's say that your first extreme occurs: we discover that there is a high cost to not having women's participation, we make efforts to increase women's participation, and can then measure the impact of those efforts by the improvement in content. In the second scenario, we find that there's no strong gender difference wrt interest in subject areas, and that women don't tend to edit different subjects (on average) than men. (I have to say, btw, that I consider this less likely than scenario 1.) In this scenario, we've learned that increasing women's participation won't improve content in that particular way, so we can avoid working on/funding further projects that are based on that assumption. This saves us time/money and lets us be more effective in our other work. --Skud (WMF) (talk) 00:06, 16 March 2015 (UTC)

Another angle to answer the same questions

Another way of determining this (not sure if it would be easier or harder) would be to look at whether articles in certain subject areas tend to be disproportionately edited by users who self-identify as women in their user settings or in user categories. You could see on an individual article basis if articles that are edited more by women tend to be in worse condition, or do the analysis on a category-by-category basis. Calliopejen1 (talk) 18:59, 25 March 2015 (UTC)
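
The category-by-category version of this could look something like the sketch below. It is a rough illustration only: the edit log format, the `editor_gender` lookup, the `"female"`/`"male"` labels, and the function name are assumptions made for the example, not an existing tool or dataset. It counts, per category, what share of edits by disclosed-gender editors came from editors who self-identify as women, and leaves undisclosed editors out entirely.

```python
from collections import defaultdict


def share_of_edits_by_women(edits, editor_gender):
    """Per category, the fraction of edits (by editors with a disclosed gender)
    made by editors who self-identify as women.

    edits: iterable of (editor, category) pairs, one per edit (hypothetical format).
    editor_gender: dict mapping editor name to "female" or "male"; editors who
    have not disclosed a gender are simply absent from the dict.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [edits by women, all disclosed edits]
    for editor, category in edits:
        gender = editor_gender.get(editor)
        if gender is None:
            continue  # undisclosed: never guessed, just excluded
        counts[category][1] += 1
        if gender == "female":
            counts[category][0] += 1
    return {cat: women / total for cat, (women, total) in counts.items()}


# Hypothetical sample data.
editor_gender = {"a": "female", "b": "female", "c": "male"}
edits = [
    ("a", "Knitting"), ("b", "Knitting"),
    ("a", "Trains"), ("c", "Trains"), ("c", "Trains"),
    ("d", "Trains"),  # "d" has not disclosed a gender and is ignored
]

shares = share_of_edits_by_women(edits, editor_gender)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.2f}")
```

The resulting per-category shares could then be compared against whatever article-quality measure is available for those categories; that comparison is not shown here because the thread does not specify one.
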


A potential for collaboration? --Mssemantics (talk) 20:37, 25 March 2015 (UTC)