Talk:Mind the Gap
- 1 Problems
- 2 Future directions
- 3 Issues with identifying people's gender based on text style
- 4 Possible re-interpretation of conclusions
- 5 Wording differences on Wikipedia between male and female user pages
- 6 Female wikiHow users as a benchmark for female Wikipedians: Female Wikipedians use more male styles of writing than female wikiHowians
Problems
I know of at least 2 people who were provided the survey and are male in real life. I think if you want to have a survey that is adequately about women, you can't have any men involved even if they pretend to be female online. I'm sure that many women would think that it would be a problem if the results were skewed in any way because of people who are not actually women. It is one of the problems with allowing people to create their own identities and then trying to get statistics off of them. Ottava Rima (talk) 15:28, 7 November 2011 (UTC)
- It's possible for men to lie about their gender, just as it's possible for women to lie about their gender as they might want to do if they feel at risk of harassment. We cannot prevent people from lying, so we do the best we can and hope that the lies are rare enough and/or mutual enough that they are not significant for the outcome of the analysis. Pine (talk) 06:54, 21 November 2011 (UTC)
Future directions
Hi! This was great reading, and fits in nicely with thoughts I've had about why people will engage with Wikipedia editing. It works extremely well as a discussion paper, which I assume was the aim. In reading it, though, I had a few thoughts on how it could be further developed, so I'm tossing them here on the grounds that it seems like the right place. :)
My main concern for the first part is the reliance on the Women and Wikimedia Survey 2011. I think Ottava has a valid point, but more fundamentally, the 2011 survey primarily looked at those who self-identify as women. Thus the figure of 9% quoted is not in relation to all female editors, but only in relation to self-identified female editors. I don't know if this is the case, but it might be possible that more people willing to self-identify belong to a particular group, and thus self-identified female editors on Wikipedia are not a representative portion of female editors on the whole. (There's also an error in "lesbians account for 24% of the contributors", which contradicts the rest of the statistics). The other concern with the 2011 survey is that it also allowed participants to invite others to answer the survey. This will tend to skew the results, as people will tend to ask others whom they know, possibly through shared interests. Thus you might get an overrepresentation of one or more groups because of a tendency for people within those groups to invite others who are similar, rather than those from outside.
Because of that, I'd try and drop the reliance on the 2011 survey - it serves to point to what sorts of questions a full survey could do, and it does that really well, but the methodology needs to be refined before the data collected can be used as more than a discussion point.
I'm also a tad nervous about the "57% of Wikipedia's female contributors are single", as it seemed like an interesting figure, but according to ABS stats, 49% of adults in Australia are not in a relationship. In conjunction with the age range of the survey, which included people from the age of 12 and with a median age of 31, that figure seems unsurprising taken on its own.
More generally, and perhaps more fundamentally - the tool used here seems to be based on the assumption that authors have the opportunity to write in their natural style. The study was based on single-authored texts, not, as is the case of Wikipedia, collaboratively authored texts. So I'm curious about the extent to which writing style is influenced by the purpose of the writing (writing for an encyclopedia would have specific style requirements), and by other external factors such as the MoS. I'd love to see more about how that may influence the figures. Unfortunately, without knowing the extent to which that influences the results, the comparison with other wikis doesn't provide a lot of data. But it does point to some interesting future research.
Anyway, just some thoughts. As mentioned, it is in keeping with what I would expect to find, and I presume that this was intended to spark discussion, hence my response. :) There's certainly a lot worth discussing in the research. - Bilby 06:28, 8 November 2011 (UTC)
- That's a good point about the proportion of "singletons" in the general population -- it is growing in Western countries. Maybe it would be more telling to ask whether or not female editors are raising children. I think that would give a better indication of how much available free time they have to do anything online, and how they prioritize their time. Food for thought. OttawaAC 23:46, 14 November 2011 (UTC)
Issues with identifying people's gender based on text style
Am I the only one who has reservations about this idea? I understand that large-scale studies have identified some words that are more associated with one gender, but how reliable are these 'author gender tests' really? I'd like to see some evidence that they are actually generally correct before accepting the results of this study at face value. The fact that they apparently identified all of the articles on the GeekFeminism wiki (a wiki with a large, if not overwhelming, proportion of female contributors) as written by men raises doubts about their reliability. I'm inclined to think that these tools are not measuring gender at all, but rather style/tone of writing, and the words they identify as 'male' are associated with a more encyclopaedic tone. So it's no wonder Wikipedia scores highly on such measures. Robofish 23:58, 16 November 2011 (UTC)
- The very idea smells sexist, indeed. Measuring style/tone can still be meaningful and useful, though, as the page notes. Nemo 09:14, 17 November 2011 (UTC)
- What struck me is the revelation that other wikis with much greater female participation rates -- the Geek Feminism wiki in particular -- actually have comparable writing styles (under this metric) to Wikipedia. I'd like to see a larger sample size from each Wiki, of course, but with the results given here, it tells me that the writing style gender assignments are either fatally flawed, or inherent somehow in the nature of information writing. LtPowers 15:36, 20 November 2011 (UTC)
- The importance of other wikis, for me, was that these other wikis are just like Wikipedia: They require the main text be written using a male writing style BUT these wikis still attract a large female editor base. Ultimately, for me, when coupled with the user page information, they confirm several things about Wikipedia's female population: 1) Wikipedia's female editor base is not representative of female writers in general, 2) Wikipedia's existing female writer base may actually be harmful to the goal of reducing the gender gap because of how they participate and literature that suggests some of these women may drive off other female participation, 3) "The nature of Wikipedia using male information styles" cannot be an excuse because other wikis can write in male styles and attract large amounts of female participation. --LauraHale 19:27, 20 November 2011 (UTC)
Possible re-interpretation of conclusions
This article concludes that Wikipedia attracts women who write like men (also, can we perhaps use the terms women and men, rather than males and females? the latter two are used to oppress trans-gendered men and women and that is just the beginning of the problem). My interpretation of the conclusions would be more that the edits which are allowed to stand, and the women who are accepted as part of the Wikipedia community, are more likely to read as masculine. I suspect feminine writing is more likely to be challenged on NPOV, and that the sexism in online communities is likely to be more acutely felt by women who write using a feminine style, and as such I suspect Wikipedia is the cause of the masculine writing style, rather than its effect.
- We chose the terms male and female, instead of men and women, because there are people under the age of 18 who edit Wikipedia. We felt that the terms male and female were more accurate. Pine (talk) 06:58, 21 November 2011 (UTC)
I also see many ways to interpret the conclusions and would like much more detailed discussion of them. There is a logical circularity to saying that you can, on the one hand, distinguish male and female writing styles, but on the other hand, women on Wikipedia write like men. Starting from the resulting data, one might conclude more simply that there are not pronounced gender differences in Wikipedia writing, even if there are differences in formal BNC writing, which is certainly neither a logical nor empirical absurdity. (That does not mean I disagree with your interesting inference in a different direction; you may well be correct, but I think substantially more support is needed.) The fact that the sample from which you draw your base observations about gender difference (formal writing from the British National Corpus) is very different from the user pages and/or a subset of gender-related Wikipedia pages is also troubling. The fact that Wikipedia pages are collaboratively edited, as the user above suggests, whereas BNC articles typically have much more traditional editing styles, also suggests that you may be comparing apples and oranges. I don't see a strong effort to distinguish between edits and original text contributions, for example.
This is an important contribution to an ongoing discussion that is widely covered in the academic literature; I would like to see a much closer integration of your methodology with current techniques of corpus analysis, including submission to a peer-reviewed journal, where you will receive more feedback from corpus linguists who can guide you to additional resources, methods, authors, and experiments that bear on your question. As presented now, there are too many unchecked assumptions and inferences without methodological backup to draw any strong conclusions from your obviously interesting data and work. I appreciate Wikipedia's DIY ethic, but I am not clear how the persistent side-stepping of existing research strands, and reliance on just a few (if important ones), helps your audience to have confidence in your results. Have you, for example, run the paper by Shlomo Argamon, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni, whose methods and research you are relying on heavily? I think they would have a lot of advice and support for your efforts. 18.104.22.168 17:14, 20 January 2012 (UTC)
Wording differences on Wikipedia between male and female user pages
This isn't a fully processed thought, but rather a data sample based on what we've already done (and that wasn't included in the "paper" largely because it was getting unwieldy and there didn't appear to be much added value to putting this into the article beyond YUM! TASTY DATA! YUM!). Using the original list of self-identified male and female contributors from user boxes as in the original paper, there were 867 females represented in the female data set, and 568 males represented in the male data set. For each data set, we ran a program that counted the total number of occurrences of all words for all females and all males. Example: Female 1 uses the word pornography 5 times, baseball 2 times and love 15 times. Female 2 uses the word pornography 0 times, baseball 20 times and love 6 times. Total for all female scores: pornography 5, baseball 22, love 21 times. All words with 3 or fewer letters were removed from the list. All words with _ in them were removed. A number of words that implied HTML or wiki-specific coding were removed. These included words like table, alignleft, alignright. (Not all such words were caught, so some may remain. Importantly, every word that was excluded was excluded for both genders.)
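The counting procedure described above can be sketched roughly as follows. This is a minimal illustration and not the program that was actually run: the function name `count_words`, the tokenizing regex, and the `MARKUP_WORDS` stop list are assumptions, and the two sample pages are built from the worked example in the comment.

```python
import re
from collections import Counter

# Hypothetical stop list of HTML/wiki-markup words; the real list is not given.
MARKUP_WORDS = {"table", "alignleft", "alignright"}

def count_words(pages):
    """Sum word occurrences over all user pages in one data set."""
    totals = Counter()
    for text in pages:
        # Lowercase, then split into runs of letters, digits and underscores.
        for word in re.findall(r"\w+", text.lower()):
            if len(word) <= 3:        # drop words with 3 or fewer letters
                continue
            if "_" in word:           # drop words containing an underscore
                continue
            if word in MARKUP_WORDS:  # drop markup-specific words
                continue
            totals[word] += 1
    return totals

# The two example users from the text above.
female_1 = " ".join(["pornography"] * 5 + ["baseball"] * 2 + ["love"] * 15)
female_2 = " ".join(["baseball"] * 20 + ["love"] * 6)
female_totals = count_words([female_1, female_2])

# prints: 5 22 21
print(female_totals["pornography"], female_totals["baseball"], female_totals["love"])
```

Running the same function once over the female pages and once over the male pages applies identical exclusions to both data sets, which is the property the comment stresses.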
Women used 10,871 unique words. Men used 8,933 unique words. The table below lists the top 100 words used by each gender.
|Female word list||Female Use count||Female word rank||Male word list||Male use count||Male word rank|
Do the rankings on these lists differ enough to suggest different patterns by gender? 77 words appear on both top-100 lists. They are: that, wikipedia, have, with, page, about, from, like, image, userboxes, also, articles, white, wiki, user, more, info, here, navframe, just, will, time, other, some, small, when, article, been, family, school, people, know, there, music, things, nbsp, english, middle, your, interests, display, what, currently, pages, cellpadding, please, auto, editing, hello, most, navhead, talk, navcontent, university, information, which, work, good, since, they, would, only, than, contributions, history, life, list, make, years, serif, barnstar, ffffff, first, live, welcome, them.
Words that appeared in the female top 100 but not in the male top 100 include: love, much, because, well, really, read, favorite, normal, very, help, many, should, wikimedia, feel, something, born, face, find, sans, anything, writing, reading, student, interested, want, think. These words may still appear on both full lists. For example, born is the 90th most used word for women and the 103rd most used word for men. Help ranks 118th for men and 74th for women. (Helpful is the 1192nd most used word by women and the 1288th by men. Helping ranks 603rd for men and 707th for women.) Without doing a lot of math, this data appears to confirm the conclusion from the other data that male and female word usage on Wikipedia does not differ substantially. It appears highly unlikely that a word list built from a body of work by Wikipedians of known gender could be used to identify the gender of Wikipedia users from the text they write. --LauraHale 04:42, 21 November 2011 (UTC)
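A comparison of this kind, shared versus list-specific words plus a rank lookup, could be sketched as below. The helper names (`top_words`, `rank_of`, `compare_top`) and the toy counts are invented for illustration only; they are not the original data or program.

```python
from collections import Counter

def top_words(counts, n=100):
    """Return the n most frequent words, most frequent first."""
    return [word for word, _ in counts.most_common(n)]

def rank_of(counts, word):
    """1-based frequency rank of word, or None if it never occurs."""
    if word not in counts:
        return None
    ranked = [w for w, _ in counts.most_common()]
    return ranked.index(word) + 1

def compare_top(female_counts, male_counts, n=100):
    """Split the female top-n list into words shared with the male top-n and words unique to it."""
    female_top = top_words(female_counts, n)
    male_top = set(top_words(male_counts, n))
    shared = [w for w in female_top if w in male_top]
    female_only = [w for w in female_top if w not in male_top]
    return shared, female_only

# Toy counts, invented purely for illustration.
female_counts = Counter({"wikipedia": 9, "love": 7, "page": 5, "help": 2})
male_counts = Counter({"wikipedia": 8, "page": 6, "help": 4, "love": 1})

shared, female_only = compare_top(female_counts, male_counts, n=2)
print(shared, female_only)           # ['wikipedia'] ['love']
print(rank_of(male_counts, "love"))  # 4
```

Note that a word missing from one top-n list can still sit just outside it, as with "born" above, so rank differences are more informative than list membership alone. Words with equal counts are ordered arbitrarily here, which can move items across the cut-off.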
And because the data was sitting around… a comparison of the top one hundred words used, based on the users and their stated gender/program identified gender.
|Stated F identified F Count||Stated F identified M Word||Stated F identified M Count||Stated M identified F Word||Stated M identified F Count||Stated M identified M Word||Stated M identified M Count|
Here, there appear to be a few differences. --LauraHale 07:17, 21 November 2011 (UTC)
Female wikiHow users as a benchmark for female Wikipedians: Female Wikipedians use more male styles of writing than female wikiHowians
I used the user boxes on wikiHow to create a list of wikiHow users by gender. I then ran their userpages through the same analysis I did for Wikipedia users where we knew their gender based on user boxes. Basically, the same methodology described on the main page, only switching sites. There were 256 unique users who had gender boxes identifying them by gender. When scores were removed for people scoring the same such as 0-0 (and effectively being gender neutral or undetermined), there were 231 users left, with 25 having been removed. The results are here:
|Measure||Category||Count||Percent|
|wikiHow identified gender||Male||66||28.6%|
|Program identified gender||Male||131||56.7%|
|Correctness||Yes - Female||81||49.1%|
|Correctness||Yes - Male||47||71.2%|
|Correctness||No - Female||84||50.9%|
|Correctness||No - Male||19||28.8%|
We're dealing with a much smaller sample from a site that has 43% female participation but a smaller overall contributor base. It has a greater percentage of females identifying with userboxes than Wikipedia does. (That suggests to me that women feel even more comfortable expressing femaleness there.) Beyond that, the program correctly identifies women 11% more often than it does on Wikipedia, and 11% in this case seems significant to me. I think we can assume some bias toward a male writing style in both spaces, because both exist to convey factual information… but Wikipedia's population and wikiHow's should match up if they were attracting similar types of females… and they don't, because there is that 11% difference. The program also correctly identified male users at rates within 1% of each other (both roughly 71%) on wikiHow and Wikipedia. This says to me that the two wikis have distinct groups of female users, as their characteristics are not the same. Wikipedia's females use male language much more in their personal space than wikiHow's female contributors do.
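As a quick check on the arithmetic, the percentages in the table can be reproduced from the four "Correctness" counts alone. The snippet below is only a worked calculation; the variable names are illustrative, and the totals it derives (users per stated gender) are implied by the counts, not quoted from the comment.

```python
# Counts taken from the "Correctness" rows of the table above.
correct = {"female": 81, "male": 47}
incorrect = {"female": 84, "male": 19}

total = sum(correct.values()) + sum(incorrect.values())

for gender in ("female", "male"):
    stated = correct[gender] + incorrect[gender]  # users who stated this gender
    accuracy = correct[gender] / stated
    print(f"{gender}: {correct[gender]}/{stated} correct = {accuracy:.1%}")

# Users the program labelled male: correctly labelled males plus mislabelled females.
program_male = correct["male"] + incorrect["female"]
print(f"program identified male: {program_male}/{total} = {program_male / total:.1%}")
```

This prints 49.1% for stated females, 71.2% for stated males, and 131/231 = 56.7% program-identified males, matching the table, and the total of 231 matches the number of users left after ties were removed.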
Also worth noting, the female users appear to have their male/female word count points grouped more closely together than their male counterparts. Difference in STDEV for females between male and female scores is 77. For males, the difference is 188. Difference for women for mean between the male score and female score is 44. For men, it is 135. I'm guessing that, when plotted, the men would sit further from the gender-neutral line than the women would. Better idea of the users within the group…
|STDEV||Female score||Male score|
|MEAN||Female score||Male score|
|MEDIAN||Female score||Male score|
|MODE||Female score||Male score|
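For readers who want to reproduce summary figures of this kind, a minimal sketch using Python's standard `statistics` module follows. The score pairs are hypothetical, since the per-user scores are not reproduced on this page, and the use of the sample standard deviation is an assumption, because the comment does not say which variant was used.

```python
import statistics

# Hypothetical (female_score, male_score) pairs for a handful of user pages;
# the real per-user scores are not reproduced on this page.
scores = [(120, 150), (80, 80), (200, 340), (95, 60), (80, 150)]

female_scores = [f for f, _ in scores]
male_scores = [m for _, m in scores]

def summarize(values):
    """Return the four summary statistics named in the tables above."""
    return {
        "stdev": statistics.stdev(values),   # sample standard deviation (assumed)
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }

female_summary = summarize(female_scores)
male_summary = summarize(male_scores)

# Gaps of the kind quoted in the comment: the difference between the
# male-score statistic and the female-score statistic for one group of users.
stdev_gap = abs(male_summary["stdev"] - female_summary["stdev"])
mean_gap = abs(male_summary["mean"] - female_summary["mean"])
print(round(stdev_gap, 1), round(mean_gap, 1))
```

A smaller gap for one group means its male and female scores sit closer together, which is the sense in which the comment describes the female users as nearer the gender-neutral line.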
All this new data supports the methodology being valid. You may not like the study about gendered language and the word lists, but they developed a method of determining different patterns of language usage between genders. The wikiHow data really, really supports the validity of it. Wikipedian female users over-representing as male supports the supposition that female Wikipedians are much more likely to use male-coded language and that they are not representative of the wider population of females. There is no real valid place to critique the methodology as flawed, because the distinct groupings validate it, and the wikiHow data is just the icing on the cake. --LauraHale 11:10, 21 November 2011 (UTC)