User:Cormaggio/A small scale study of Wikipedia
This is an essay completed as part of a Masters of Education in Communication, Education and Technology at the University of Manchester, U.K. Please leave the content as it is, but I would really welcome comments on the discussion page or on one of my user talk pages - probably this one is best.
Cormac Lawler (User:Cormaggio on various projects)
Submitted: 24th January 2005
See also the questionnaires which this essay is based on in the Appendices
Part 1: Introduction
Wikipedia (http://www.wikipedia.org, http://en.wikipedia.org/wiki/Main_Page) is both an online collaborative encyclopedia and a community of people in the process of creating it. Launched on 15th January 2001, it has grown to contain more than a million articles in 160 languages, from Abkhaz to Zulu, though there are a great many more articles in production, and more language wikipedias waiting to be developed (MT:comp). Alexa.com, an internet search engine, places Wikipedia in the top 100 English-language websites in terms of visitor traffic, and it is also vying to be the second biggest “dot-org” (.org) site worldwide (Snow, 2005, WP:sign, Alexa.com). While some research has been done on Wikipedia (though little academically written, and even less so in English), I have chosen to undertake a small-scale inquiry into what has made Wikipedia so successful, one which will also give an introduction to both the website itself and the people who make it. My research question, therefore, is “According to its users, what are the main factors both contributing to and challenging Wikipedia’s success?”
But firstly, I feel I must explain something of the history of Wikipedia and a few of its main principles and modus operandi. Wikipedia was born from an earlier attempt to create a peer-reviewed encyclopedia, called Nupedia, which was to be open and free to all. Nupedia did have some success, but those involved (particularly Jimmy Wales and Larry Sanger, though there were others) felt that the process was yielding too little product. This prompted the idea of using wiki software to promote participation; “wiki”, from a Hawaiian word meaning “super quick”, is software which is fully editable, originally created and promoted by Ward Cunningham on his website in 1995, and it also refers to a website using this software (Reagle, 2004, WP:wp). At that time, after a year in existence, Nupedia had 23 completed articles; Wikipedia, on the other hand, began to grow very quickly, counting over 200 articles within a month, 800 by two months and over 3000 in six months (WP:stat). The potential of Wikipedia was apparent from the beginning; Larry Sanger, writing in July 2001, called it a “roaring success” and predicted that Wikipedia would have replaced Encyclopedia Britannica within 10 years (Sanger, 2001). At the then current rate of growth (though he miscalculated the actual number of genuine articles) he predicted that Wikipedia would have 84,000 articles in seven years; in actuality the English version alone has well over five times that (450,166, or over a million, including discussions and very short articles) as of 15th January 2005, its fourth anniversary (WP:stat). It is now run by a non-profit organisation, called Wikimedia, which also encompasses other ‘sister’ projects, like Wiktionary (a dictionary), Wikisource (a repository of source materials like census figures, speeches, essays etc.), Wikibooks (a collaborative book-writing venture), as well as a ‘Meta’ site, devoted to overarching issues within these and other different projects.
There have been, since the beginning, a number of basic principles under which Wikipedia functions. These are:
- that articles are written neutrally,
- that original research is banned,
- that discussion on articles is kept separate from the articles themselves,
- and that content is both ‘encyclopedic’ and free to be used under the GFDL (GNU Free Documentation License, see http://www.gnu.org/licenses/fdl.html).
(WP:wp, WP:nor, WP:npov)
As already stated, Wikipedia is fully editable (with very few exceptions) by anyone, which means it is susceptible to vandalism on a fairly regular basis (for an example, see WP:van). The idea was questioned by many, and even Jimmy Wales himself, founder of Nupedia, co-founder of Wikipedia, and chairman of Wikimedia, admitted in 2001 that the idea seems ridiculous but that it actually works (Wales, 2001). The way it works is by having a large number of people who keep track of recent changes, often through watchlists, which notify the user whenever a page they have marked has been edited. This way, vandalism is picked up remarkably quickly; research by IBM found that most vandalism on the site was erased within five minutes (Waldman, 2004, WP:wp).
I will now proceed to my own personal research. I have said that my question revolves around what the “users” of Wikipedia (often referred to as “wikipedians”) think, and to this end I have asked a number of users to answer some questions on their experience of Wikipedia, as well as referring myself to the website itself, which is, after all, the work and experience of its users. I am also aware that the concept of “success” that I have used is rather nebulous, so I will be looking at both the process, i.e. what keeps Wikipedia ticking, and the product, i.e. how it is, or will be, used, and what the quality of this use is. Finally, I feel this is the appropriate place to say that, due to the way modern language and the internet have developed, there is quite a bit of jargon involved, which I will try to explain throughout, but which can also be looked up in the glossary that I have provided.
Part 2: The study
Because of the large, diffuse nature of Wikipedia, the only realistic mode of communication is electronic, and I therefore decided to make and distribute a questionnaire by email. I wanted descriptive answers that gave me a feel for participants’ overall experience, so I chose to make my questions open-ended, and I deliberately kept these to a minimum since I didn’t want to put anyone off with a long questionnaire. I left questions fairly open so that participants could answer in as much detail as they felt appropriate. The data, therefore, is qualitative, and I have approached it from a Grounded Theory perspective, though I have looked at some quasi-statistical outcomes as well. Parallel to this I also did a great deal of exploring within the site itself, learning about policy, structure and guidelines, and reading discussions, both on discussion pages in Wikipedia and on its internal mailing list.
I admit to feeling slightly lost when I first thought of studying Wikipedia; I felt like the person that Miles and Huberman pick up at the airport and ask after half an hour what s/he thinks of the country (Miles, Huberman, 1994) – in a word, overwhelmed. One of the features of Wikipedia that I found most helpful in starting off was the ability to check who have been the most active users there and to check each user’s user page. My idea was to sample for diversity: by gender, age, nationality, and even range of interests, since the community is so widely spread in so many respects. I was also looking both for people who had been active since the beginning and for those who had just joined, as well as people who were no longer active or who had officially left. (This latter aim proved difficult to achieve due to out of date email addresses or possible lack of interest.) Most notable in this respect is co-founder Larry Sanger, who left the project in February 2002 but who still maintains an (outsider) interest in the ethos and direction of Wikipedia. I should say at this point that I sent both Sanger and Jimmy Wales a different questionnaire to the one I gave others, focussing more on why decisions of structure and policy were taken at the beginning, but that, despite both agreeing to participate (with some reservations), neither actually did, a significant loss to this study.
So all the people who responded are currently considerably active, and were selected from the English Wikipedia, though some may be active on others. My process of selecting potential participants was quasi-random; I cannot say absolutely why I chose one person as opposed to another, only that I was interested in diversity and so favoured people from differing backgrounds, as above. When I had selected a number of users, I sent them an initial email to ask if they would participate and then normally a standard email with questionnaire attached. The response rate was encouragingly high: 10 returns from 15 questionnaires (including both Wales and Sanger, and one other who said he might participate depending on time) sent from an initial 24 emails. One more found my questionnaire through another participant, making 11 responses in all.
Of these responses, there is a wide geographical spread - from Australia, Belgium, the Netherlands, Poland, South Africa, UK, and USA, though some may be living in other countries. There is a range of interests as can be seen from each participant’s user page, as well as languages spoken and fields of work/study. There were more males than females involved in a ratio of 8:3, though I am assessing this by name and cannot be fully sure. In fact, this was a basic fault with the design of the questionnaire in that I neglected to include any questions about gender, nationality and field-of-work; all my knowledge here coming from participants’ user pages, apart from the participant who contacted me through my research page in Wikipedia, who volunteered this information according to a suggestion I made there.
Threats to validity:
The most obvious threat to the validity of the study is the sample size; eleven participants out of a community of tens of thousands cannot be seen to be fully representative. I did, however, specifically look for people that were (or looked from their user pages) different from one another. I looked at their edit histories, and read their biographies (if given) and tried to choose people who were widely active, and who I could see were contributing to both discussions and articles. Initially, I thought about asking those that agreed to get involved to ask others to participate, but rejected this, as it would undoubtedly influence both who got involved (the diversity of the sample) and the way they got involved (the quality of the data).
I am aware, however, that although participants might be different in terms of gender, age, field-of-work, country of origin, and areas of interest, they are all active in this community, Wikipedia. Because Wikipedia has clear principles and ways of working, participants might be accused of “singing from the same hymnsheet” – repeating what’s already agreed upon. While this is the case to some degree, what I was interested in was the way participants describe their particular experience and the way this compares and contrasts with that of others. I also think that, given the differing, even sparkling, nature of the responses overall, they very much reflect experience rather than dogma.
I was confronted early on by an issue of ethics when it came to the question of how this study was to be used. While it is firstly a study for my M.Ed., I also envisaged it as being potentially of interest to Wikipedia itself and so had the idea of publishing it somewhere within the Wikimedia family. Aside from the issue of copyright, this did throw up the issue of whether what I was doing could jeopardise either my own study or the work or standing of the people who agreed to participate. Indeed, the first question I received from potential participants was “who will be reading this?” Since potentially the whole of Wikipedia would be interested, I decided it best to initially assure anonymity to all taking part, unless they specifically gave me permission to be named.
Significantly, though, it is interesting that one participant used this opportunity to upload his answers alongside some pointers he thought useful, to Wikimedia itself, going as far as to provide a downloadable pdf version of my questionnaire, and contacting the Dutch mailing list to generate further interest. I welcomed this development, though it immediately brought to my mind the possible violation of a core Wikipedia principle, that of ‘no original research’. It also forged a fairly close relationship between myself and the participant, something that some studies will try to avoid. I feel, however, that for this study, it is an inherent part of it to be involved in Wikipedia, and hence its community. I have a user page, where people can see my basic interests in this study, and also now have a number of related pages to this research itself, including to other pieces of research both in the past and ongoing. Indeed, one of the participants in this study speaks of his own research into Wikipedia from an economic point of view (Appendix 1J).
So I think my reflexivity throughout this study must be recognised. At first, I didn’t want to ingratiate myself too much with potential participants for fear of influencing their responses, though even my use of the word ‘wikipedians’ at the top of the questionnaire could be seen as taking an insider tack. But particularly after seeing my questionnaire on Wikipedia itself, I began to see myself both as an insider and an outsider in the ethnographic tradition. I know that everything I am reporting is being seen through my eyes and it is I who am representing the data and analysis. Overall though, my attitude to the community as a whole was tentative at first; I worked with the people whom I selected and who responded, and have tried to be as sensitive as possible to the workings and language of the community as a whole, thereby acknowledging Flinders’ ‘ecological’ view of ethics (cited in Miles, Huberman, 1994).
Part 3: Results
So, to the participants themselves. Overall, for just eleven questionnaires, there is quite a variety of ideas, but there are also quite clear general patterns. I will be looking at these patterns but will also look at the particular language used as an indication of how the participants see it from their perspective, and I have tried to keep it as in vivo as possible. Appendix 2 shows my coding system for the questionnaires overall and how I saw these patterns emerging from them. I will also extrapolate from these patterns, to both simplify and complicate the data (Coffey, Atkinson, 1996) - see appendices.
As would be expected, an immediately obvious theme is the openness of Wikipedia. This is the basic structure of any wiki site, and the starting point for looking at any of Wikipedia’s features or policies. The references to “anyone”, “everyone” and “anything” are numerous. It is, after all, how it all began: anyone can edit anything (except a few pages which are locked, or unless they have themselves been banned), and Wikipedia stands as another example of the growing open-source movement, which creates software that is open and free for all. Therefore, it is the core category for my coding (Strauss, 1987).
Many participants acknowledge being perplexed at this feature of any webpage, let alone encyclopedia. As one participant said: “This seems counter-intuitive, and when I first saw the site I thought that it would never work” (Appendix 1A; questionnaires referred to simply by (letter) from here on). Another admits being “skeptical at first” but warming to it after comparing Wikipedia with other “reliable” sites (B). Another’s first impression of the project was “the audacity of creating an encyclopedia with it (which) appealed” (D). In fact, Jimmy Wales’ comment, to which I have already referred, concerned not just Wikipedia but the open source movement in general, and it is worth reprinting as it gives a flavour of the beginnings of the project:
- “You wrote: "Sorry but the idea of a free encyclopaedia sounds ridiculous." Boy, it sure does! But I think that's part of what makes the whole thing so exciting. If you ask me, the idea of a free operating system sounds ridiculous. I mean it, too. Thousands of people working together from around the globe, with constant bickering and factionalism, creating a coherent useful stable operating system? Ridiculous!
- Except... it works! It exists! I think that's one of the most remarkable facts of human history. I see no reason why the same energy will not be inevitably harnessed for the creation of content.” (Wales, 2001)
That energy is still evident from the questionnaires and the website, and many share Wales’ enthusiasm. Many talk about the “fun” of it, particularly when they first arrived and say they got involved in Wikipedia through simply “playing around” (B). In fact, many like it so much that they find it hard to leave, and a major theme emerging from these questionnaires is that of addiction, again from the beginning. One says that, after writing an article (on exploding whales) “just to see what would happen”, he “realised how excellent totally unrestricted collaborative editing could be (and) was hooked” (A). Another paints a broadly similar picture:
- “I think the surprise of being able to edit had me hooked from the start. I’d expected to have my edit reverted but it wasn’t, and very soon I was welcomed and became aware of the community surrounding the site” (E)
But this addiction is not confined to just the beginnings of people’s experience on Wikipedia; there is a word coined to describe someone who has formed an addiction: a ‘wikipediholic’ (you can even take a test to see how far down the line you have gone). And though it is most often used tongue-in-cheek, I have seen at least one person offering genuine assistance to others in need of a place to talk about it.
This, I think, just underlines the sense of community that exists within Wikipedia. Being welcomed is mentioned twice (E, H), as is helping others (C, E) (and I myself have been helped immensely by one participant in particular) and another talks of his “love (of) getting messages from (other) users” (A) to share perspectives. Message exchanges can get heated, though I have seen, during or after a conflict of opinion, instances of genuine compassion, apology and gratitude. And, even though it wasn’t mentioned in the questionnaires, apart from an etiquette of respect and civility which is encouraged, there is also a method of showing your appreciation for someone else’s work within Wikipedia – by awarding a ‘Barnstar’, coming from the concept of ‘barnraising’; a term referring to building a community structure with the help of all its members.
Of course, the collaborative nature of Wikipedia lives alongside and through this sense of community, or is more likely what creates it in the first place. The “shared goal” (of writing an encyclopedia) (E) is what binds it/them together or, as another puts it, the “joint end product” (H). Part of this collaborative process, then, is the running objective of achieving consensus in writing articles from a neutral point of view. Teams can be formed, and discussions take place throughout in order to meet this “most controversial” or “challenging” principle (A). Consensus comes up frequently (A, B, F, G, I) but, interestingly, it is not always positive. One says that when consensus has been reached by a particular group of people “it is impossible for an outsider to contribute” (B), though he mentions later on that discussion and consensus are important to keep the articles to a good level of content quality. The importance of the quality of each article contributed to is obvious, and instances of particularly successful neutral collaborations are cited (A, F) as articles to be proud of; indeed, a great many users (not just those involved in this study) will cite their own personal favourite examples of what they have contributed to.
On this level of individual satisfaction, there are a number of varying themes. Firstly, as one participant says, it gives him an outlet for his writing (F); another likes to feel important, and consequently is more involved in a sister project, Wikibooks, where there are fewer people (I); and another likes to be able to educate other people about what he knows and his country’s history (J). I have definitely seen, though not just in these questionnaires, a certain sense of being able to ‘show off what you know’, as there is in many aspects of academia, it must be said, possibly even this essay! But more fundamentally, Wikipedia is seen as a chance to “give people a voice” (A), or to allow anybody, no matter what their academic background, a chance to “put in their bit of information about the world” (B), from “obscure interests, eg. local railway history” (K) to teachers adding content relating to their coursework (C).
But just as interesting to see here are the societal implications of Wikipedia. A commonly used word here is “anarchy” (D, H, K), though opinions are mixed; “(I like) the strange fact that anarchy can sometimes create beautiful results” (H) compared to “It’s also a reasonably friendly community… despite the anarchic nature of its setup” (K). Another describes Wikipedia as having a “less pyramidal power structure” than other forum applications (G), but he also goes further by saying that it is allowing for a “new way of making society” (G, his italics). Another makes a similar point, saying that Wikipedia has the potential “to become the core of an intellectual community in a manner independent of formal academia” (F). And another has found it a learning process of how to “communicate across cultural boundaries (and) how to retain respect within a online community” (E).
There are a number of debates within and between the questionnaires about whether it is a good thing to let people edit anonymously. Some think it is essential (A, C, I); others see it as having both pros and cons (F). There is the suggestion that, for those who log in but cause trouble, there should be a better way of really knowing who’s who – to make people “prove they are who they say they are” (F) in order to avoid one person having multiple accounts (A) and so undermining the process of voting, even though there is already a way to “track who contributes what” (B) within the software itself, i.e. the ‘history’ function. It is features like these that facilitate the process and safeguard the product – to “preserve work that vandals have attempted to destroy” (B), and most express dismay or even resignation at the existence and persistence of vandals and the inability to get rid of them more quickly. However, Jimmy Wales has said in this context (Waldman, 2004), and I think that there is implicit agreement here, that it is not anonymous users who create most problems, but those who try to force themselves on others, or “POV (point-of-view) pushers” (A).
So, it is precisely the openness of Wikipedia which can be and is taken advantage of, and it is this that needs to be ‘kept in check’, by the shared goal, principles like neutrality and ‘no original research’, and features like history, watchlists, and recent changes to be able to monitor what’s going on, as well as having some system of organised administration – system operators (‘sysops’) or ‘admins’ - which many of this study’s participants are. Indeed the participants’ enthusiasm in describing Wikipedia’s benefits is almost matched by their exasperation with vandalism – I say almost because it is clear that they spend a great deal of their time monitoring this and seem to do it out of a sense of altruism. And it is interesting to see their comparisons to other collaborative sites/projects that don’t have such clear safeguards or processes; among the long list of put-downs (see appendix 2, ‘Other sites’) are “troll infested”, “hard to navigate”, “non-serious”, “working in a vacuum” (blogs), and “no long term effects on anything”, though one does give some credit by saying that at least one of these sites (Slashdot) is aiming for something different and is “very suitable for what it aims at” (G).
But the main way that Wikipedia is compared negatively, both here and recently in the media, is around the issue of academic reliability. A former editor of Britannica wrote an article on this very point, questioning whether Wikipedia’s collaborative process really led to long-term quality (McHenry, 2004) and co-founder Larry Sanger also chimed in with a criticism of Wikipedia’s “anti-elitism” (Sanger, 2004). Whether or not the former is true - and many question this explicitly, one citing a blind test in Germany where Wikipedia came out on top (K) - it is clear that addressing these criticisms is one of the main challenges that face Wikipedia, both now and in the future. And it is one that is being addressed constantly, with researching and improving source citations ongoing (F), and in some cases breaking new ground in terms of translating or creating articles that weren’t available in English (and other languages) before (F, J). But fundamentally, it is clear that the whole project is still very much in process; many participants alluding to the fact that the project is just beginning and to “ask again in 20 years” (D).
Wikipedia is both a product and a process, with great success in each endeavour. What I have tried to shed light on here is the way that this success is achieved and sustained. I have hopefully shown that within the open, seemingly anarchic structure of Wikipedia is to be found a well-structured process of checks and balances. Overall, I get the impression that because of its quite radical way of working, it generates a lot of enthusiasm, but that the participants here respect and actively defend its working process. There are criticisms and suggestions, and I will be looking further at the way in which and to what extent Wikipedia is a learning community during the course of my dissertation. But for now, though the validity of this research cannot be seen to be concrete, I feel safe in concluding that not only is Wikipedia a success in numerical terms, it is also a substantial and active community, which, outside its open structure, is Wikipedia’s greatest asset.
References
Aigrain, P. (2003) The individual and the collective in open information communities adapted from a speech to the 16th BLED Electronic Commerce Conference, 9-11 June 2003 retrieved from: http://www.debatpublic.net/Members/paigrain/texts/icoic.html
Alexa.com Top 100 English websites on 15/01/2005 retrieved from: http://www.alexa.com/site/ds/top_sites?ts_mode=lang&lang=en
Coffey, A., Atkinson, P. (1996) Making sense of qualitative data London: Sage
Flinders, D.J. (1992) In search of ethical guidance: Constructing a basis for dialogue in Miles, M.B., Huberman, A.M. (1994, second ed.) Qualitative Data Analysis London: Sage, originally published in Qualitative studies in education, 5(2), 101-116
Hammersley, B (2003) Common knowledge originally published in the Guardian, 30/01/2003 retrieved from: http://www.guardian.co.uk/online/story/0,3605,884666,00.html
Kvale, S. (1989) Issues of validity in qualitative research Lund, Sweden: Studentlitteratur, in Miles, M.B., Huberman, A.M. (1994, second ed.) Qualitative Data Analysis London: Sage
McHenry, R. (2004) The faith-based encyclopedia originally published on Tech Central Station, retrieved from: http://www.techcentralstation.com/111504A.html
Miles, M.B., Huberman, A.M. (1994, second ed.) Qualitative Data Analysis London: Sage
Reagle, J. (2004) A case of mutual aid: Wikipedia, politeness and perspective taking retrieved from: http://reagle.org/joseph/2004/agree/wikip-agree.html
Sanger, L. (2001) Britannica or Nupedia? The future of free encyclopedias originally published on kuro5hin.org 25/07/2001 retrieved from: http://www.kuro5hin.org/story/2001/7/25/103136/121
Sanger, L. (2004) Why Wikipedia must jettison its anti-elitism originally published on kuro5hin.org 31/12/2004 retrieved from: http://www.kuro5hin.org/story/2004/12/30/142458/25
Silverman, D. (1985) Qualitative methodology and sociology Aldershot: Gower
Snow, M. (2005) Wikipedia moves into top 100 websites originally published in the Wikipedia Signpost 10/01/2005 retrieved from: http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/100
Strauss, A. (1987) Qualitative analysis for social scientists Cambridge: Cambridge University Press
Waldman, S. (2004) Who knows? originally published in The Guardian, 26/10/2004, retrieved from: http://www.guardian.co.uk/online/news/0,12597,1335892,00.html
Wales, J. (2001) Comment on (Sanger, 2001) on kuro5hin.org retrieved from: http://www.kuro5hin.org/comments/2001/7/25/103136/121?pid=22#29
From the Wikimedia network:
MT:clq Meta:Cormac Lawler questionnaire retrieved from: http://meta.wikimedia.org/wiki/Research/Cormac_Lawler_questionnaire
MT:comp Meta:Complete list of language Wikipedias available retrieved from: http://meta.wikimedia.org/wiki/Complete_list_of_language_Wikipedias_available
WP:corm Wikipedia:Cormac Lawler’s user page retrieved from http://en.wikipedia.org/wiki/User:Cormaggio
WP:nor Wikipedia:No original research retrieved from: http://en.wikipedia.org/wiki/No_original_research
WP:npov Wikipedia:Neutral point of view retrieved from: http://en.wikipedia.org/wiki/Neutral_point_of_view
WP:npobj Wikipedia:Neutral point of view – “There’s no such thing as objectivity” retrieved from: http://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view#There.27s_no_such_thing_as_objectivity
WP:van Wikipedia:Wikipedia (vandalised) retrieved from: http://en.wikipedia.org/w/index.php?title=Wikipedia&oldid=9296258 (Note: offensive)
WP:wp Wikipedia:Wikipedia retrieved from: http://en.wikipedia.org/wiki/Wikipedia (Archive version on January 24th 2005 at http://en.wikipedia.org/w/index.php?title=Wikipedia&oldid=9619636)
See appendices for questionnaires, data coding and glossary of terms.