Wikimania05/Paper-AL1

This page is part of the Proceedings of Wikimania 2005, Frankfurt, Germany.


Global Wikipedia. Communities of Languages & Culture

  • Author(s): Amruta Lonkar, Marcela Musgrove Chavez, Selim Sermet Akbay
  • License: TBD
  • Note: paper only

About the author: Amruta Lonkar is studying Human Computer Interaction at Georgia Tech and is currently working on a very cool project that involves the use of the Wikimedia software. She is a firm believer in Lawrence Lessig's school of thought. Marcela Musgrove Chavez is a Master of Science student in HCI at Georgia Tech. Selim Sermet Akbay, TK.



Abstract

In this paper, we look at Wikipedia as an online community and compare the English-language Wikipedia to the Turkish, Spanish, and Hindi versions of Wikipedia.

This paper was accepted for Wikimania 2005 but not formally published until 2023.

Paper

Introduction

In the modern context, the word “community” alludes to much more than the idealized, close-knit members of a 1950s small town. Locality, common governance, and shared identity become less important, while interaction, shared experience, and common interests come to the fore and broaden the definition of community. This is especially true of online communities, where the communication medium is the computer. A typical Internet user belongs to many communities at once and, unlike in real life, distance is the least of the obstacles to getting involved. As Barry Wellman pointed out, “Enthusiasts hail the net’s potential for making connections without regard to race, creed, gender, or geography” [1]. Furthermore, online communities give people the opportunity to define alternative identities that are not restricted by the physical and social molds of real life [2]. Yet most online personas are closely tied to real life, owing to the increasing, even invasive, use of computer media as a significant form of everyday communication.

In this paper, we examine the practices of a specific global online community: Wikipedia. This study is part of a larger class-wide project that studies the general characteristics shared by a broad range of online communities, with the aim of a deeper understanding of online communities in general. Our special focus is on the global aspects of Wikipedia's different language communities. We start by defining Wikipedia and explaining why we are interested in researching it. Section 2 summarizes the history of Wikipedia and how it grew as a community. In Section 3, we describe the methodology we used to study the site and recruit interviewees, along with information about the interviewees. Section 4 presents background and demographics for the languages we studied. In Section 5, we study Wikipedia in terms of the nine principles of community design. Section 6 discusses cultural nuances in Wikipedia through quotes and case studies, and Section 7 considers possible changes and future directions for Wikipedia.

1. Overview of Wikipedia

The most obvious source for a definition of Wikipedia is Wikipedia itself:

“Wikipedia is a Web-based, multi-lingual, ‘copyleft’ encyclopedia designed to be read and changed by anyone. It is collaboratively edited and maintained by thousands of users via the wiki software, an opensource program first started by Ward Cunningham, and it is hosted and supported by the non-profit Wikimedia Foundation. In addition to typical encyclopedia entries, Wikipedia includes information more often associated with almanacs, gazetteers, and specialist magazines, as well as coverage of current events.” [3]

There are three essential characteristics of the Wikipedia project: it is an encyclopedia; it can be edited by anyone, even anonymous users; and it is free content under the GNU copyleft [4]. Our interviewees differed on the relative importance of these characteristics (for a detailed discussion of the interviewees, see Section 3). The main site policy can be summarized by the “neutral point of view” (NPOV) principle, which commits articles to being as unbiased as possible by presenting all the different views on an issue rather than writing from a single point of view. Yildirim defines Wikipedia as follows:

“[In Turkish] Ozgur yazilim gozuyle bakiyorum. Beni ilgilendiren birinci tarafi bu, ikinci tarafi bilginin paylasilmasi. Bilginin paylasilmasinda sorunlar olabilir, bilginin denetlenmesinde ozellikle sorunlar olabilir. Ancak yine de bir baslangic yapmak acisindan, insanlarin ortak akil dedigimiz akli kullanarak bir urun cikarmasi acisindan onem veriyorum ve resmi bilgi diye ifade ettigimiz yapinin biraz disina cikarak belki daha farkli birikimler veya bilgilerin bu tur ansiklopedilerde yer almasi gerektigini dusunuyorum.

[In English] I see it from an open-source[/software] point of view. It is this property that grabs me first. Second, it is the way information is shared. There may be problems with sharing information, and especially with validating its correctness. However, I find it important as a good attempt at building a product using the collective mind of the human race. I also appreciate the diversity of experience and knowledge represented {in articles}, such that they can present more than the official information {consensus}.”

His favorite characteristic is that Wikipedia is a free-content product based on an open-source architecture. To him, Wikipedia is more than an encyclopedia: because it includes more than the experts’ view, it reflects many minds and the different ways they work, while still producing a collective cognitive product. Murat emphasizes a different characteristic of Wikipedia:

“[In Turkish] Butun insanlarin katkida bulunabildikleri bir ansiklopedi. Cok fazla bilgiye erisebilecegin, hatta istersen senin de bilginden icine katabilecegin, surekli buyuyen bir ansiklopedi. Ve turlu turlu dillerde karsiligi olan yani turlu turlu dillerde yazilmis, kendi dilinde de katkida bulunabilecegin veya baska dilden aldigin birseyi cevirebilecegin imkanlarin da oldugu guzel bir bilgi paylasim ortami.”

[In English] An encyclopedia to which anyone can contribute. A continuously growing encyclopedia, which lets you access so much information and even lets you add your own knowledge. And it is available in many different languages, so you can contribute in your mother tongue, too. One can even use the tools provided to translate from one language to another. It is a nice information sharing environment.”

To Murat, the multilingual structure of Wikipedia is a very important feature to enable information sharing. Although he is fluent in both Turkish and English, and published his PhD thesis in English, he strongly feels the need for technical publications/translations in his mother tongue.

It is mainly these two characteristics emphasized by Yildirim and Murat that got us interested in Wikipedia as a research subject:

First, it is a constructionist [5] environment: it starts empty, with only a set of construction tools, and almost all of its content is generated directly by its users, totaling an astonishing 1.3 million articles as of December 2004.

“We understand “constructionism” as including, but going beyond, what Piaget would call “constructivism.” The word with the v expresses the theory that knowledge is built by the learner, not supplied by the teacher. The word with the n expresses the further idea that this happens especially felicitously when the learner is engaged in the construction of something external or at least shareable... a sand castle, a machine, a computer program, a book”. [6]

Everyone, even people not registered with a user name, can edit the contents of entries in Wikipedia, which also opens the door to destruction. Although destruction can itself be an effective tool for building something better and bigger, it comes with the bitter taste of ignorance and vandalism. With so many different voices, it is remarkable that construction dominates destruction and that conflicts are resolved almost invisibly in Wikipedia.

Secondly, Wikipedia is available in many different languages: of the 161 languages represented, 57 host more than 1,000 articles. In most cases the Wikipedia interface is also translated into that language, so a user does not need to speak English at all. While the interface and concept stay the same across languages, the content and the way people use it show interesting differences from one language to another. Figure 1 shows the international community portal for Wikipedia.

Figure 1: International community portal for Wikipedia

According to The Journal of Computer-Mediated Communication "In today's multilingual, global world, people are communicating on the Internet not only in its established lingua franca, English, but also in a multitude of other languages. Since the Internet began expanding globally in the 1990s, the number of non-English speaking users has grown to 470 million, or roughly two-thirds of all Internet users (CyberAtlas, 2003). To date, however, the research literature in English on computer-mediated communication has focused almost exclusively on emergent practices in English, neglecting developments within populations communicating online in other languages" [7].

All Internet users are commonly considered to form a single global community. Although one may question the strength of such a broadly defined community, the interactive nature of the web supports this view. However, communication requires a language and, as the quotation above states, two-thirds of all Internet users are non-English speakers. Wikipedia addresses this with a delegated structure: on the one hand, all languages share the same architecture and interface, forming one large community; on the other, every language edition has its own set of users and its own community portal. From this point of view, Wikipedia is many communities under one roof. Even user identity is separate across languages for users who contribute to several at once. One interviewee, Andres, had been logging on to the English site for a while, then tried signing on to the Spanish version, only to get the error message “no existe” (doesn’t exist). He finds this incompatibility “un poco latoso” (a little annoying).

We found a third interesting characteristic after studying Wikipedia in terms of the “Nine Principles of Community Design” [8]: the site designers do very little and get a great deal in return. They create only the toolset for editing and managing content, not the content itself. Furthermore, Wikipedia does not really go the extra mile to maintain a loyal community. The basic characteristic of the site is that anything and everything can be changed by anyone, which means anyone can either build on your contributions or destroy them all. Nevertheless, many Wikipedians develop a strong attachment to the pages they edit, and this attachment brings them back to contribute, edit, and correct over time. “Watchlists” are the most important factor in sustaining these links, while “user contributions” pages serve as representations of identity.

2. History of Wikipedia

Wikipedia was founded as an offshoot of Nupedia. Nupedia was an extensively peer-reviewed online encyclopedia that lasted from March 2000 to September 2003. Although Nupedia was a free-content encyclopedia, the bar to become an editor was relatively high; in their words, editors were to be “experts in their fields and possess PhDs”. Wikipedia was started by Jimbo Wales and Larry Sanger in January 2001 as a side project of Nupedia, to allow collaboration on articles before they entered the peer-review process; it was also favored by advocates of GNUPedia, another project similar to Nupedia. As Wikipedia grew and attracted contributors, it developed a life of its own and began to function largely independently of Nupedia, which was eventually abandoned and shut down in September 2003. By then, most of Nupedia’s content had been assimilated into Wikipedia. [9]

This is a very interesting story because it shows the power of community and technological tools to transform a concept. The encyclopedia, an entirely paper-based entity before CD-ROMs, moved onto computers with advances in storage technology and leapt into free content with the widespread use of the Internet. Although the community was ready for an “anyone can edit anything” encyclopedia, it was the wiki tool that made Wikipedia possible and successful by introducing an easy-to-use medium for multi-user reviewing [10]. (At this point, it would be useful to determine which wiki features were added in response to community needs. Were “history” (the tool that shows the differences between two versions), “discussion”, “email to this user”, and “user contributions” added later? How much of the work on handling increasing size and complexity was community-driven? We studied the release notes on SourceForge, but unfortunately the logs were not detailed enough. [11])

Murat talks about paper encyclopedias. There was a time in Turkey when every major newspaper was giving away encyclopedias like Britannica to its subscribers, so every household ended up having 2-3 different encyclopedias:

“[In Turkish] {Bizim} evde de vardi 3-4 tane, oyle sus mahiyetinde duruyordu. … Wikipedia gibi kullanimi kolay degil, madde madde takip etmek emek gerektiriyor. … Guncelligini her zaman sorgulamak zorundasin. … {Ansiklopidinin icinde} Turkiye ile ilgili local seyler, sehirler mesela, vardi ama problemli konular, mesela Ermeni Soykirimi, gibi seyler bulunamazdi. … {Britanica} NPOV denemez, icinde politik seyler de olabiliyor, misal yabanci versiyonunda sozde Ermeni Soykirimi ile ilgili rakamlar her baskisinda arttirilmis.

[In English] We also had 3-4 {paper encyclopedias}; they were usually nothing more than decoration. … They are not as easy to use as Wikipedia; following up an argument takes time. … You can never assume they are up to date. … There were articles about local things in Turkey, such as towns, but nothing on politically problematic issues such as the claim of an Armenian Genocide. … It {Britannica} is not NPOV; it has politically biased articles. For example, in the English version the number of Armenian casualties related to the claim of an Armenian Genocide increases from one edition to the next.”

The Wikipedia project significantly increased its number of participants after being mentioned three times on the tech website Slashdot [12]. Aside from these sudden surges of traffic, there has been a steady stream of visitors from other sources such as Google, which alone sent hundreds of new visitors to the site every day. The characteristics of these two sources are quite different. Surges of inbound traffic from articles such as those published on Slashdot do not necessarily produce users. Most of these visitors are simply surfers who click the link to see what Wikipedia looks like. If they like the content they happen to find, they may become regular users; otherwise, since they did not arrive looking for specific content, they have little chance to appreciate the material. A surfer directed from Google, on the other hand, arrives already looking for a specific article and has seen a snapshot of its content in the Google summary. Hence she is more likely to be satisfied and become a Wikipedia user. The downside is that she may end up using Google as her interface of choice and treat Wikipedia like any other site that comes up in Google. Murat says that he uses Answers.com (another meta-search engine that gathers information from sources such as Wikipedia, GuruNotes, certified blogs, the CIA handbook, and dictionaries) [3] as an interface to Wikipedia; most of the time he does not need to visit Wikipedia because its content is already mirrored on Answers.com together with many other similar sites. Yildirim, by contrast, first encountered Wikipedia while searching Google for “Mafia”. He told his students on campus about it, and many came to know Wikipedia thanks to him.

Wikipedia passed 1,000 articles around February 2001 and 10,000 articles around September of the same year. In May 2001, the first wave of non-English Wikipedias was launched in Catalan, Chinese, Dutch, German, Esperanto, French, Hebrew, Italian, Japanese, Portuguese, Russian, Spanish, and Swedish, soon joined by Arabic and Hungarian [13]. By late 2003, the non-English Wikipedias were collectively bigger than the English Wikipedia for the first time: the all-Wikipedia total of 350,000 articles was reached before the English Wikipedia reached the 175,000 mark. The creation of the 1,000,000th article across all Wikipedias was announced on September 20, 2004. This enormous rate of growth is also the source of a long-running debate in Wikipedia: quantity versus quality of articles. A typical quote from a discussion page on Wikipedia statistics:

“… The fact that we are patting ourselves on the back for intentionally undercounting our articles is just plain silly. I just now went and looked under "short pages" at all 28 pages with exactly 100 bytes, and 13 of them contained a comma. Not a single one of them deserves to be called an article, but almost half are counted. Next I looked at all 33 pages with exactly 200 bytes, and 27 of those contained a comma. A few of them (not eighty percent!) might be considered articles under an extremely lenient definition of article, but does anyone outside of Wikipedia consider a single, brief paragraph to be an article? Are ANY of Brittanica's articles under 500 bytes?”[14]
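The counting rule this editor objects to can be sketched in a few lines. This is a minimal illustration, assuming (as the quote implies) that a non-redirect page counted as an article whenever its text contained at least one comma; the `Page` record, the helper names, and the sample pages are hypothetical, and this is not MediaWiki's actual code.

```python
from dataclasses import dataclass

@dataclass
class Page:
    # Hypothetical, simplified page record; not MediaWiki's real schema.
    title: str
    text: str
    is_redirect: bool = False

def counts_as_article(page: Page) -> bool:
    """Assumed rule implied by the quote: any non-redirect page containing a comma counts."""
    return not page.is_redirect and "," in page.text

def article_count(pages: list[Page]) -> int:
    # True counts as 1 and False as 0 when summed.
    return sum(counts_as_article(p) for p in pages)

# Hypothetical pages: two one-line stubs and one redirect.
pages = [
    Page("Stub A", "Stub A is a town in Turkey, near the coast."),
    Page("Stub B", "Stub B is a river"),
    Page("Old name", "#REDIRECT [[Stub A]]", is_redirect=True),
]
print(article_count(pages))  # 1
```

Under such a rule, page length plays no role: a one-sentence stub that happens to contain a comma counts exactly like a long article, which is the editor's complaint.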

3. Methodology

This study was conducted through participant observation. Each project member was an active Wikipedia participant for more than 20 hours during the course of this study. We all studied the English community and, based on our languages of expertise, also explored the Hindi, Spanish, and Turkish Wikipedia communities, as well as several regional communities related to our focus languages. To better understand how the site works, we participated both as community members, taking part in ongoing discussions, and as observers, exploring the interactions among members, moderators, and unregistered users.

The first step in our study was a community analysis of Wikipedia based on Amy Jo Kim’s (AJK) community analysis template [8]. This analysis was done for Wikipedia as a whole as well as individually for the Hindi, Turkish, and Spanish Wikipedias. We conducted online interviews to help gather information for this analysis. The readings and class discussions also informed the Wikipedia community analysis heavily.

The next step in our study was recruiting participants for interviews. Each of us drew our sample from members of the regional Wikipedia we were studying. We reached participants by emailing members individually, asking people on their user talk pages, requesting volunteers on related mailing lists, and announcing our agenda on our own Wikipedia user profiles. The interviews were semi-structured, so that we could compare information across interviews without having to follow a rigid format. They were a mix of phone and face-to-face interviews. Some were conducted in the participant’s regional language and later translated into English by us for uniformity. All interviews were audio-recorded with the participants’ consent. Throughout this process we maintained complete participant anonymity and followed an ethical approach to information gathering. The research was conducted under expedited review by Georgia Tech’s Institutional Review Board (IRB) [15], and all investigators have been certified for human subjects research.

Each project member conducted three interviews with members of one Wikipedia: Hindi, Spanish, or Turkish. All names, as well as some identifying details, have been changed to preserve our informants’ anonymity. Quotes from participants appear in this paper in the original language, followed by an English translation where applicable.

Suhas, Mahesh, and Niraj were our interviewees from the Hindi Wikipedia. They have contributed articles to both the English and the Hindi Wikipedia. Suhas is a 26-year-old software professional working at a large software company in India. Mahesh is a 30-year-old postdoctoral researcher in management at a premier MBA institute in India. Niraj is a 29-year-old professor in the Computer Science Department of an engineering college in India. All three were fluent in both Hindi and English and were located in India at the time of the interview. The interviews with Suhas and Mahesh were conducted in English, while the one with Niraj was conducted in Hindi. All were phone interviews lasting about an hour.

The first interviewee from the Turkish Wikipedia was Murat, a recent PhD graduate from an American university. He was born in Turkey and lived there for 22 years, then spent 4 years in the USA. He is very fluent in English (he started learning English in 6th grade, which is quite common in Turkey, and he graduated from a Turkish state university whose official language of instruction is English), but says that he is not very familiar with American culture; during his stay in the US, most of his friends were foreign students like him. He gave a face-to-face interview lasting almost two hours. The second, Yildirim, is a 37-year-old professor in Turkey. Although he was born in Turkey, he considers himself European and has traveled to many places in Europe. He is fluent in English and has designed websites related to his academic work. The third, Orhan, is a 21-year-old university student from western Turkey who has spent his whole life in Turkey. He is a very active Internet user, yet does not have an Internet connection at home. He connects through his university’s resources but mostly uses Internet cafes (which are very common in Turkey and offer computers connected to the Internet via ADSL, in addition to delicious coffee and Turkish tea). Yildirim and Orhan gave hour-and-a-half interviews by phone. All three have been active Turkish Wikipedia users for more than a year; Murat also contributes to the English site.

Juan, Carlos, and Andres were our interviewees from the Spanish Wikipedia. Juan was a 42-year-old web designer in Lima, Peru. Carlos was a 24-year-old Costa Rican graduate student working on a PhD in mathematics. Andres was a 31-year-old designer from Mexico City. One interview was conducted face-to-face and the other two by phone, all in Spanish. Of the three, only Andres had actually written articles for Wikipedia, but all three used it as a source of information.

The last step in this study was to analyze and discuss the information we had gathered through our observation, looking for concepts that were similar and different across the language Wikipedias. We did this by writing stories about our interview experiences, combining similar experiences, and comparing different ones.

4. Background and Demographics About Languages

The Use of These Languages in the World

Hindi is an official language of India. It is spoken by about 500 million people around the globe.

Turkish and other Turkic dialects are spoken over a very large geographic area, from western China to the Balkans. A total of 223 million people speak Turkic dialects, the most widely spoken being Turkish, with around 91 million speakers. Of the 40 Turkic dialects, 10 are represented in Wikipedia, each appealing to a unicultural audience. Only two of them (Turkish and Tatar) are above the 1,000-article line; the rest have only a few tens of articles in total. The Tatar Wikipedia is an interesting example because only 6 active users created over 3,600 articles. We limited our interviews to Turkish, however, because it has a relatively larger active user base (51) and had 2,395 articles as of February 2005. Active users of the Turkish Wikipedia are mostly from Turkey, although a significant number of them are pursuing higher education abroad. In this respect, the Turkish Wikipedia presents a unicultural environment. [16]

According to Wikipedia, Spanish is the official language of 21 countries, with about 352 million people speaking it as a first language. It is also widely learned as a foreign language throughout Europe and the U.S., bringing the total number of speakers to about 417 million. The largest Spanish-speaking populations are in Mexico (100 million), followed by Colombia (44 million), Spain (c. 41 million), Argentina (39 million), and the United States (c. 30 million).

The Use of These Languages in the Information Society

Internet usage in Turkey is concentrated among the younger generation, mostly university students. According to statistics, there are around 4 million Turkish users on the Internet, making up 0.7% of all Internet users. Turkish uses the Latin alphabet with six additional accented letters. Although Turkish users commonly omit the accented letters on the Internet, Turkish Wikipedia users are very strict about articles being written with them. This creates some problems for users living abroad, since the US English QWERTY keyboard layout differs slightly from the Turkish Q layout. Murat, a Turkish PhD student living in the US, says that he usually types his articles without accents and uses automated software to convert them into accented form. Although occasional mistakes occur during conversion, a quick pass over the text can easily fix them.
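The paper does not say which tool Murat uses. As a minimal sketch of how such a conversion can work, assuming a simple dictionary lookup, the hypothetical `restore_accents` function below folds each typed word to plain ASCII and restores the accented spelling only when a word list offers exactly one candidate. Ambiguous words are left untouched, which illustrates why a quick manual pass is still needed; more robust tools may use the surrounding context to resolve such cases. The word list is illustrative only.

```python
import re

# Turkish-specific letters and the plain ASCII letters typed in their place.
FOLD = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")

def fold(word: str) -> str:
    """Strip Turkish accents, e.g. 'öğrenci' -> 'ogrenci'."""
    return word.translate(FOLD)

def build_index(lexicon: list[str]) -> dict[str, set[str]]:
    """Map each ASCII spelling to every accented word that folds to it."""
    index: dict[str, set[str]] = {}
    for word in lexicon:
        index.setdefault(fold(word), set()).add(word)
    return index

def restore_accents(text: str, index: dict[str, set[str]]) -> str:
    """Replace each word with its accented form when exactly one candidate exists.

    Assumes lowercase input, matching the lowercase word list.
    """
    def repl(match):
        word = match.group(0)
        candidates = index.get(fold(word), set())
        # Ambiguous or unknown words are left unchanged for a human to check.
        return next(iter(candidates)) if len(candidates) == 1 else word
    return re.sub(r"\w+", repl, text)

# Hypothetical word list: "acı" (pain) and "açı" (angle) both fold to "aci".
index = build_index(["çok", "güzel", "şehir", "öğrenci", "acı", "açı"])
print(restore_accents("cok guzel bir sehir", index))  # çok güzel bir şehir
print(restore_accents("aci", index))                  # aci
```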

According to the statistics we obtained, the number of people accessing the Internet in India in 2004 was 16,580,000. Most of these users fall within the age range of 17-55. Most have undergraduate-level education, as the user base consists mainly of students, professors, and working professionals, along with a high percentage of teenagers. Text in Indian languages is encoded in Unicode, which is very difficult to edit online. Very few online editors are available for writing in Hindi, which makes writing a time-consuming process. Users need some time to learn before they can easily write articles on Wikipedia, although a large amount of help is available on the site to aid in this process.

According to statistics, there are 32.7 million Spanish-speaking Internet users, making up 5.5% of all users [17]. Internet connectivity varies widely across Latin America, from less than 2% in Bolivia to 13-35% in Chile and Uruguay, while Spain shows 25-35% penetration [18].

The Use of These Languages in Wikipedia

The Turkish Wikipedia was started in December 2002 and passed 2,000 articles in December 2004. As of December 2004, 51 of its 1,041 registered users were actively contributing to articles. A significant percentage of these users are also fluent in English, so they search for an article in the English version first and go to the Turkish Wikipedia only if they cannot find it there. As a result, most articles in the Turkish version relate to the culture, history, or geography of Turkey.

The Hindi Wikipedia started in July 2003 and has 1,149 articles. Wikipedia provides two tools for generating Hindi script: the “ITRANS” scheme and “Aksharmala”. Writing articles on the Hindi Wikipedia is very time-consuming.
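ITRANS lets writers type Devanagari using plain ASCII letters. The sketch below shows the core idea on a tiny, illustrative subset of the scheme, assuming greedy longest-match tokenization: a consonant takes a vowel sign (matra) when a vowel follows, keeps its inherent “a” when “a” follows, and otherwise receives a virama. It is not the full ITRANS table (nasalization, retroflex letters, and many other symbols are omitted), and the function names are hypothetical.

```python
from typing import Optional

# A tiny illustrative subset of ITRANS; the full scheme covers many more symbols.
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "ch": "च", "j": "ज",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व", "sh": "श", "s": "स", "h": "ह",
}
# Independent vowel letters, used when no consonant precedes the vowel.
VOWELS = {
    "a": "अ", "aa": "आ", "A": "आ", "i": "इ", "ii": "ई", "I": "ई",
    "u": "उ", "uu": "ऊ", "U": "ऊ", "e": "ए", "ai": "ऐ", "o": "ओ", "au": "औ",
}
# Dependent vowel signs (matras) written after a consonant; "a" is inherent.
MATRAS = {
    "a": "", "aa": "ा", "A": "ा", "i": "ि", "ii": "ी", "I": "ी",
    "u": "ु", "uu": "ू", "U": "ू", "e": "े", "ai": "ै", "o": "ो", "au": "ौ",
}
VIRAMA = "्"  # cancels a consonant's inherent vowel, e.g. to form a conjunct

def longest_match(text: str, i: int, table: dict[str, str]) -> Optional[str]:
    """Return the longest key of `table` (at most two characters) starting at text[i]."""
    for size in (2, 1):
        token = text[i:i + size]
        if len(token) == size and token in table:
            return token
    return None

def itrans_to_devanagari(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        cons = longest_match(text, i, CONSONANTS)
        if cons is not None:
            out.append(CONSONANTS[cons])
            i += len(cons)
            vowel = longest_match(text, i, MATRAS)
            if vowel is not None:
                out.append(MATRAS[vowel])  # empty string for the inherent "a"
                i += len(vowel)
            else:
                out.append(VIRAMA)  # no vowel follows, so drop the inherent "a"
            continue
        vowel = longest_match(text, i, VOWELS)
        if vowel is not None:
            out.append(VOWELS[vowel])
            i += len(vowel)
        else:
            out.append(text[i])  # spaces, punctuation and unsupported letters pass through
            i += 1
    return "".join(out)

print(itrans_to_devanagari("hindI"))    # हिन्दी
print(itrans_to_devanagari("bhArata"))  # भारत
```

Even this toy version shows why writing is slow: the writer must remember which ASCII spellings produce which signs, and every consonant cluster needs an explicit virama.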

Our interviewees told us that all users of the Hindi Wikipedia were initially members of the English Wikipedia. In their view, there would never be a case in which a user is registered on the Hindi Wikipedia but not on the English Wikipedia. It is also worth mentioning that the English Wikipedia was used more for gaining knowledge and contributing articles, while the Hindi Wikipedia played the role of a magazine that readers browsed when they had some extra time.

The Hindi Wikipedia is in its infancy and is evolving slowly, as can be clearly seen from the kind and number of its articles. A major share of the articles cover Indian culture, Indian music, famous Indian cities, and linguistics. The writing style of the articles is also not very encyclopedic, and there are very few technology-related articles.

On the Spanish Wikipedia, there are 958 contributors, of whom a little over 300 are considered active (having edited at least 5 times in a month) [19]. However, this makeup is not representative of the Spanish-speaking world in general. Almost all of the bibliotecarios (librarians, the Spanish equivalent of administrators), who are chosen from among the most active users, are from Spain, except for two Argentinians, two Chileans, and a “gringo” (a non-Hispanic American). One of the bibliotecarios speculated that this was because, owing to economic circumstances, more people in Spain than in Latin America were on the Internet. Spanish speakers in the U.S. may be more likely to read and post in English, as suggested by the richness of the English content on Puerto Rico, which was unmatched in the Spanish version.
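The activity threshold behind these figures can be made concrete with a short sketch. Assuming a hypothetical edit log with one (user, month) record per edit, a contributor counts as active in a month when they have at least five edits in it; the log format, user names, and function name are illustrative and not the method used by the statistics cited above.

```python
from collections import Counter

ACTIVE_THRESHOLD = 5  # edits per month, the definition quoted above

def active_users(edits: list[tuple[str, str]], month: str) -> set[str]:
    """Return users with at least ACTIVE_THRESHOLD edits in `month` (e.g. '2005-02')."""
    counts = Counter(user for user, m in edits if m == month)
    return {user for user, n in counts.items() if n >= ACTIVE_THRESHOLD}

# Hypothetical log: one (user, month) tuple per edit.
log = [("Ana", "2005-02")] * 7 + [("Luis", "2005-02")] * 3 + [("Luis", "2005-01")] * 9
print(active_users(log, "2005-02"))  # {'Ana'}
```

Because the threshold is applied per month, the same contributor (Luis here) can be active one month and inactive the next, so counts of active users fluctuate more than counts of registered contributors.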

The Spanish Wikipedia is one of Wikipedia's largest language editions, ranked eighth. Among the users we interviewed, though, it was still considered to be “catching up”. The smaller number of active users means that many articles are not as well developed or edited as in the English version. This is not always so, as shown by the “Spanish translation of the week” section, which selects Spanish articles for translation into English. Most of these articles are culture-specific, on places or people, but recent articles nominated for translation because of their better quality in Spanish than in English included one on the Mars Pathfinder. Andres and Carlos often consulted the English version instead of the Spanish because of its larger number of better-quality articles. Andres, however, contributed only in Spanish, because he was more comfortable writing in his native language.

5. Wikipedia Inside Out

In this section we analyze the Wikipedia community based on Amy Jo Kim’s community analysis template [8].

Code of Conduct

A clear code of conduct is a key reason for Wikipedia’s success. Wikipedia advocates freedom of speech and expression for all its members, but at the same time requires that articles be written from a neutral point of view (NPOV). All members are encouraged to adopt a neutral point of view in all their contributions to the site. This functions more as a community principle that members follow in everything they do on Wikipedia, even when they are just participating in discussion forums or on talk pages. The talk etiquette [20] also helps set the tone throughout the site.

Mahesh gave us an interesting quote on whether people actually knew what the neutral point of view was:

“I don’t think anyone ever reads these community etiquette pages, they function just like EULAs, no one reads them but everyone kind of has an idea about what it means and that it might have legal implications.”

We have observed that articles on Wikipedia are constructed at a much higher rate than they are vandalized or destroyed, which we attribute to the adoption of NPOV across all of its articles. On the English Wikipedia, members routinely perform janitorial work such as typo correction, copy editing and categorization. Even so, Suhas mentioned that vandalizing edits that are textual in nature tend to be noticed and corrected easily, whereas vandalized numerical values are very difficult to verify and sometimes even difficult to notice.

Wikipedia strictly enforces the principle that users should not advertise themselves or their work on the site. Yildirim describes how his contributions were deleted:

“[In Turkish] Ismimi, kisisel sayfami ve bazi makalelerimi akademik konumla ilgili bir sayfaya eklemistim. Amacim Turkce kaynak arayanlara yardimci olmakti. Sysoplar gorunce kaldirmislar, neden oldugunu da acikladilar reklama giriyormus, ondan sonra daha dikkatli oldum.

[In English] I added my name, personal webpage and some of my published articles to a {Wikipedia} article about my academic work. My goal was to help those looking for references in Turkish. The admins removed the links and explained to me why: it was advertising. I have been more careful since then.”

Wikipedia has a very user-friendly basic wiki interface, which in our view makes it more usable than many other sites. New users can easily learn how to write new articles without much frustration or time investment. None of the members we interviewed reported a bad first experience. Suhas gave an interesting account of his first experience: he made a potentially vandalizing edit by adding a reference to his own paper, but another Wikipedia user explained the Wikipedia principles to him and told him that he should not do this. He noted that this was handled differently than on some other online sites, because he was told clearly why his contribution was deleted.

Wikipedia provides discussion pages where members can discuss particular articles or even meet other members. One important finding of this study is that the members of the Hindi Wikipedia all started out as members of the English Wikipedia. For languages such as Hindi that must be written in a Unicode Indic script, members used the discussion pages of the English Wikipedia rather than those of the Hindi Wikipedia, because writing in the Indic script was very time-consuming. Murat describes his contributions to controversial articles about Turkey:

“[In Turkish] {Ingilizcesinde Turkiye ile ilgili konular}, sonucta bazi konular cok politik, oyle olunca insanlar kendi politik goruslerine gore birseyler giriyorlar. Sonucta Turkiye aleyhine cok fazla girdi olabiliyor. Ben de girdileri silmedim, {degistirmek yerine} tartisma kismina acikladim. … Ana kismi degistirmedim cunku ben degistirsem ertesi gun tekrar degisecek. O yuzden tartisma kisminda en azindan silinmeden duruyor.

[In English] Some subjects {about Turkey in the English version} are extremely political, and people write their own political views into Wikipedia. As a result, there may be a great many negative entries about Turkey. I did not delete the entries; instead, I explained myself on the discussion page. … I did not change the main page because the next day they would change it back anyway. At least my words stay put on the discussion page.”

His decision to use the discussion page instead of the main article not only makes his contribution more lasting but also sets a friendlier tone than simply replacing the contents of the main page. This behavior is typical on Wikipedia and is one of the main reasons it does not host the kind of flame wars that were common in Usenet groups. This rule, although not stated explicitly on the site, acts like an amendment to the code of conduct.

Orhan had a contrasting experience when he once applied for an admin position:

“[In Turkish] Admin olmak icin yaptigim girdilerden ornekleri o zamanki adminlere postalamistim. Adminlerden birisi yazdigim makalelerin bir kisminin kopya olduguna kanaat getirmis. Benim hakkimda ozel bir tartisma sayfasi acip diger adminlere fikirlerini sordu. Bi sure benim hakkimda atilip tutuldu, ben de cevaplar yazdim. Tatsizlik oldu bir hayli. Sonucta is tatliya baglandi ve o tartismalar silindi.

[In English] While applying for the admin position, I had sent some examples of my contributions to the admins of the time. One of them decided that some of my articles had simply been copied from other sources. He opened a separate discussion page about me and asked the other admins what they thought. There was some harsh language going back and forth about me on that discussion page for a while, and I wrote some answers. It was not pleasant. In the end, we resolved the problem and they deleted those discussions.”

Purpose

As mentioned before, the purpose of Wikipedia has been to promote freedom of speech among its members in such a way that members eventually become creators of content rather than remaining passive users. In a sense, the audience itself determines who the members are through the depth and scope of the articles it edits. Orhan learned about Wikipedia from a mailing list he subscribes to as part of his work at the university. Another subscriber on the list made a general call to add more terminology specific to their profession to the Turkish Wikipedia. In this example, a subgroup of Wikipedia (Turkish speakers of a specific profession) recruited its own workforce.

Based on our interviews, most of the Wikipedia users we spoke with, even anonymous ones, went on to contribute to Wikipedia rather than remaining passive readers. Yildirim describes his transition from “user” to “editor”:

“[In Turkish] Once anonim olarak okumaya basladim {Ingilizce makaleleri}. Daha sonra Turkcelerine baktim, katkilarim oldu, ama anonim olarak. Sonra sonra bir hayli anonim yazi yazdiktan sonra uye oldum. O zamandan beri kullanici adimla yaziyorum. Bu arada yazdigim makalelerin uzunlugu da degisti, oncekile cok detayli giriyordum, sonra zaman icinde kisaya dogru kaydim, biraktim diger insanlar ekleyip, gelistirsinler.

[In English] I first started by reading {articles in English} anonymously. After that, I checked the Turkish site and made contributions, but still anonymously. After writing quite a lot anonymously, I registered a user name, and I have been contributing under it since then. One other thing changed in the process: my first articles were very detailed, but over time I wrote shorter ones and let other users add to and develop them.”

His move from detailed articles to shorter ones let new users start slowly and easily by developing his articles. These two stories by Orhan and Yildirim demonstrate the self-initiation of purpose (principle #1, audience) and the range of roles (principle #7) on Wikipedia.

Wikipedia also has a well-defined mission statement:

“Our goal with Wikipedia is to create a reliable and free encyclopedia—indeed, the largest encyclopedia in history, in both breadth and depth.”[21] Yildirim says that it was Wikipedia’s free-content, open-source nature that got him hooked. He thinks that information is universal and does not belong to anyone in particular: it is a product of collective cognition and hence belongs to all humanity. He calls Wikipedia a positive step away from individualism and a mission for the benefit of society.

We asked the Turkish interviewees what they thought about the rumors that Google would join or help Wikipedia. Both Yildirim and Murat are optimistic about this cooperation. However, Murat points out that, as a publicly traded company, Google has a responsibility to satisfy its shareholders and therefore may not be able to uphold the free-content principle. He adds that he would not be put off by the possible use of Google ads on the Wikipedia site. When we asked whether they would feel differently if it were Microsoft rather than Google helping Wikipedia, both responded sharply that they would no longer contribute to Wikipedia.

These last two examples show that Wikipedia has a strong mission statement that is in line with the values of its audience.

Colorful, flashy web sites are common nowadays. Wikipedia is unusual in that it has never moved away from its basic wiki structure. The front page of the site reinforces Wikipedia’s identity as a global, multilingual site. The design is very neat and simple, though the large amount of content on the site sometimes makes navigation difficult. Andres, a designer, complained that the visual design was too simple, but said he understood why, given how complex the site is.

Governance in Wikipedia (Range of Roles & Leadership)

Wikipedia is one of the few online sites of its kind where, although users can add content without registering, most users are registered members whose profiles include real-life information such as their location and email address. The power users are administrators (or sysops), whose powers are limited to deleting pages, protecting and unprotecting content, and blocking and unblocking IP addresses. Administrators monitor each other and are selected by nomination or self-nomination. Each nomination stays open for seven days, and voting takes place on discussion pages.

Other roles include ambassadors (who help with cross-lingual issues), bureaucrats (who have the technical ability to assign sysop status), arbitrators, mediators, and higher-level technical staff: stewards (who can adjust user rights) and developers.

If Wikipedia has a benevolent dictator, it is Jimbo Wales [22], who owns the plug as well as the right to abuse his god-like powers.

The governance structure of Wikipedia is hard to categorize; it has features ranging from anarchy to democracy and from despotism to technocracy. For a humorous article about how Wikipedia mixes all these governing elements, see [23].

Anonymous visitors to Wikipedia have all the editing powers that a regular registered user has. However, users still prefer to register, because when they invest a good amount of time in writing articles they want to take ownership of them. As Mahesh points out: “You can’t stay a member of a community for long and be anonymous”.

Anonymity also affects how people view one’s edits. If you know that a registered user or a well-known person has edited your article, you tend not to check the edit for vandalism; if the edit comes from an anonymous IP address or unknown user, you tend to make sure that it is not vandalism (known users appear in blue, unknown ones in red). This is in line with a quote from “Finding One’s Own in Cyberspace” by Amy Bruckman [24]:

“Comments from an anonymous entity are less valuable because they are unsituated.”

Suhas tells a story about the side effects of not being anonymous. He felt that the article about ComputerCorp was biased in favor of ComputerCorp, so he raised the point on its talk page. Suhas himself works for ComputerCorp’s rival, ComputerLimited, and mentions this in his user profile. Another participant in the discussion read this on Suhas’ profile and concluded that Suhas was deliberately trying to start a conflict as part of a ComputerLimited conspiracy. He emailed all this to a sysop, and Suhas then had to spend a lot of time writing a long explanation of why this was not true.

Two of the Spanish interviewees had never contributed to Wikipedia. One had started writing an article on the Nazca lines but never finished it. The other keeps meaning to, especially after noticing how few Spanish articles there are compared with the English version he uses more often, but, as he pointed out, “tengo ganas; lo que hace falta es tiempo” (I want to; what I lack is time).

Identity in Wikipedia (Member Profiles)

Wikipedia does a good job of providing tools for identity. “User” pages (Figure 2) provide a customizable profile for members (even anonymous members have profiles linked to their IP addresses). Additionally, an automated system keeps track of all of each user’s contributions, and this list is accessible through the “contributions” link on user pages. The only problem is that users who are fluent in several languages are assigned a separate identity in each language edition, even if they choose the same username. The only workaround is to link to all of one’s profiles from each user page; only in this way can participation be tracked across Wikipedias. Otherwise, one could conceivably have a different username and reputation in many different communities with no one being any the wiser.

Figure 2: User page in Wikipedia

Gathering Places

Unlike other online sites, Wikipedia does not have standard online gathering places such as chat rooms, lobbies, message boards, architectural structuring, or private personalized quarters. Instead, Wikipedia features special pages that take the place of such spaces. For example, the “Community Portal” [25] (Figure 3) includes counterparts of “public places” such as “Announcements” [26] and “Goings on” [27], as well as the “Village Pump” [28]. However, not many users know about these pages. Wikipedians, as Wikipedia members are called, also use the discussion pages attached to user pages to greet newcomers, effectively using them as message boards. Wikipedia also has a means of linking personas to instant messenger IDs, but only a small number of users are aware of it [29].

Figure 3: Community portal for the English Wikipedia

Cyclic Events and Connections to Real World

Wikipedia does not have cyclic events such as celebrations or contests, but it does have ongoing events that sustain a constant spirit. For example, the “Collaboration of the week” [30] is selected by popular vote, and users are invited to work on the selected article. One can even follow up and watch the article improve. Similarly, there are “Article Improvement Drives” [31], in which a non-stub article is improved to featured-article status. This is seen quite often on regional-language Wikipedias such as Tamil and Hindi, which are still in their infancy. Since all articles have to be written in Unicode, members have collaborated with sysops to add more tools to these Wikipedias so that better articles can be written.

The real-world connection is maintained by “meetups” [32]. They are organized through the “Village Pump”, not only in US cities but all over the world, including regular monthly meetings in Berlin and Cape Town. However, in-person meetups have not yet started in countries such as India, where Wikipedia is still not widely known, or is known only to a select group of people. Indian Wikipedians commonly know each other before they meet on Wikipedia, since the publicity it receives in India is mostly by word of mouth.

According to the Catalan Wikipedia, Jimbo Wales is planning a trip to Barcelona in mid-March and will meet with Wikipedians. This tidbit of information was passed on by a Spanish Wikipedia bibliotecario (admin) and drew an immediate response in Catalan suggesting that it would be a good excuse for a Catalan Wikipedia conference. Later in the discussion, someone suggested inviting people from the Spanish Wikipedia to increase the numbers. One active Spanish user had mentioned that the two Wikipedias did not always get along, but did not elaborate. Based on our search, there did not appear to be much overlap among the more active users, even though most Catalan speakers also speak Spanish. In Spain during Franco’s dictatorship, regional languages such as Basque and Catalan were officially banned. Since Franco’s death, there has been a resurgence of regional pride, reflected in an insistence on using Catalan rather than Spanish; the strength of the Catalan community relative to its small size would seem to reflect some of this pride.

Languages as Subgroups

Although languages can be considered subgroups in Wikipedia, their limited number and scope mean that they do not let users start customized subcommunities at will. The other structure is “Associations”, which any member can set up and run. Associations call on their members to vote on strategic decisions; however, most associations are created in jest.

6. Culturability in Wikipedia

Stacy Horn’s ECHO [33] was very New York oriented, and Howard Rheingold’s WELL [34] took on the flavor of the San Francisco Bay Area, where it originated. Oldenburg’s nostalgia for “Third Places” [35], neighborhood places where everyone was like you, did not necessarily allow for diversity. How, then, do we design global third places? As Barber and Badre point out in their paper on culturability [36], localization is about more than just translation. In this section, we look at ways in which Wikipedia varies according to language and cultural community, and give case studies of specific examples.

From the beginning, Wikipedia’s founders wanted to allow global communities to develop independently. When asked whether most articles were translations, founder Jimbo Wales said:

“by and large no. It's actually very important within the community. The English community is dominant and the people in the other languages get very upset if they are thought of as mere translations of the "real" encyclopedia in English. They are almost all completely independently written.”[37]

Even the name of Wikipedia is not the same in all languages: in Catalan it is Viquipèdia, in Asturian Uiquipedia, in Turkish Vikipedi. At one point the Spanish community voted on whether to change the name to Librepedia, alluding to freedom (other suggestions included velocipedia, referring to speed, and limonpedia, as a joke). In terms of visual design, the most obvious examples of global customization are the Hebrew and Arabic Wikipedias. Since Hebrew and Arabic are both read from right to left, the navigation is a mirror image of the usual layout, a common convention on Hebrew and Arabic web sites (Figure 4).

The Meta section of the wiki provides resources to aid internationalization. For example, “Babel templates” provide a framework for displaying users’ language abilities on their user pages. Users can rate their own abilities in different languages on a scale from 1 to 3, with 1 being basic, 2 intermediate, and 3 fluent [38].

Figure 4: Hebrew Wikipedia

Another section of the meta-wiki features a “translation of the week”: a stub or the first paragraph of an article is nominated for translation into other languages. As mentioned earlier, there is also a Spanish translation of the week:

“This is an article from the Spanish Wikipedia, ideally one which has reached feature article status (es:Wikipedia:Artículos destacados), but at least one which we believe to be solid, accurate, and not contain copyright issues, and which either has no article or a basic stub page on the English wikipedia.”[39]

The meta-wiki also hosts embassies with “ambassadors” for different languages. This Wikimedia Embassy was set up as a central place for resources to help with cross-language issues such as site-wide policy and interlanguage linking. There is no requirement for becoming an ambassador other than interest; listed countries have one to five ambassadors each, followed by a list of languages that need ambassadors [40].

The importance of the community portal for communication varies across Wikipedias. In English this place is called the Village Pump; in French it is “le bistro”, in Catalan “la taverna” (the tavern), and in Spanish the café, all different images of traditional real-life gathering places. Some are more readily accessible than others. There is nothing like the Village Pump on the Hindi or the Turkish Wikipedia.

Similarly, the discussion pages on the Hindi and Turkish Wikipedias are hardly used. Since almost all members also have accounts on the English Wikipedia, they prefer to hold their discussions on the English Wikipedia’s discussion pages.

Interesting Case Studies about Languages in Wikipedia

The Chinese Wikipedia has run into a variety of challenges. With respect to language, complaints have arisen because the site mixes traditional and simplified Chinese [41]. More significantly, Wikimedia sites have been blocked by the People’s Republic of China at least twice. Last month this led to another debate on whether there should be a Chinese Wikinews (a sister project of Wikipedia featuring news from around the globe), given that news is potentially more contentious and might lead to further blocking of Wikipedia. Arguments ensued over whether the English speakers of the meta-wiki should be making that decision for Chinese speakers, and participants pointed out that Chinese speakers do not live only in China.

Turkish Vikipedi (the name suggested for Wikipedia in Turkish) acts mostly as a complement to the English version. Since most Turkish users are also fluent in English, they first search for an article in the English version. As a result, Turkish Vikipedi specializes in cultural topics such as regional geography, Turkish towns, famous or important Turkish people, Turkish history, and suggested Turkish terminology for technical terms that first appeared in another language.

Sourtimes [42] is a Turkish site that is similar to Wikipedia in many respects. Yildirim says that it is extremely popular among young people:

“[In Turkish] … eksi sozlugu herkes biliyor, Wikipedia’dan hicbirinin haberi yok. [In English] Everyone knows about sourtimes; none of them knows that something like Wikipedia exists.”

The main reason is that Sourtimes, unlike Wikipedia, is a humorous dictionary. For example, you can come across an article named “sozluk yazarlarinin mezar tasi yazilari / the tombstone inscriptions of sourtimes users”. Its interface also uses comic-style fonts and humorous navigational links such as “geri git ne bileyim / go back, how should I know”, which is displayed when an unavailable page is accessed. Also, entries can only be appended to the existing content; you cannot rewrite an article. The registration process is also long and difficult. Sourtimes accepts new users only during specific periods of the year, which lends a sense of exclusivity to having an account. Even after you have an account, your entries are marked as “rookie” until you satisfy a council of elders with the quality of your entries. Inactive accounts are deleted immediately. Despite all these obstacles, Sourtimes has an enormous number of registered users and a steadily growing database of entries. Orhan explains why:

“[In Turkish] Eksi sozluk bizim kahvehaneleri andiriyor, Vikipedi ise kutuphane gibi. Bizim yasimizdaki {universite-lise cagi} insanlarin nereye gitmek istedikleri malum, kahvehaneler {Turkiye’de} dolup tasiyor. [In English] Sourtimes is like a coffeehouse; Wikipedia is like a library. It is very clear {to me} which one my age bracket {university and high school students} prefers: coffeehouses {in Turkey} are always full of people.”

Turkish coffeehouses are very common and are not franchised; each has its own character, but they usually serve hot food and beverages and offer board games and billiards. You can find three or four of them lined up along a busy street. Orhan’s analysis suggests that, online, Turkish people seek out the same type of “Third Place” they are used to in the real world [35].

When the Wikipedia project first started, founder Jimbo Wales raised the possibility of banner ads in order to recover his initial investment. This prospect led to a fork of the very active Spanish Wikipedia in February 2002; the forked version is called "Enciclopedia Libre" [43] and is hosted at the University of Sevilla. The Enciclopedia Libre has 27,000 articles, which, according to the Spanish Wikipedia, are more likely to be local articles than translations from the English version. There is a lot of copying of articles back and forth between the two encyclopedias, and many people contribute to both sites.

Users on the Indian-language Wikipedias are not as active as those on other language editions. Some reasons for this observation are:
  • The English Wikipedia is relatively mature, so contributors can make incremental changes to articles that have already been started. The Hindi Wikipedia is not yet mature: there are very few articles to build on, so the inertia is greater.
  • Less visibility means a smaller readership.
  • The English Wikipedia community is properly moderated, and people tend to know what the Wikipedia principles are; this is not the case on the Hindi Wikipedia.
  • Technology is one of the biggest obstacles. Editing in Unicode is a pain: users need to use external translators and then copy and paste the content into Wikipedia. Recently there has been some progress on this front: user Shashi started a discussion about having a built-in translator with another Wikipedian, user Rohan, who then came up with a method for it. Adding new content has therefore become easier, but editing existing content is still difficult.
  • Janitorial work such as typo correction, copy editing and categorization is lacking. On the Hindi Wikipedia, people mostly just add articles and create subpages, and these janitorial tasks are left undone.

Aside from the Spanish web sites, we also spent some time on several other Wikipedias from countries where Spanish is spoken. In the Quechua (an official language of Peru and Bolivia, spoken by 14 million people) and Nahuatl (the language of the Aztecs in Mexico, spoken by 1.5 million people) communities, we see sites started and maintained by non-native speakers of these languages. Because of this, discussion and community pages are mostly in Spanish, with a little English. The side navigation of smaller sites often stays in English. At the bottom, the Native American languages (Navajo, Cherokee, Inuktitut, Quechua, Nahuatl) are listed in an effort to collaborate [44].

The Nahuatl site has fewer than 100 articles and 50 registered users, with only one administrator. One curious item we found under community news was an article saying: “No al Walmart en Teotihuacan. La invasion de la tienda norteamericama Walmart a la zona de Teotihuacan no se debe permitir (No to Walmart in Teotihuacan. The invasion of the American store Walmart to the Teotihuacan zone should not be permitted).”

This refers to a recent controversy in which many people protested the building of a store near the pyramids of Teotihuacan, one of the most important archaeological zones in Mexico that is open to the public. The writer goes on to advocate peaceful protest and a boycott. This would not be considered NPOV, but in such a small community devoted to cultural preservation, the message fits that mission better than it fits NPOV.

Another interesting aspect of the Nahuatl site is the neologism section, where members suggest Nahuatl terms for the various wiki terms and technical categories. Here we see a truly grassroots effort to revitalize a language.

7. Making Wikipedia Better

In this section we address how the community could have been designed better and what we would change in it. We believe that the way Wikipedia is thriving today is a miracle in itself: it is one of the very few online sites that consists solely of user-built content. Nevertheless, we think there is still room for improvement in content presentation, in browsability and navigation from an HCI perspective, and finally in the control of user edits.

Content Presentation

The mission of the Wikipedia project is to build the world’s largest free encyclopedia, which results in a vast amount of information. A new user visiting the site can be overwhelmed by the amount of information present even on the main page. There is no consistent way in which information is presented to the user. This can range from interesting to very distracting, especially for the many users who consult Wikipedia only for academic purposes.

HCI Issues

Wikipedia does not satisfy the browsability principle of good usability well. It holds a vast amount of information, and it can be very difficult to find information that is not directly related to the articles written on it, for example when searching for Wikipedians from a particular community, such as only Hindi Wikipedians.

Technological Issues

The technological issues involved in editing in Unicode have directly affected the membership and article coverage of the regional Wikipedias written in Indic scripts, such as the Hindi Wikipedia.

On this point, Suhas told us that adding content was easy, but that when users requested technical enhancements, these were most often not possible.

Edit Control

Wikipedia currently lets anyone visiting the site edit content and contribute new articles; membership is not a prerequisite. Even though this may be one of the most important reasons for the site’s success, we still feel that requiring users to register before they could edit articles would reduce the number of vandalizing edits on Wikipedia.

This boils down to the concept of anonymity. From the interviews, we concluded that members tend not to trust articles or edits by anonymous users, which contrasts with the fact that most of them felt that being able to contribute articles without registering was a great idea. We agree with the statement:

“Anonymity and being on the record both work, however they don’t work very well” [45]


References

Note: All links in this section are as of March 8, 2005.