Wikipedia Diversity Observatory/Papers

From Meta, a Wikimedia project coordination wiki

This page presents the concept of cultural contextualization in Wikipedia and a selection of papers that examine the topic from different perspectives.

In general, differences in the content of language editions are attributed by the current literature to contextual factors or to a process named by Hecht (2013, p. 47) as cultural contextualization, which “is the cause of some of the content diversity in multilingual Wikipedia”.

Cultural Contextualization

Cultural contextualization is also present in other user-generated projects such as OpenStreetMap, Twitter or Flickr (Hecht 2013). The explanation of how it shapes the final characteristics of content is rooted in the fields of Linguistics, and Cultural and Social Psychology. For instance, according to Clark (1996), the members of a cultural community usually share “facts, beliefs, procedures, norms, and assumptions”. Hence, the editors of each language community (and its subcommunities, especially in languages spread over large geographical areas) are likely to reflect in their articles the meanings they implicitly agree on, resulting in a great deal of diversity in such a worldwide project. Cultural contextualization occurs when content-based projects allow a certain degree of freedom.

In Wikipedia, there is extensive literature on how cultural contextualization has shaped each language edition. Depending on whether the emphasis is put on the articles’ text or on Wikipedia’s overall structure, effects can be classified into two main groups: Discourse and Structure.

Discourse effects are based on the idea that, since each language edition constitutes a community (and perhaps a few subcommunities), its editors tend to share a cultural background, and this ultimately limits the points of view adopted in the articles within one and the same language edition. (In the literature, the editors’ point of view is referred to as ‘linguistic point of view’, ‘national point of view’, or ‘cultural bias’.) Across language editions, the differences in the editors’ points of view become more prominent, especially on controversial topics, where history and politics are seen from opposite positions (Massa and Scrinzi 2011; Apic, Betts, and Russell 2011). For instance, Rogers and Sendijarevic (2012) compared the article on ‘The Srebrenica Massacre’ across several Wikipedia language editions, including English and Balkan languages. The study shows how the same article adopts a different point of view in each language edition to present the facts; these points of view sometimes converge and at other times are in total disagreement over the terminology employed and its political connotations.

Likewise, in order to explore how contextualized Wikipedia language editions are, Bao et al. (2012) developed a website that lets users explore similarities and differences between the points of view of articles on a concept that exists across languages. Pentzold et al. (2017) showed that topics related to cultural heritage, such as ‘Bullfighting’, are framed differently in the Catalan, Spanish and English language editions and are the focus of different controversies. Other studies point out that editors’ geographical closeness to the subject of their articles affects how exhaustive the articles are. Callahan and Herring (2011), comparing the English and Polish Wikipedias, found that biographical articles about well-known people are more complete (in terms of features such as the number of pictures, education, political ideology, controversies mentioned or family members’ names) in the language edition associated with the territory the person is from.

Structural effects are based on the idea that context and culture are relevant factors that affect editor interests and, consequently, content coverage. Ronen et al. (2014) explored the relationships between Wikipedia language editions by building a network of languages (the global language network) from article edits and assessing each language's position with eigenvector centrality. They found that English acts as an influential central hub, followed by other widely spread languages such as French, Spanish and German, among others. However, besides attributing this to visibility, they do not explain which factors make language editions influence each other. In this sense, Samoilenko et al. (2016) set out to understand cultural similarity, understood as the significant interest of communities in contributing to articles about similar topics, by analyzing both edits to articles existing in several language editions and several cultural factors. They found that cultural similarity is due to various factors affecting topic choices, such as shared language family, number of bilinguals and geographical proximity, among others.
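As a rough illustration of the centrality measure mentioned above, the following Python sketch computes eigenvector centrality on a small weighted language network by power iteration. It is not the procedure of Ronen et al. (2014): the edge weights in `edit_links` are invented for the example, and the function name and the shift by the identity matrix (used only so the iteration converges on near-bipartite graphs) are assumptions of this sketch.

```python
def eigenvector_centrality(weights, iterations=500, tol=1e-12):
    """Eigenvector centrality of an undirected weighted graph.

    `weights` maps pairs of node names to a positive edge weight.
    Power iteration is run on A + I, which has the same eigenvectors
    as the adjacency matrix A but avoids oscillation.
    """
    nodes = sorted({node for edge in weights for node in edge})
    neighbours = {node: {} for node in nodes}
    for (a, b), w in weights.items():
        neighbours[a][b] = w
        neighbours[b][a] = w

    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new = {
            n: score[n] + sum(w * score[m] for m, w in neighbours[n].items())
            for n in nodes
        }
        top = max(new.values())
        new = {n: value / top for n, value in new.items()}  # scale so max is 1
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


# Hypothetical counts of editors active in both language editions.
edit_links = {
    ("en", "fr"): 10,
    ("en", "es"): 9,
    ("en", "de"): 8,
    ("fr", "es"): 3,
    ("de", "pl"): 2,
    ("es", "ca"): 2,
}

scores = eigenvector_centrality(edit_links)
for language, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{language}: {value:.3f}")
```

In this toy network the language connected most strongly to the other well-connected languages receives the highest score, which is the sense in which a language can be described as a central hub.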

In another study on common editing interests, Karimi et al. (2015) gathered all the editors’ edits from English Wikipedia and analyzed their relationships in order to determine how close their affinities were. Results showed that editors from nearby locations overlap more in the articles they edit than editors from distant locations. The geographical factor was also used to explain that Wikipedia language editions whose language-related territories are far from each other tend to have fewer articles in common (i.e. more of their articles have no equivalent in the other edition) than those whose territories are geographically close (Warncke-Wang et al. 2012).
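The notion of two editions (or two editors) having articles in common can be made concrete with a set-overlap measure. The sketch below uses the Jaccard index, which is a simple stand-in chosen for illustration and not necessarily the measure used in the cited studies; the concept identifiers and the three toy editions are hypothetical.

```python
def jaccard(first, second):
    """Share of items present in both collections among items present in either."""
    first, second = set(first), set(second)
    union = first | second
    if not union:
        return 0.0
    return len(first & second) / len(union)


# Hypothetical sets of concepts covered by three language editions.
edition_ca = {"concept_a", "concept_b", "concept_c", "concept_d"}
edition_es = {"concept_b", "concept_c", "concept_d", "concept_e"}
edition_ja = {"concept_d", "concept_f", "concept_g"}

overlap_near = jaccard(edition_ca, edition_es)  # territories close to each other
overlap_far = jaccard(edition_ca, edition_ja)   # territories far from each other
print(overlap_near, overlap_far)
```

The same function applies to the sets of articles edited by two editors, which is the editor-level reading of overlap described above.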

Other studies show that editors tend to focus on their territories, either because geolocated articles are edited by nearby editors or because editors give these articles higher visibility in the overall Wikipedia network of articles. For instance, Hecht and Gergle (2010a) computed the location of each anonymous edit in geolocated articles and discovered that many of the contributions were made from short distances. Another effect detected by Hecht and Gergle (2009), called ‘self-focus bias’, is that the articles located in the countries local to each language edition are linked from many more articles (i.e. they have more inlinks) than the articles located in other countries. All in all, this second group of effects shows that context has a key impact on Wikipedia content coverage and that geographical context is relevant to editors’ activity.
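One simple way to picture the inlink imbalance described above is to compute the share of all inlinks to geolocated articles that point at articles located in the home countries of a language edition. This is a minimal sketch under that assumption and not the exact indicator defined by Hecht and Gergle (2009); the function name, the country codes and the inlink counts are hypothetical.

```python
def home_inlink_share(articles, home_countries):
    """Fraction of inlinks received by articles located in the home countries.

    `articles` is an iterable of (country_code, inlink_count) pairs,
    one pair per geolocated article.
    """
    articles = list(articles)
    total = sum(inlinks for _, inlinks in articles)
    if total == 0:
        return 0.0
    home = sum(inlinks for country, inlinks in articles if country in home_countries)
    return home / total


# Hypothetical geolocated articles in one language edition.
geolocated = [("PL", 120), ("PL", 80), ("DE", 50), ("FR", 30), ("US", 20)]

share = home_inlink_share(geolocated, home_countries={"PL"})
print(f"{share:.2f}")
```

A share well above the proportion of home-country articles among all geolocated articles would point in the direction of the self-focus effect.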

In the WCDO project we argue that, in order to estimate the impact of cultural context on content coverage, it is necessary to know which articles relate to the cultural context of each language edition beyond geolocated articles, including topics such as language, people and traditions, among others. This set of articles is the Cultural Context Content (CCC).

Selected Papers

Cultural Context Content

  • Miquel-Ribé, M. (2019). The Sum of Human Knowledge? Not in One Wikipedia Language Edition. Wikipedia@ 20.
  • Miquel-Ribé, M., & Laniado, D. (2019, July). Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 13, pp. 620-629).
  • Miquel-Ribé, M., & Laniado, D. (2018). Wikipedia Culture Gap: Quantifying Content Imbalances Across 40 Language Editions. Frontiers in Physics.
  • Miquel-Ribé, M. (2017). Identity-based motivation in digital engagement: the influence of community and cultural identity on participation in Wikipedia (Doctoral dissertation, Universitat Pompeu Fabra).
  • Miquel-Ribé, M., & Laniado, D. (2016, July). Cultural identities in wikipedias. In Proceedings of the 7th 2016 International Conference on Social Media & Society (p. 24). ACM.
  • Miquel-Ribé, M., & Rodríguez, H. (2011). Cultural configuration of Wikipedia: measuring Autoreferentiality in different languages. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011 (pp. 316-322).

Cultural Contextualization Discourse effects

  • Apic, G., Betts, M. J., & Russell, R. B. (2011). Content disputes in Wikipedia reflect geopolitical instability. PloS One, 6(6). doi:10.1371/journal.pone.0020902
  • Aragón, P., Laniado, D., Kaltenbrunner, A., & Volkovich, Y. (2012). Biographical social networks on Wikipedia: a cross-cultural study of links that made history. (p. 19). WikiSym '12: Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration.
  • Massa, P., & Scrinzi, F. (2011). Exploring linguistic points of view of Wikipedia. (pp. 213–214). WikiSym '11: Proceedings of the 7th International Symposium on Wikis and Open Collaboration. ACM.
  • Pentzold, C., Weltevrede, E., Mauri, M., Laniado, D., Kaltenbrunner, A., & Borra, E. (2017). Digging Wikipedia. Journal on Computing and Cultural Heritage, 10(1), 1–19. doi:10.1145/3012285
  • Rogers, R., & Sendijarevic, E. (2012). Neutral or National Point of View? A Comparison of Srebrenica articles across Wikipedia's language versions. Proceedings of the Wikipedia Academy Conference 2012, Berlin.

Cultural Contextualization Structural effects

  • Hecht, B. J. (2013). The Mining and Application of Diverse Cultural Perspectives in User-Generated Content. Doctoral Dissertation. Northwestern University. United States.
  • Hecht, B. J., & Gergle, D. (2010a). On the localness of user-generated content (pp. 229–232). CSCW '10: Proceedings of the 2010 Conference on Computer Supported Cooperative Work. ACM.
  • Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in community-maintained knowledge repositories (pp. 11–20). C&T '09: Proceedings of the Fourth International Conference on Communities and Technologies.
  • Hecht, B., & Gergle, D. (2010b). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context (pp. 291–300). CHI '10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM.
  • Karimi, F., Bohlin, L., Samoilenko, A., Rosvall, M., & Lancichinetti, A. (2015). Quantifying national information interests using the activity of Wikipedia editors. arXiv preprint arXiv:1503.05522.
  • Ronen, S., Gonçalves, B., Hu, K. Z., Vespignani, A., Pinker, S., & Hidalgo, C. A. (2014). Links that speak: The global language network and its association with global fame. Proceedings of the National Academy of Sciences, 111(52), E5616-E5622.
  • Samoilenko, A., Karimi, F., Edler, D., Kunegis, J., & Strohmaier, M. (2016). Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity. EPJ data science, 5(1), 9.

Language Gap

  • Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D. (2012, May). Omnipedia: bridging the wikipedia language gap. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1075-1084). ACM.
  • Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In search of the ur-Wikipedia: universality, similarity, and translation in the Wikipedia inter-language link network. OpenSym '12: Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration, 20.
  • Wulczyn, E., West, R., Zia, L., & Leskovec, J. (2016). Growing Wikipedia Across Languages via Recommendation (pp. 975–985). WWW '16: Proceedings of the 25th International Conference on World Wide Web. ACM.

Other topics

Multilingualism activity

  • Hale, S. A. (2014). Multilinguals and Wikipedia editing (pp. 99–108). WS '14: Proceedings of the 2014 ACM conference on Web science.
  • Kim, S., Park, S., Hale, S. A., Kim, S., Byun, J., & Oh, A. H. (2016). Understanding Editing Behaviors in Multilingual Wikipedia. PloS One, 11(5), e0155305.

Language Diversity

  • Van Dijk, Z. (2009). Wikipedia and lesser-resourced languages. Language Problems & Language Planning, 33(3). doi:10.1075/lplp.33.3.03van

Cultural differences in behaviour

  • Pfeil, U., Zaphiris, P., & Ang, C. S. (2006). Cultural Differences in Collaborative Authoring of Wikipedia. Journal of Computer-Mediated Communication, 12(1), 88–113.