Wikimedia Research Newsletter

Vol: 7 • Issue: 7 • July 2017

Wikipedia articles vs. concepts; Wikipedia usage in Europe

With contributions by: Thomas Niebler and Tilman Bayer

"Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia"

Reviewed by Thomas Niebler
This paper was also presented in the June 2017 WikiResearch showcase

Several Wikipedia-based systems and scientific analyses assume that no two Wikipedia articles represent the same concept, where a concept is a semantically closed description of a specific item such as "New York City". In a paper published at CSCW'17,[1] however, Lin et al. showed that this “article-as-concept” assumption does not hold: the article about "New York City" has a separate sub-article, "History of New York City", which covers a topic very closely related to “New York City” and could just as easily be merged into the main article. Splitting lengthy articles into several smaller ones in this way ("summary style", more specifically "article size") may improve readability for human users, but seriously impairs many studies that rely on the “article-as-concept” assumption. Using a simple classification approach with features based on both the link structure and semantic aspects of the title and the context, the authors found that 70.8% of the 1,000 most visited pages have been split into an article and sub-articles, with an average of 7.5 sub-articles per article. They conclude that sub-articles are not the exception, but the rule.

A drawback of the proposed sub-article relationship detection method, as stated in the paper, is that it is trained only on explicitly encoded sub-article relationships; it remains unclear how to detect implicit relationships, i.e. those where no editor has linked the sub-article to the main article. Still, this is a first step toward a deeper analysis of the Wikipedia page network, with the aim of making it both more readable for humans and more easily exploitable by algorithms.
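
To make the idea of title and link features more concrete, here is a toy sketch in Python. It is not the classifier from the paper: the function names, the two features, and the threshold are hypothetical and chosen only for illustration. It scores a candidate page by how many of the parent's title words reappear in the candidate's title, and by whether the parent article explicitly links to the candidate.

```python
def title_overlap(parent_title, candidate_title):
    """Fraction of the parent's title tokens that also occur in the candidate's title."""
    parent_tokens = parent_title.lower().split()
    candidate_tokens = set(candidate_title.lower().split())
    if not parent_tokens:
        return 0.0
    shared = sum(1 for token in parent_tokens if token in candidate_tokens)
    return shared / len(parent_tokens)


def subarticle_features(parent_title, candidate_title, parent_outlinks):
    """Build a tiny, hypothetical feature dict for one (parent, candidate) pair."""
    return {
        # Semantic aspect of the title: does the candidate's title contain the parent's?
        "title_overlap": title_overlap(parent_title, candidate_title),
        # Link structure: is the candidate explicitly linked from the parent article?
        "linked_from_parent": candidate_title in parent_outlinks,
    }


def looks_like_subarticle(features, overlap_threshold=1.0):
    """Hand-written rule standing in for a trained classifier (threshold is an assumption)."""
    return features["linked_from_parent"] and features["title_overlap"] >= overlap_threshold


# Hypothetical outlinks of the "New York City" article.
outlinks = {"History of New York City", "Boston"}
history = subarticle_features("New York City", "History of New York City", outlinks)
boston = subarticle_features("New York City", "Boston", outlinks)
print(looks_like_subarticle(history), looks_like_subarticle(boston))
```

Because the rule requires an explicit link from the parent, the sketch shares the limitation noted above: an unlinked, implicit sub-article would not be detected.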


85% of German scientists use Wikipedia, and other European media survey results

Summary by Tilman Bayer

A survey among 1,354 German academic researchers about their professional use of social media found Wikipedia to be the most widely used site as of 2015, at 84.7%.[2] Among German internet users in general, 79% use Wikipedia. Only 2% of these Wikipedia readers consider it "never reliable", while 80% hold that it is "mostly" ("größtenteils") reliable.[3] A report by the German Monopolkommission (which advises the government on antitrust matters) on potential monopoly problems in the internet search engine market highlighted Wikipedia as the one among Germany's top 10 websites that is by far the most dependent on Google, which accounts for around 80% of its traffic (according to third-party data from SimilarWeb that is not quite consistent with the Wikimedia Foundation's own data).[4]

In France, surveys by the Institut national de la statistique et des études économiques (INSEE) found that from 2011 to 2013, the share of people who use the internet to consult Wikipedia ("or any other collaborative online encyclopedia") rose from 39% to 51%. Wikipedia usage was higher among those with degrees and among younger internet users: 82% among 16-24 year olds, 54% among 25-54 year olds, and only 31% among 55-74 year olds.[5] The corresponding Eurostat data gave 45% for the entire European Union as of 2015.[6]

In contrast, Ofcom found that only 2-4% of UK 12-15 year olds used Wikipedia as their first stop for information as of 2015.[7]

Meanwhile, a 2016 Knight Foundation report, based on a study by Nielsen, found that "Among mobile sites [in the US], Wikipedia reigns in terms of popularity (the app does well too) and amount of time users spend on the entity. Wikipedia’s site reaches almost one-third of the total mobile population each month".[8]

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer
  • "Intellectual interchanges in the history of the massive online open-editing encyclopedia, Wikipedia"[9] From the abstract: "[Its] open-editing nature may give us prejudice that Wikipedia is an unstable and unreliable source; yet many studies suggest that Wikipedia is even more accurate and self-consistent than traditional encyclopedias. Scholars have attempted to understand such extraordinary credibility, but usually used the number of edits as the unit of time, without consideration of real time. In this work, we probe the formation of such collective intelligence through a systematic analysis using the entire history of English Wikipedia articles, between 2001 and 2014. ... [We] find the existence of distinct growth patterns that are unobserved by utilizing the number of edits as the unit of time. To account for these results, we present a mechanistic model that adopts the article editing dynamics based on both editor-editor and editor-article interactions. ... [The] model indicates that infrequently referred articles tend to grow faster than frequently referred ones, and articles attracting a high motivation to edit counterintuitively reduce the number of participants. We suggest that this decay of participants eventually brings inequality among the editors, which will become more severe with time."
This paper was also presented in the February 2017 Wikimedia Research showcase
  • "Not at Home on the Range: Peer Production and the Urban/Rural Divide"[10] From the abstract and paper: "We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on the local area, and is more likely to have been generated by automated software agents (i.e. 'bots')"; however, there is a "substantial rural advantage in the per capita quantity of peer-produced information."
  • "Understanding the Role of Participative Web within Collaborative Culture: The Case of Wikipedia"[11] From the abstract: "This article will use Wikipedia as an example to illustrate about what the term “participative webs” exactly means. From perspectives of collaborative culture, this study will emphasize the role that participative website plays in knowledge-creating and knowledge-sharing [...] and discuss how collaborative culture reflects the role participative web is equipped."
  • "From Freebase to Wikidata: The Great Migration"[12] From the abstract: "The two major collaborative knowledge bases are Wikimedia's Wikidata and Google's Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. [...] Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and share and discuss detailed statistics on both knowledge bases."


