From Meta, a Wikimedia project coordination wiki
Wikimedia Research Newsletter

Vol: 7 • Issue: 6 • June 2017

Discussion summarization; Twitter bots tracking government edits; extracting trivia from Wikipedia

With contributions by: Baha Mansurov and Tilman Bayer


"Wikum: bridging discussion forums and wikis using recursive summarization"

Summary by Baha Mansurov

The paper[1] tackles information overload in online discussions by building and testing a tool that lets editors summarize parts of a discussion and then combine those summaries into higher-level summaries, until a single summary of the whole discussion remains. (See also the related presentation at the September 2016 Wikimedia Research Showcase.)
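The recursive idea described above can be illustrated with a minimal sketch. It is not the Wikum implementation: the `Node` class, the `summarize_recursively` function, and the `toy_summarizer` stand-in are all hypothetical names introduced here, and the `summarize` callback merely takes the place of a human editor writing a summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """A comment in a discussion tree (hypothetical structure for illustration)."""
    text: str
    children: List["Node"] = field(default_factory=list)
    summary: Optional[str] = None


def summarize_recursively(node: Node, summarize: Callable[[List[str]], str]) -> str:
    """Summarize a subtree bottom-up and return the summary stored on `node`.

    A leaf stands for itself. An inner node's summary is produced from its
    own text plus the summaries of its children, so summaries of parts are
    combined into higher-level summaries until the root holds one summary.
    """
    if not node.children:
        node.summary = node.text
        return node.summary
    parts = [node.text] + [summarize_recursively(child, summarize) for child in node.children]
    node.summary = summarize(parts)
    return node.summary


def toy_summarizer(parts: List[str]) -> str:
    """Assumption: stands in for an editor; it only joins the parts."""
    return " / ".join(parts)


thread = Node(
    "Should we merge A into B?",
    [
        Node("Yes, they overlap.", [Node("Agreed.")]),
        Node("No, B is too long."),
    ],
)
top_level_summary = summarize_recursively(thread, toy_summarizer)
```

In a real tool the callback would be replaced by an editing interface, and editors would be free to summarize any subtree in any order rather than in one bottom-up pass.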

Annual "State of Wikimedia Research" summary presentation at Wikimania

The Wikimania 2017 conference in Montreal, Canada featured the "State of Wikimedia Research 2016–2017" presentation, a quick tour of scholarship and academic research on Wikipedia and other Wikimedia projects from the last year (now an annual Wikimania tradition, dating back to 2009). The slides are available online. The highlighted research publications (many previously covered in this newsletter) were grouped into the following topic areas: "Gender gap in participation", "Gender gap in content", "Fake news!", "Using Wikipedia for prediction", "Syndication", "Wikipedia and the world", and "Datasets: research that enables other research".

Conferences and events

See the research events page on Meta for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer
  • "Beyond neutrality: how zero rating can (sometimes) advance user choice, innovation, and democratic participation"[2] From the abstract: "Over four billion people across the globe cannot afford Internet access. [...] Enter zero rating. Mobile Internet providers in the developing world now waive the data charges for services like Facebook, Wikipedia, or local job-search sites. Despite zero rating's apparent benefits, many advocates seek to ban the practice as a violation of net neutrality.
    This Article argues that zero rating is defensible by net neutrality's own normative lights. Network neutrality is not about neutrality for its own sake, but about advancing consumer choice and welfare, innovation in the development of new services, and democratic participation in the public sphere. Analysis of zero rating should accordingly focus on the question of how it impacts these goals: we ought to embrace zero-rating programs that advance net neutrality’s substantive goals and reserve our skepticism for those services that would sacrifice the network’s generative potential to pursue mere short-term gains." (About Wikipedia Zero)
  • "Fun facts: automatic trivia fact extraction from Wikipedia"[3] From the abstract: "we formalize a notion of trivia-worthiness and propose an algorithm that automatically mines trivia facts from Wikipedia. We take advantage of Wikipedia’s category structure, and rank an entity’s categories by their trivia-quality. Our algorithm is capable of finding interesting facts, such as Obama’s Grammy or Elvis’ stint as a tank gunner. In user studies, our algorithm captures the intuitive notion of 'good trivia' 45% higher than prior work. Search-page tests show a 22% decrease in bounce rates and a 12% increase in dwell time, proving our facts hold users’ attention."
  • "The citizen IS the journalist: automatically extracting news from the swarm"[4] From the abstract: "... we describe SwarmPulse, a system that extracts news by combing through Wikipedia and Twitter to extract newsworthy items. We measured the accuracy of SwarmPulse comparing it against the Reuters and CNN RSS feeds and the Google News feed. We found precision of 83% and recall of 15% against these sources."
  • "Production of scientific information on the internet: the example of Wikipedia" ("Produktion von naturwissenschaftlichen Informationen im Internet am Beispiel von Wikipedia", in German)[5] From the English abstract: "On the internet, lay people cannot only passively receive scientific information, they can also actively produce it. How do lay people process uncertain and contradictory information? [...] little is yet known about the factors that influence the production of natural science information by lay people on the Internet. In our article, we discuss a variety of influencing factors and derive predictions about how these factors affect the production behaviors and the resulting text products. Finally, we illustrate our considerations using the online encyclopaedia Wikipedia."
  • "Building an encyclopedia with a wiki? Looking back at Wikipedia's editorial policy" ("Construire une encyclopédie avec un wiki ? Regards rétrospectifs sur la politique éditoriale de Wikipédia", in French)[6] From the English abstract: "[The author] studied the discussions on applying rules to source citation and identified two streams that illustrate the editorial policy known as 'wiki pole' and 'encyclopedia pole'. Although these two epistemological regimes may appear mutually contradictory, in fact this policy aims at finding balance between the wiki's potential and the requirements of trustworthiness inherent in producing an encyclopedia."
  • "Persistent Bias on Wikipedia. Methods and Responses"[7] From the abstract: "Techniques for biasing an entry include deleting positive material, adding negative material, using a one-sided selection of sources, and exaggerating the significance of particular topics. To maintain bias in an entry in the face of resistance, key techniques are reverting edits, selectively invoking Wikipedia rules, and overruling resistant editors. Options for dealing with sustained biased editing include making complaints, mobilizing counterediting, and exposing the bias. To illustrate these techniques and responses, the rewriting of my own Wikipedia entry serves as a case study." (about the article Brian Martin)
  • "Multi-cultural Wikipedia mining of geopolitics interactions leveraging reduced Google matrix analysis"[8] From the abstract: "Wikipedia stores valuable fine-grained dependencies among countries by linking webpages together for diverse types of interactions (not only related to economical, political or historical facts). We mine herein the Wikipedia networks of several language editions using the recently proposed method of reduced Google matrix analysis. [...] Our study concentrates on 40 major countries chosen worldwide. Our aim is to offer a multicultural perspective on their interactions by comparing networks extracted from five different Wikipedia language editions, emphasizing English, Russian and Arabic ones. We demonstrate that this approach allows to recover meaningful direct and hidden links among the 40 countries of interest." (See also earlier coverage of related papers by some of the same authors: "'Wikipedia communities' as eigenvectors of its Google matrix", "How Wikipedia's Google matrix differs for politicians and artists")
  • "Enriching Wikidata with frame semantics"[9] From the paper: "To increase the usability of WD [Wikidata] for NLP tasks, we aim at enriching WD with linguistic information by aligning it to the famous lexicon FrameNet ... Specifically, we aim to find a mapping between WD facts, e.g. educated at(Person, University) and similar structures in expert lexical resources. [...] in addition to the direct result of enriching WD with linguistic information, the alignments can be used to refine the property structure of WD by inducing new general/specific properties. For instance, the property killed by refers to someone (victim) killed by somebody else (killer). However, the property does not distinguish between different kinds of killing, such as execution. In FN such information is already captured through the frames Execution and Killing, where the former frame inherits from the latter. By aligning killed by to both frames, the property killed by can [be] refined by introducing a new sub-property: executed by."
  • "Explicit neutrality in voter networks – an analysis of the requests for adminship (RfAs) in Wikipedia" ("Explizite Neutralität in Wählernetzwerken – Eine Analyse der Requests for Adminship (RfAs) in Wikipedia", in German)[10] Translated from the abstract: "This paper examines requests for adminship (RfAs) in Wikipedia. In particular, we are answering the research question about what increases the probability that someone provides a neutral vote about a potential administrator. ... The results indicate a strong tendency toward neutral reciprocity (i.e. a higher probability that user A votes neutral on user B who himself had voted neutral on user A) and neutral balance (i.e. a higher probability that user A votes neutral on another user B, who has received an opposing vote from user C, who in turn had received an opposing vote from user A)."
  • "Keeping Ottawa honest—one tweet at a time? Politicians, journalists, Wikipedians and their Twitter bots"[11] From the abstract: "WikiEdits bots are a class of Twitter bot that announce edits made by Wikipedia users editing under government IP addresses, with the goal of making government editing activities more transparent. This article examines the characteristics and impact of transparency bots, bots that make visible the edits of institutionally affiliated individuals by reporting them on Twitter. We map WikiEdits bots and their relationships with other actors, analyzing the ways in which bot creators and journalists frame governments’ participation in Wikipedia. We find that, rather than providing a neutral representation of government activity on Wikipedia, WikiEdits bots and the attendant discourses of the journalists that reflect the work of such bots construct a partial vision of government contributions to Wikipedia as negative by default."
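The core check behind the WikiEdits bots described in the last item (does an anonymous edit come from an IP address inside a watched institutional range?) can be sketched as follows. This is an illustration only, not the code of any actual bot: the institution names are invented, the address ranges are reserved documentation blocks rather than real government ranges, and `match_institution` is a hypothetical helper.

```python
import ipaddress
from typing import Dict, List, Optional

# Assumption: invented institutions mapped to reserved documentation
# ranges (RFC 5737). A real bot would load published institutional ranges.
WATCHED_RANGES: Dict[str, List[str]] = {
    "Example Parliament": ["192.0.2.0/24"],
    "Example Ministry": ["198.51.100.0/25"],
}


def match_institution(editor: str, ranges: Dict[str, List[str]] = WATCHED_RANGES) -> Optional[str]:
    """Return the institution whose range contains `editor`, or None.

    Anonymous Wikipedia edits are attributed to an IP address, while
    logged-in edits carry a username; usernames fail to parse and are skipped.
    """
    try:
        address = ipaddress.ip_address(editor)
    except ValueError:
        return None  # not an IP address, e.g. a registered username
    for institution, cidr_blocks in ranges.items():
        for block in cidr_blocks:
            if address in ipaddress.ip_network(block):
                return institution
    return None


hit = match_institution("192.0.2.17")
miss = match_institution("198.51.100.200")  # outside the /25 block
named_user = match_institution("SomeUser")
```

A bot built on such a check would then post each matching edit to Twitter. As the abstract notes, a match only shows that an edit came from an institutional network and says nothing about whether the edit was problematic.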


  1. Zhang, Amy X.; Verou, Lea; Karger, David (2017). Wikum: bridging discussion forums and wikis using recursive summarization. CSCW '17. New York, NY, USA: ACM. pp. 2082–2096. ISBN 9781450343350. doi:10.1145/2998181.2998235.  Closed access
  2. Ard, BJ (May 1, 2016). "Beyond neutrality: how zero rating can (sometimes) advance user choice, innovation, and democratic participation". Maryland Law Review 75 (4): 984. ISSN 0025-4282. 
  3. Tsurel, David; Pelleg, Dan; Guy, Ido; Shahaf, Dafna (December 12, 2016). "Fun facts: automatic trivia fact extraction from Wikipedia". arXiv:1612.03896 [cs].  (preprint), published version: https://dl.acm.org/citation.cfm?id=3018709 Closed access, author's copy: http://www.pelleg.org/shared/hp/download/fun-facts-wsdm.pdf
  4. Oliveira, João Marcos de; Gloor, Peter A. (2016). "The citizen IS the journalist: automatically extracting news from the swarm". In Matthäus P. Zylka, Hauke Fuehres, Andrea Fronzetti Colladon, Peter A. Gloor (eds.). Designing Networks for Innovation and Improvisation. Springer Proceedings in Complexity. Springer International Publishing. pp. 141–150. ISBN 9783319426969.  Closed access
  5. Nestler, Steffen; Leckelt, Marius; Back, Mitja D.; Beck, Ina von der; Cress, Ulrike; Oeberst, Aileen (July 1, 2017). "Produktion von naturwissenschaftlichen Informationen im Internet am Beispiel von Wikipedia". Psychologische Rundschau 68 (3): 172–176. ISSN 0033-3042. doi:10.1026/0033-3042/a000360. Retrieved 2017-07-29.  Closed access
  6. Sahut, Gilles (January 6, 2017). "Construire une encyclopédie avec un wiki ? Regards rétrospectifs sur la politique éditoriale de Wikipédia". I2D – Information, données & documents 53 (4): 68–77. ISSN 0012-4508.  Closed access
  7. Martin, Brian (2017). "Persistent Bias on Wikipedia. Methods and Responses". Social Science Computer Review.  Closed access Author's copy
  8. Frahm, Klaus M.; Zant, Samer El; Jaffrès-Runser, Katia; Shepelyansky, Dima L. (December 23, 2016). "Multi-cultural Wikipedia mining of geopolitics interactions leveraging reduced Google matrix analysis". arXiv:1612.07920 [nlin, physics:physics].  (preprint), published version: http://www.sciencedirect.com/science/article/pii/S0375960116321879 Closed access
  9. Mousselly-Sergieh, Hatem; Gurevych, Iryna (2016). "Enriching Wikidata with frame semantics". Semantic Scholar. 
  10. Putzke, Johannes; Takeda, Hideaki (January 23, 2017). "Explizite Neutralität in Wählernetzwerken – Eine Analyse der Requests for Adminship (RfAs) in Wikipedia". Wirtschaftsinformatik 2017 Proceedings. Closed access
  11. Ford, Heather; Dubois, Elizabeth; Puschmann, Cornelius (October 12, 2016). "Keeping Ottawa honest—one tweet at a time? Politicians, journalists, Wikipedians and their Twitter bots". International Journal of Communication 10 (0): 24. ISSN 1932-8036. 
