Research:Non-bot interlanguage linking

From Meta, a Wikimedia project coordination wiki
This page documents a completed research project.

Key Personnel

  • Claudia Koltzenburg

Project Summary

This research project aims to study the role of English in the Wikipedia universe by looking into debates on specific interlanguage links that were first set by human users.


The chosen method is the selection and analysis of publicly available data.

There seem to be several technical approaches (see References) and I am not in a position to decide which one(s) would be useful at which stage. I am requesting WMF support for creating one or several datasets that I might be able to base my study on.

I seek data on editorial debates in any language that concern language links. Searching edit histories seems vital. As far as I can guess at this stage, within the WMF server universe debates about specific language links may occur on article talk or user talk pages; AfD pages or other pages of this kind could also be of interest.
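As a starting point for such a search, it may help to first find out which language links on a page were set by a human rather than by a bot. The following is only a minimal sketch, not a tested tool. It assumes that language links are stored in the page's wikitext in the form [[de:Effektivwert]], that the revisions of a page have already been retrieved as (user, wikitext) pairs in chronological order, and that bots can be recognised by name. The function names, the name-based bot heuristic and the sample titles other than Effektivwert are my own illustrative assumptions.

```python
import re

# Language links in wikitext look like [[de:Effektivwert]].
# Assumption: language codes are 2-3 lowercase letters, optionally followed
# by hyphenated suffixes (e.g. "zh-yue"). Other short lowercase interwiki
# prefixes would match too, so a real study should check the codes against
# a list of existing Wikipedia languages.
LANGLINK = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")


def language_links(wikitext):
    """Return the set of (language, title) pairs linked from the wikitext."""
    return {(lang, title.strip()) for lang, title in LANGLINK.findall(wikitext)}


def looks_like_bot(user, known_bots=frozenset()):
    """Hypothetical heuristic: listed as a bot, or user name ends in 'bot'."""
    return user in known_bots or user.lower().endswith("bot")


def first_setters(revisions, known_bots=frozenset()):
    """Map each language link to the user whose revision first contained it.

    `revisions` is an iterable of (user, wikitext) pairs, oldest first.
    Links first set by accounts that look like bots are left out.
    """
    seen = set()
    result = {}
    for user, text in revisions:
        for link in language_links(text) - seen:
            seen.add(link)
            if not looks_like_bot(user, known_bots):
                result[link] = user
    return result


# Made-up sample history for illustration only.
sample = [
    ("Alice", "Text [[de:Effektivwert]]"),
    ("FooBot", "Text [[de:Effektivwert]] [[fr:Valeur efficace]]"),
    ("Bob", "Text [[de:Effektivwert]] [[fr:Valeur efficace]] [[es:Valor eficaz]]"),
]
human_links = first_setters(sample)
```

In this sample, the links set by Alice and Bob are kept and the one added by FooBot is dropped. The pages carrying the remaining links would then be candidates for a closer look at their talk pages.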

Articles on persons, institutions, species, films etc. or on events and dates are most likely not interesting, since finding an interlanguage agreement is probably easy. Entries on abstract terms, however, could yield very interesting data indeed: for example, de.Effektivwert in its apparent misalliance with en.Root mean square, or en.Marginalisation, which carries a language link to de.Exklusion.


My results will be published open access; the most suitable venue is yet to be specified.

Wikimedia Policies, Ethics, and Human Subjects Protection


Benefits for the Wikimedia community

The results of my study will provide new insights into the role of English in the multilingual Wikipedia universe.


Timeline

  • November 2012: Select example debates
  • January 2013: Finish analysis of example debates
  • February 2013: Finish writing up the results, conclusions, discussion
  • March 2013: Run an open peer review process before handing it in for PhD review



References

Chew, Phyllis Ghim-Lian (2009) Emergent lingua francas and world orders: the politics and place of English as a world language. New York, NY: Routledge. ISBN 978-0-415-87227-0

Ferschke, Oliver; Daxenberger, Johannes; Gurevych, Iryna (2012) A Survey of NLP Methods and Resources for Analyzing the Collaborative Writing Process in Wikipedia. In: Iryna Gurevych and Jungi Kim: The People’s Web Meets NLP: Collaboratively Constructed Language Resources, p. (to appear), Springer, preprint

Ferschke, Oliver; Daxenberger, Johannes; Gurevych, Iryna (2012) Wikipedia-based Corpora for Analyzing Revisions, Discussions and Text Quality in Collaborative Writing. Workshop on Automatic Processing of Non-Standard Data Sources in Corpus-Based Research (Extended Abstract), August 2012. Cologne, Germany.

Ferschke, Oliver; Gurevych, Iryna; Chebotar, Yevgen (2012) Behind the Article: Recognizing Dialog Acts in Wikipedia Talk Pages. In: Proceedings of the 13th Conference of the European Chapter of the ACL (EACL 2012), p. 777-786, April 2012. Avignon, France.

Ferschke, Oliver; Zesch, Torsten; Gurevych, Iryna (2011) Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. System Demonstrations, p. 97-102, June 2011. Portland, OR, USA.

Koltzenburg, Claudia (2012) (my first inquiry for technical support from the WP community) [Pywikipedia-l] how search for non-bot interlanguage link edits.

Vrandečić, Denny (2012) Ratio of language links to full text in Wikipedias. June 23rd 2012, httpz://

Yasseri, Taha; Sumi, Robert; Rung, András; Kornai, András; Kertész, János (2012) Dynamics of Conflicts in Wikipedia. PLoS ONE 7(6): e38869. doi:10.1371/journal.pone.0038869

External links


C.Koltzenburg (talk)