
Research:Wikimedia Summer of Research 2011

From Meta, a Wikimedia project coordination wiki
Diederik van Liere
Maryana Pinchuk
Steven Walling
This page documents a completed research project.

From June through August 2011, the Wikimedia Foundation Summer of Research (WSoR) brought a group of researchers to study long-term participation trends in Wikipedia using a multidisciplinary approach.

The research team


The team was led by Diederik van Liere, Maryana Pinchuk, and Steven Walling. The Foundation had the pleasure of hosting the following researchers for summer 2011:

  • R. Stuart Geiger is a PhD candidate at the UC Berkeley School of Information, focusing on knowledge production in distributed and decentralized environments -- specifically Wikipedia and scientific research networks. He has been a Wikipedia editor since 2004 and has been studying the project as an ethnographer since 2007. His current research explores the relationship between technical infrastructures and social structures, and he has written on bots, vandal fighting, administration, and the history of Wikipedia.
  • Aaron Halfaker is a PhD candidate in Computer Science at the University of Minnesota, GroupLens Research, focusing on computer-mediated human interaction. Aaron started editing Wikipedia four years ago and quickly found his niche creating user scripts to find ways of improving the collaborative experience. His research explores mechanisms for motivating and supporting volunteer collaboration.
  • Fabian Kaelin is a Master of Science candidate from McGill University, focused on machine learning.
  • Melanie Kill is Assistant Professor of English at the University of Maryland, specializing in digital rhetoric and genre studies. She is currently at work on a book on Wikipedia and the history of the genre of the encyclopedia. She earned her PhD in Rhetoric and Language Studies from University of Washington and previously has taught at Texas Christian University.
  • Giovanni Luca Ciampaglia is a PhD candidate at the University of Lugano in Switzerland. Giovanni is a computer scientist who studies user involvement in commons-based peer production communities, group consensus and collective deliberation processes.
  • Yusuke Matsubara is a PhD student at the University of Tokyo (Japan), studying computational linguistics. His research focuses on analysing how people write and read from a computational and empirical point of view. Since 2008, he has been an occasional writer, translator and programmer for Wikimedia.
  • Jonathan Morgan is a PhD candidate at the University of Washington, studying social interaction in collaborative online creative environments. As a researcher, he is particularly interested in tracing connections between the things people say (and the way they say them) and their roles, goals and activities online. He also works on the design of tools for improving public deliberation on the web, and on practical tools for internet researchers.
  • Shawn Walker is a PhD candidate at the University of Washington iSchool, and studies digital government and public engagement.

Research questions


In light of the results of the Editor Trends Study and the Board's resolution on openness, the team used week-long group sprints to answer detailed questions related to participation. The following lists of questions helped guide our inquiry, though a precise list of the topics covered is available below.



Please see our Summary of Findings.

List of research project pages


The projects listed below may be completed, still in progress, or simply avenues of inquiry that were ended for technical reasons or reoriented priorities. If you have questions, please ask on the relevant Talk page.

Quality of PPI editor work -- Drdee This research project compared editors "recruited" via the Public Policy Initiative to other editors with similar edit counts. It concluded that the Wikipedians we recruit this way are just as good as the editors we get in other ways.

Newbie reverts and article length -- EpochFail Newbies are editing more complete encyclopedia articles than they used to, and edits to more complete articles have always been more likely to be reverted.

Newbie teaching strategy trends -- Staeiou, Drkill, Jtmorgan Wikipedian teaching strategies are shifting in two significant ways:

  • a significant drop in messages including praise and thanks corresponded with an increase in the overlap of teaching with criticism
  • a decline in personalized teaching corresponded with an increase in templated instruction

Patroller work load -- EpochFail The number of new pages that human editors patrol has been going down since 2007. This suggests that the workload of new page patrollers has also been decreasing.

Alternative lifecycles of new users -- Staeiou, Jtmorgan New users are receiving substantially more notifications that their articles and images are being deleted, but are participating substantially less in community processes, across almost all areas of activity.

Ignored period and retention -- Whym Some early interactions can have a negative impact on the retention of new editors. Contrary to the speculation that early messages motivate new editors to contribute, retained editors are found to have a shorter ignored period than leaving editors do after 2006.

Newbie reverts and subsequent editing behavior -- Swalker Editor retention has been decreasing over time. The negative effect of a revert has increased over time.

Deletion notifications to new users -- Staeiou There's a significant decline in the number of new users whose first message was a welcome and a rise in those whose first message was a warning. Receiving a deletion notification as a first message does not appear to predict whether or not a new editor will be retained 2-6 months later, but further study is recommended to compare retention metrics for new article creators who did and did not receive deletion notices.

Classifying wikilove messages -- Jtmorgan This project involves categorizing a large set of Wikilove messages in order to get a better idea of how the community is using this new tool, and using that dataset in order to train an active learning classifier to automatically detect the sentiment of Wikilove messages in the future.

Anonymous edits -- declerambaul IP editing is declining faster than editing by logged-in users, but in June 2011 it still accounted for a fifth of the edits on EN wiki.

Rhetoric of the welcome message -- Drkill This sprint asks what these messages have said and currently say, or don't say, to new editors about 1) Wikipedia and its larger mission, 2) the Wikipedian community, 3) the types of participation new editors are welcomed into.

Sentiment analysis tool of new editor interaction -- Whym This sprint covers the construction of a fundamental tool to be used to answer further research questions: a sentiment analysis classification algorithm. Ultimately the classifier was not accurate enough to be useful, but future work is planned to improve it.

The Speed of Speedy Deletions -- Staeiou Speedy deletion is usually very fast: a large proportion of speedy deletion tagging takes place within moments of creation, usually followed by deletion in an average of half an hour.

New user help requests -- Jtmorgan Fewer than 10% of new users asked for help during their first 30 days. Of those who did, fewer than half received a response from a real person during that period. The places they asked for help varied widely, but the most common was their own talk page or someone else's. Some of the 'other' places they asked for help include the Reference Desk (for both reference and traditional 'help' topics), article talk pages and edit summaries. More than half of those who asked for help received some sort of welcome on their user talk page with links to help resources. Very few of these users used the {{helpme}} template, even though many of them received Welcome templates that included 'Helpme' instructions. Full research report on this and related sprints here.

New User Participation in Help Spaces -- Jtmorgan, Swalker Based on analysis of a small sample of newbie comments, newbies aren't good at knowing where to ask for help, and Wikipedia isn't good at spotting requests for help, particularly when newbies post on their own talk page. Full research report on this and related sprints here.

Software for quick processing of Wikidumps -- EpochFail A Python library was built and tested for processing XML dump files quickly. The January 1st, 2011 full history dump with text was processed in 20 hours.
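To illustrate the general idea of processing a dump without loading it into memory, here is a minimal sketch using only the Python standard library. It is not the library described above and does not reproduce its API; the function names `local_name` and `iter_revisions` and the tiny `SAMPLE` document are assumptions made up for this example, and the element names follow the general shape of a MediaWiki XML export.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace prefix: '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(xml_file):
    """Yield (page_title, revision_id, text_size_in_bytes), one revision at a time.

    Hypothetical helper: elements are read incrementally with iterparse and
    cleared once handled, so the whole dump is never held in memory.
    """
    title = None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            # Only direct children are inspected, so the contributor's own
            # <id> (a grandchild) is not confused with the revision <id>.
            fields = {local_name(child.tag): child.text for child in elem}
            text = fields.get("text") or ""
            yield title, int(fields["id"]), len(text.encode("utf-8"))
            elem.clear()  # release the revision text
        elif name == "page":
            elem.clear()  # release what is left of the finished page


# Made-up sample input, standing in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example</title>
    <revision>
      <id>1</id>
      <contributor><username>A</username><id>10</id></contributor>
      <text>Hello</text>
    </revision>
    <revision>
      <id>2</id>
      <contributor><username>B</username><id>11</id></contributor>
      <text>Hello, world</text>
    </revision>
  </page>
</mediawiki>"""

revisions = list(iter_revisions(io.BytesIO(SAMPLE.encode("utf-8"))))
print(revisions)  # [('Example', 1, 5), ('Example', 2, 12)]
```

For a real dump, the same generator could be given an open file object (for example one returned by `bz2.open` for a compressed dump) in place of the in-memory sample.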

New editor welcome wishlist -- Drkill Describes features new editors might find most useful in welcome message templates.


Vandal fighter work load -- EpochFail The number of vandal fighters is declining more steeply than the number of editors overall, with the steepest decline among less active vandal fighters. The number of vandal reverts completed by individual fighters also appears to be declining, suggesting that the overall workload of vandal fighters is decreasing.

First edit session -- EpochFail For newbies, the amount of their edits that are reverted or deleted is a powerful predictor of retention. Their initial investment is also a powerful predictor of retention. New editors show less initial investment now than they used to. The more initial investment, the more negative the effect of rejection.

WikiPride -- declerambaul A visualization method is presented that can be used to analyze trends for any cohort-centric statistic. This visualization method is then used to show contributions of editor cohorts and how those contributions compare to previous years' cohorts.

Editor lifecycle -- Junkie.dolphin This research looks at the evolution of contributors' activity over the years by analyzing statistical regularities in collective patterns of editing activity.

Lag between registration and first edit -- Junkie.dolphin, Staeiou About 30% of users register an account but do not perform their first edit immediately or within the same day. Our analysis shows that the time lag between registration and first edit can be weeks, months and even years long!
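As a rough illustration of how such a lag could be measured, the sketch below computes the time between registration and first edit for each user and the share of users whose lag reaches a threshold. The function names, the one-day threshold and the sample timestamps are all assumptions made up for this example; they are not the study's data or its exact definition of "within the same day".

```python
from datetime import datetime, timedelta


def registration_lags(users):
    """Return first_edit - registration for every user who has edited.

    `users` is an iterable of (registered_at, first_edit_at) pairs, where
    first_edit_at is None for accounts that never edited (these are skipped).
    """
    return [first - registered for registered, first in users if first is not None]


def share_delayed(lags, threshold=timedelta(days=1)):
    """Fraction of lags that are at least `threshold` long (assumed cutoff)."""
    if not lags:
        return 0.0
    return sum(lag >= threshold for lag in lags) / len(lags)


# Made-up sample accounts: (registration time, time of first edit).
users = [
    (datetime(2011, 1, 1, 12, 0), datetime(2011, 1, 1, 12, 5)),  # 5 minutes
    (datetime(2011, 1, 1, 9, 0), datetime(2011, 1, 1, 11, 0)),   # 2 hours
    (datetime(2011, 1, 1), datetime(2011, 1, 4)),                # 3 days
    (datetime(2010, 1, 1), datetime(2011, 2, 5)),                # over a year
    (datetime(2011, 3, 1), None),                                # never edited
]

lags = registration_lags(users)
share = share_delayed(lags)
print(share)  # 0.5
```

Whether accounts that never edit belong in the denominator is a design choice; this sketch leaves them out.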

WikiPride -- declerambaul This sprint is intended to show how byte count can be used as an alternative to edit count as a measure of Wikipedian contributions, by measuring the total bytes added to different namespaces over time by different yearly cohorts of Wikipedians.
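The byte-count measure can be sketched as a simple aggregation: sum the bytes each edit adds, grouped by the editor's registration-year cohort, the namespace, and the year of the edit. The function name, the input layout and the sample rows below are assumptions for illustration only, and counting only positive size changes is one possible reading of "bytes added", not necessarily the one used in the sprint.

```python
from collections import defaultdict


def bytes_added_by_cohort(revisions, registration_year):
    """Total bytes added per (cohort_year, namespace, edit_year).

    `revisions` is an iterable of (user, namespace, edit_year, byte_delta)
    tuples; `registration_year` maps each user to the year they registered.
    Edits that shrink a page (byte_delta <= 0) are ignored in this sketch.
    """
    totals = defaultdict(int)
    for user, namespace, edit_year, byte_delta in revisions:
        if byte_delta > 0:
            totals[(registration_year[user], namespace, edit_year)] += byte_delta
    return dict(totals)


# Made-up sample data: namespace 0 is articles, 1 is article talk.
registration_year = {"Alice": 2006, "Bob": 2010}
revisions = [
    ("Alice", 0, 2011, 500),
    ("Alice", 0, 2011, -200),  # removal, not counted as bytes added
    ("Alice", 1, 2011, 120),
    ("Bob", 0, 2011, 80),
    ("Bob", 0, 2011, 40),
]

totals = bytes_added_by_cohort(revisions, registration_year)
print(totals)
# {(2006, 0, 2011): 500, (2006, 1, 2011): 120, (2010, 0, 2011): 120}
```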

Trending articles and new editors -- Whym This research looks at the types of editors who start editing on very active or "trending" articles related to current news events compared with editors of less active pages (non-trending topics). The study found that trending articles did not attract any more new registered editors than average articles.

Wikiproject Participation & Mentorship -- jtmorgan, swalker This sprint explores the world of Wikiprojects, describing the joining and participation patterns of new and old editors.

Visualizing Wikiproject Activity -- jtmorgan, swalker In order to get a better sense of new user participation in Wikiprojects, as well as overall Wikiproject activity, we have created a set of database tables that list various activity metrics for WikiProjects. We also conducted interviews with Wikiproject members in order to develop a set of design requirements for a proposed information visualization dashboard, WikiProject:Pulse.

One Link, Two Links, Red Links, Blue Links -- Staeiou This sprint explores the proportions of red- and blue-linked articles on English Wikipedia.

Editor classes -- Zackexley This sprint studies the changing recruitment numbers of new editors who will go on to become light, moderate, or heavy editors.

Data and code


Where they consist of public, freely licensed Wikipedia data, we will be releasing the datasets used to complete our summer's work, as well as the code and queries used to produce them.