Research:Wikimedia Summer of Research 2011/Summary of Findings

From Meta, a Wikimedia project coordination wiki
Summer research in progress
The WSOR team presented these findings at the Wikimedia Foundation in August

This is the summary of findings from the Wikimedia Foundation Summer of Research, a program in the Foundation's Community Department which brought together eight academic researchers from around the globe to study the dynamics of the Wikipedia editing community.

The multidisciplinary group employed a variety of qualitative and quantitative methodologies to tackle some of the most pressing questions about current and historical editor trends. Guided by the results of the Editor Trends Study, the project primarily focused on barriers to participation by new Wikipedians.

This page is intended as a summary of some of their most important findings, as well as a guide to the documentation pages for each topic, where the research methods and conclusions are described in more detail. Feel free to discuss anything you might have questions or comments about on the Talk page.

Please note: unless otherwise stated below, the majority of our studies focused on English Wikipedia in order to scale our analyses to the largest dataset available first, as well as to augment our work with qualitative data. The tools and data used to perform the summer's work are freely licensed, and there is documentation available if you would like to replicate any part of the summer's work.

How new Wikipedians join

While the majority of our team's work during the summer focused on the activities of editors after they register and become new Wikipedians, some of our research did address anonymous contributions and the process of moving from registration to actively editing for the first time. The following projects cover these aspects of how new Wikipedians enter the system:

Anonymous contributions

(Documentation)

Question
  • How does the overall editing pattern of anonymous contributors compare to registered editors, especially new ones?
Key conclusion
  • Editing by anonymous users is declining faster than editing by registered users (see charts), but at the time of writing it still accounts for roughly a fifth of all edits on English Wikipedia.

Edits by anonymous users over time
Edits by registered users over time

Registration and first edits

(Documentation)

Questions
  • How many people who register ever edit?
  • How long does it take new Wikipedians to engage with the site through their first edit?
Key conclusions
  • Only 30% of registered users on English Wikipedia have ever edited. (see chart)
  • Most users who register an account and go on to edit make their first edit within an hour of registering. (see chart)
  • There is another distinct cohort of users who wait months or even years before making their first edit. (see chart)

Average time between registration and first edit
English Wikipedia registration and edit counts

Edits to trending articles

(Documentation)

Question
  • Are trending articles (i.e. those popular among readers, such as breaking news) an entry vector for new contributors?
Key conclusion
  • More than half of the edits to trending articles, whether they are semi-protected or not, come from experienced registered users and IP addresses with lots of prior editing history, not newly-registered editors or anonymous editors new to editing. (see charts)

Edits to trending articles
Edits to trending articles, excluding those semi-protected

The ratio of red links to blue links

(Documentation)

Questions
  • Red links are one measure of how complete the encyclopedia is, and they can entice new editors to contribute. What is the ratio of red to blue links, and has it been changing over time?
  • What topics have the most incoming links of either kind?
Key conclusions
  • There are still hundreds of thousands of red-linked articles, many of them linked from multiple pages in the encyclopedia: 95,000 red-links are linked 30 or more times. (see chart)
  • The most-wanted articles measured by incoming red links are on non-American history, culture, and biographies.

Red-linked articles on English Wikipedia by number of times they are linked on the site
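
The "linked 30 or more times" figure above is a count of missing titles grouped by their number of incoming links. A minimal sketch of that kind of count follows. The input format (pairs of source page and missing target), the function name, and the sample data are assumptions made for illustration; the study's actual extraction is described on its documentation page.

```python
from collections import Counter


def count_red_links_at_least(red_links, threshold):
    """Count red-linked titles with at least `threshold` incoming links.

    `red_links` is an iterable of (source_page, missing_target) pairs.
    This input format is hypothetical, chosen only for illustration.
    """
    incoming = Counter(target for _source, target in red_links)
    return sum(1 for n in incoming.values() if n >= threshold)


# Toy data, not taken from the study.
sample = [("A", "X"), ("B", "X"), ("C", "X"), ("A", "Y")]
print(count_red_links_at_least(sample, 3))  # prints 1
```

With a threshold of 30 applied to the full set of red links, the same count would correspond to the 95,000 figure quoted above.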

How Wikipedians contribute to the encyclopedia over time

The summer's work also continued the kind of large-scale quantitative analysis begun in the Editor Trends Study, but looked instead at how Wikipedians, especially new Wikipedians, contribute over their lifetime as editors. These analyses have given us a new look into who exactly is contributing to the different namespaces of Wikipedia and how their patterns ebb and flow over time:

Measuring contributions, not edits, by cohort

(Documentation)

Question
  • How do different cohorts add and remove content from articles?
Key conclusions
  • A large proportion of the content (measured in bytes or characters, not counting deleted articles) in the main namespace is added by editors who joined recently. (see chart)
  • Similar to the conclusions of the Editor Trends Study, it appears there is a natural growth and then slow tapering off of main namespace contributions by a cohort.

Megabytes added to English Wikipedia's main namespace, by cohort

Recruiting editors of different activity levels

(Documentation)

Question
  • How good a job is Wikipedia doing of recruiting editors who go on to make many edits in their first year as Wikipedians?
Key conclusions

(Note: This project has been extended beyond English Wikipedia. As the research progresses, we need your help to interpret the results for other languages.)

  • The recruitment of light to moderate editors (those who make 1-9 and 10-99 edits in their first year of editing) is not declining as fast as recruitment of very active editors (those making 100 or more edits in their first year). (see charts)
  • The recruitment of very active editors (100 or more edits in their first year) peaked earlier than the recruitment of light to moderate editors (those who make 1-9 and 10-99 edits in their first year of editing). (see charts)

Recruitment of different editor classes over time as a percent of peak
Recruitment of different editor classes over time, log scale
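
The editor classes and the "percent of peak" scaling used in these charts can be sketched as follows. The bucket boundaries (1-9, 10-99, 100 or more edits in the first year) come from the conclusions above; the labels "light" and "moderate" for the first two buckets, the function names, and the sample numbers are assumptions for illustration only.

```python
def first_year_class(edits_in_first_year):
    """Map a first-year edit count to an activity class.

    Bucket boundaries follow the summary text; the labels "light" and
    "moderate" for the 1-9 and 10-99 buckets are an assumption.
    """
    if edits_in_first_year >= 100:
        return "very active"
    if edits_in_first_year >= 10:
        return "moderate"
    if edits_in_first_year >= 1:
        return "light"
    return None  # never edited in the first year


def percent_of_peak(yearly_counts):
    """Rescale a series so that its largest value is 100."""
    peak = max(yearly_counts)
    if peak == 0:
        return [0.0 for _ in yearly_counts]
    return [100.0 * count / peak for count in yearly_counts]


# Toy numbers, not taken from the study.
print(first_year_class(42))             # prints moderate
print(percent_of_peak([50, 200, 100]))  # prints [25.0, 100.0, 50.0]
```

Scaling each class to its own peak makes it possible to compare how quickly classes of very different sizes decline.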

The lifecycle of editors

(Documentation)

Question
  • Are there identifiable patterns in the overall lifecycle of editing activity in the project?
Key conclusions
  • Editors go through a natural editing cycle -- a period of ramp-up, steady contribution, and eventual tapering off of activity. (see charts)
  • For heavy editors, the initial period is longer, but the middle phase of steady contribution is also much more lengthy and productive than for light editors. (see charts)
  • Very active contributors have generally remained unchanged in their patterns over the years, while low-activity editors today contribute less than low-activity editors from previous years.

Activity rate over time for users whose first edit occurred in January 2006, with activity rate equal to 1e-4 and 1e-5 edits/sec, respectively
Activity over time for editors whose first edit occurred in January 2006; editing activity bins a = 1e-6, 1e-7
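
The activity rates in these captions are given in edits per second, which is hard to read at a glance. The short sketch below converts them to edits per day; it is plain unit arithmetic, and the function name is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def edits_per_day(edits_per_second):
    """Convert an activity rate from edits/sec to edits/day."""
    return edits_per_second * SECONDS_PER_DAY


# The four rates quoted in the figure captions.
for rate in (1e-4, 1e-5, 1e-6, 1e-7):
    print(f"{rate:g} edits/sec is about {edits_per_day(rate):.3g} edits/day")
```

On this scale, 1e-4 edits/sec is roughly 8.6 edits per day, while 1e-7 edits/sec is less than one edit every hundred days.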

Community interaction with new Wikipedians

To better understand the causes of the decline in retention of new Wikipedians, it was vital that we closely study the interactions between experienced editors and those new to the project. The following studies have helped us better understand the current and historical dynamics of Wikipedia's community:

A new Wikipedian's first edit session

(Documentation)

Question
  • How does rejection of the first contributions by a new editor impact their participation?
Key conclusions
  • Having their contributions rejected (deleted or reverted) makes editors less likely to stick around on Wikipedia, independent of all other variables. (see chart)
  • Rejection tends to hurt highly invested new editors (those who make more edits in their first editing session) the most.
  • The rise of rejection correlates with the drop in new editor survival, and the two lines meet around 2007. (see chart)
  • New editors are demonstrating less initial investment (as measured by number of edits in their first editing session) in Wikipedia now than in the past.

Rejection and survival rates of new editors over time

Reverts and article length

(Documentation)

Question
  • We know the average article length is growing. How has this impacted new editors?
Key conclusions
  • Newbies are editing longer articles now than in the past. (see chart)
  • Over the entire history of Wikipedia, editing longer articles increases your likelihood of being reverted.

Length of pages edited by newbies, per year

The impact of being ignored

(Documentation)

Question
  • Does being ignored entirely make a difference for retention of new editors?
Key conclusions
  • Before 2006, new editors who spent a long time editing without receiving any messages on their talk page were less likely to stick around in Wikipedia. (see chart)
  • After 2006, this changed: new editors who received no talk page messages were more likely to continue editing longer than those who did. (see chart)
  • This change might be explained by the rise of template warnings around 2006. (see next set of findings)

Ignored period and new user retention

How we communicate with new Wikipedians

(Documentation: 1, 2, 3)

Question
  • How has communication to new editors (on their user talk pages) changed over time?
Key conclusions
  • Since 2004, there has been a significant drop in messages including praise and thanks, corresponding with an increase in the overlap of teaching/instructional communication with criticism.
  • Currently, about 80% of messages to new Wikipedians come from bots or semi-automated editing tools like Twinkle and Huggle. (see charts)
  • Currently, about 65% of the communications to new Wikipedians are warning templates on their talk pages. (see charts)

First messages to new users over time, proportional
First messages to new users over time, raw numbers

(Documentation)

Question
  • How has the content added to the different namespaces of Wikipedia changed over time?
Key conclusions
  • From 2006 onward, there was a tremendous increase in bytes added to the User Talk space. (see chart)
  • Template warnings were created in 2006 and rose steadily in number over the years, and they have contributed heavily to this byte increase in user talk.

Bytes added to English Wikipedia by namespace

(Documentation: 1, 2)

Question
  • Can rewriting templates to be more personalized and teach editors about the community have an impact on their retention and the quality of contributions in the future?
Key conclusions
  • Rhetorical analysis of the most commonly used welcome templates shows that, while their appearance has changed significantly over time, their style and content have not.
  • Virtually all templates, both welcoming and warning, are written in passive, institutionalized language that appears highly impersonal to new editors.
  • Changing the language of the standard Huggle warning template to be more personalized dramatically changed the editing patterns of new users who received a warning.
  • Blatant vandals who received a personalized or teaching warning were less likely to continue vandalizing.
  • Blatant vandals who received a personalized warning and contacted the user who warned them were more likely to ask constructive questions.
  • About 10% of editors who were reverted and warned for vandalism were making encyclopedic, good-faith edits.

Bar chart of the proportion of editors who make good contact with the reverting editor after receiving various messages as part of the Huggle experiment.
Bar chart of the proportion of editors who continue editing after receiving various messages as part of the Huggle experiment.

The impact of deletion

Speedy deletion (CSD)

(Documentation)

Question
  • How does speedy deletion impact new editors?
Key conclusions
  • Speedy deletion accounts for about 60% of deletions.
  • The average time for a newly created article to be tagged for speedy deletion is 2 minutes.
  • The average article tagged for CSD is deleted in half an hour.
  • The largest share of articles tagged for speedy deletion, about 37%, are tagged A7 (not notable).
  • The next highest CSD tag (G11: Unambiguous advertising or promotion) accounts for only 8% of all CSDs.

Articles for Deletion (AfD)

(Documentation: 1, 2, 3)

Question
  • How do Articles for Deletion nominations and discussions impact new editors?
Key conclusions
  • Despite an increase in notifications, the vast majority of article creators do not participate in Articles for Deletion discussions about articles they started. (see chart)

AFD participation by article creators

New Page Patrol

(Documentation)

Questions
  • Who does the majority of work in New Page Patrol?
  • Has the workload of individual New Page Patrollers increased or decreased over time?
Key conclusions
  • Patroller work on Wikipedia follows a power law curve, meaning a small number of people do a significant majority of the work.
  • The workload of patrollers (both the average number of patrolled pages per year and per month) has been decreasing since 2007, measured by patrolling actions in the logs.
  • Since 2008, about 30% of the top patrollers are bots.

Workload for the top 50 new page patrollers over time
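
One simple way to see the concentration that a power law curve implies is to compute the share of all patrolling actions performed by the top few patrollers. The sketch below does this for a list of per-patroller action counts. The function name and the sample counts are assumptions for illustration, and this is not necessarily how the study measured concentration.

```python
def top_share(action_counts, k):
    """Return the fraction of all actions done by the k most active users."""
    ranked = sorted(action_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


# Toy per-patroller counts, not taken from the patrol logs.
counts = [90, 5, 3, 1, 1]
print(top_share(counts, 1))  # prints 0.9
```

A value close to 1 for a small k indicates that a handful of patrollers account for most of the logged work.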

Vandal-fighters

(Documentation)

Questions
  • How can we identify very active vandal-fighters?
  • Has the workload of individual vandal-fighters increased or decreased over time?
Key conclusions
  • As with patrolling, vandal-fighting follows a power law curve.
  • The workload of vandal-fighters is also decreasing. (see chart)
  • This could be because vandalism overall has been decreasing since 2007, and because bots such as ClueBot and XLinkBot now do a large amount of the work.

Number of English Wikipedia editors who revert at least 5 revisions for vandalism per month.

How new Wikipedians learn and collaborate in community spaces

Last but not least, we took some time this summer to study how new Wikipedians become a part of community activities outside of articles and their associated discussion pages:

The Project namespace

(Documentation)

Question
  • How much content do new Wikipedians contribute to community spaces (outside the main namespace)?
Key conclusions
  • Relatively few new Wikipedians add content to the Project (i.e. Wikipedia) namespace and its associated Talk pages (which include policy or guidelines, WikiProjects, and other maintenance material).

Megabytes added to English Wikipedia namespaces four and five

(Documentation)

Question
  • How much do new Wikipedians participate in key community spaces outside the main namespace?
Key conclusions
  • New Wikipedians are participating in all community spaces less and less. From one randomized sample, we found about 5% of new users did so in 2008 as compared to 20% in 2004.

New user participation in different community spaces over time

Help requests

(Documentation: 1, 2)

Question
  • How and where do new Wikipedians request assistance?
Key conclusions
  • New editors do not often seek out help on-wiki.
  • When they do ask for help, they do not use traditional help spaces: most help requests seem to happen in Talk and User talk namespaces.

Locations of help requests from new users

(Documentation: 1, 2)

Question
  • What do new Wikipedians ask for help with?
Key conclusions
  • The most common type of help requested by new editors is about editorial policy.
  • Technical editing questions are the second-most common.
  • Requests like asking for tasks or inquiring about behavior policy are very rare.

Help request topics

WikiProjects

(Documentation: 1, 2)

Questions
  • How do Wikipedians indicate their membership in WikiProjects?
  • How has membership in WikiProjects changed over time?
  • Do new Wikipedians tend to join WikiProjects?
  • Which WikiProjects are best at attracting active members, especially new Wikipedians?
  • What impact does WikiProject membership have on contributions by both new and experienced Wikipedians?
Key conclusions
  • While some WikiProjects continue to thrive, many have ceased to be active as a tool for coordinating activity.
  • The number of Wikipedians joining WikiProjects at any stage of their lifecycle as editors is declining. (see chart)
  • Most people who join WikiProjects are not newbies.
  • New Wikipedians who join a WikiProject early in their editing lifecycle go on to make more edits overall.

Graph with trendlines showing number of newbies (<=100 edits), Wikipedians (>100 edits) and all editors (newbies + Wikipedians) joining all WikiProjects, 2005-2011.
Current top 20 WikiProjects by new members