Research:Trending articles and new editors

From Meta, a Wikimedia project coordination wiki
This page in a nutshell: This research compares the editors who start editing on very active, or "trending", articles related to current news events with those who start on less active (non-trending) pages. The study found that trending articles did not attract more new registered editors than average articles.
This project page documents a research project currently in progress.
Information may be incomplete and can change rapidly as science advances.
Research project: Trending articles and new editors
Main contact:
Start: 2011-6
Status: in progress (66%)
Fields: computer science, human–computer interaction, statistics, social computing
Open data: This project has published open-licensed data.
Open access: This project has open access publications.
WMF support:

When a Wikipedia article, or more likely its subject, gets a lot of attention, more people than usual start editing Wikipedia. I will analyze how the behavior of editors who started contributing to Wikipedia by editing a trending article differs from that of editors who started with non-trending articles.


Topic

  • Are editors who start contributing to Wikipedia by editing a trending article more or less likely to be retained than other new editors? (RQ3)

Process

  1. Detect bursts in page views using Wikistats' hourly page view counts [1], and store them as tuples of (article title, revision number, time).
  2. From the Wikilytics datasets, extract a new-editor table containing each editor's first revision.
  3. Classify the new editors as 'trending' or 'non-trending' according to whether their first revision falls into the burst set.
  4. Compare the 'trending' and 'non-trending' groups in terms of retention, total number of edits, etc.
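Steps 1 to 4 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `NewEditor` record, the helper names and especially the `retained` flag are hypothetical (the page does not say how retention is measured), and loading the Wikistats and Wikilytics data is left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NewEditor:
    username: str
    first_rev_id: int   # revision id of the editor's first edit (step 2)
    total_edits: int
    retained: bool      # hypothetical flag: the retention criterion is an assumption


def burst_revision_ids(bursts):
    """Collect revision ids from burst tuples of (article title, revision number, time) (step 1)."""
    return {rev_id for _title, rev_id, _time in bursts}


def classify(editors, burst_revs):
    """Split editors by whether their first revision falls into the burst set (step 3)."""
    groups = {"trending": [], "non-trending": []}
    for editor in editors:
        key = "trending" if editor.first_rev_id in burst_revs else "non-trending"
        groups[key].append(editor)
    return groups


def summarize(groups):
    """Per-group size, retention rate and mean total edits (step 4)."""
    summary = {}
    for name, members in groups.items():
        if not members:
            summary[name] = None  # nothing to compare for an empty group
            continue
        summary[name] = {
            "n": len(members),
            "retention": sum(e.retained for e in members) / len(members),
            "mean_edits": mean(e.total_edits for e in members),
        }
    return summary
```

Keying the burst set by revision id mirrors step 3's wording ("whether their first revision falls into the burst set"); a time-window lookup per article would be an equally plausible reading.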

Results and discussion

Preliminary results

I extracted the edits made to trending articles while they were trending, and calculated the percentages of edits made by new registered editors, new IP users, IP users with many contributions, and others. The table below shows how the distribution of edits in trending articles differs from that in all [2] articles.

The results indicate that trending articles did not attract many new registered editors. The percentage of edits made by new registered editors was actually lower than in the average case.
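A minimal sketch of this comparison, assuming each edit has already been labelled with whether its author was registered and whether the edit counts as "new" (see Definitions). The four-way grouping in the hypothetical `editor_category` is a simplification; the page's own categories (for example "IP users with many contributions") may be defined differently.

```python
from collections import Counter


def editor_category(is_registered: bool, is_new: bool) -> str:
    """Hypothetical four-way grouping by registration status and newness."""
    status = "registered" if is_registered else "IP"
    age = "new" if is_new else "old"
    return f"{age} {status}"


def distribution(edits):
    """Percentage of edits per category; `edits` holds (is_registered, is_new) pairs."""
    counts = Counter(editor_category(reg, is_new) for reg, is_new in edits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100 * n / total for cat, n in counts.items()}


def compare(trending_edits, all_edits):
    """Difference in percentage points: trending articles minus all articles."""
    t, a = distribution(trending_edits), distribution(all_edits)
    return {cat: t.get(cat, 0.0) - a.get(cat, 0.0) for cat in sorted(set(t) | set(a))}
```

Comparing shares rather than raw counts matters here: trending articles receive far more edits overall, so a category can grow in absolute numbers while its share shrinks.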

As an additional observation, I found that trending articles were edited more than 30 times as frequently as usual. Although the increase is most clearly seen among 'new' IP users,[3] the distribution across new/old and registered/unregistered users did not change much. Note that a large percentage of the edits in trending articles were made by old registered editors, even though the chart above excluded all edits made while the article was semi-protected (i.e., when it could be edited only by registered users with an edit history of a certain length).

Definitions

  • Trending: An article is trending when its page view count in the last hour exceeds three times the count predicted by a linear fit to the page views of the previous 2 hours.
  • New: An edit counts as a new editor's edit if it was made within 30 days of the editor's first edit.
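The two definitions can be written out as below. This sketch assumes that a linear fit over two hourly points is simply the straight line through them, so the prediction for the current hour is v(t-1) + (v(t-1) - v(t-2)); the original analysis may have fitted differently. Under this reading the prediction can drop to zero or below after a sharp decline, so the hypothetical `surprisedness` helper returns None in that case. All function names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional

TREND_FACTOR = 3                  # "surpasses 3 * prediction"
NEW_WINDOW = timedelta(days=30)   # "within 30 days of the first edit"


def predicted_views(v_prev2: float, v_prev1: float) -> float:
    """Extend the straight line through the two previous hourly counts by one hour."""
    return v_prev1 + (v_prev1 - v_prev2)


def is_trending(v_prev2: float, v_prev1: float, v_now: float) -> bool:
    return v_now > TREND_FACTOR * predicted_views(v_prev2, v_prev1)


def surprisedness(predicted: float, actual: float) -> Optional[float]:
    """Percentage increase of the actual count over the prediction (None if prediction <= 0)."""
    if predicted <= 0:
        return None
    return 100 * (actual - predicted) / predicted


def is_new_edit(first_edit: datetime, edit: datetime) -> bool:
    # Assumption: "within 30 days" means strictly less than 30 days after the first edit.
    return edit - first_edit < NEW_WINDOW
```

For example, with 100 and 120 views in the two previous hours the prediction is 140, so the article trends only if the current hour exceeds 420 views.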

Datasets

  • Trending table: summarizes edits made to trending articles, with the following columns:
    • title
    • page_id
    • redirect?
    • pageview timestamp (in date and hour)
    • predicted pageview
    • actual pageview
    • trending hours (the number of consecutive hours the article has been trending)
    • surprisedness (percentage increase of the actual page view count over the predicted count)
    • revision
    • revision timestamp (in date, hour, min and seconds)
    • user type (registered user, bot, anonymous user)
    • username
    • editcount (the user's number of edits up to the revision timestamp)
    • new user? (whether the user had less than 30 days of editing history as of the revision)
  • Trending-and-nontrending table
    • has the same columns as the Trending table above, except that 'predicted pageview', 'surprisedness' and 'trending hours' are filled with a dummy value.
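For illustration, a row of the Trending table could be modelled as the record below. The field names, types and the `DUMMY` value of -1 are assumptions: the page lists the columns but not their types or the dummy value actually used. The hypothetical `mixed_table_row` builds a Trending-and-nontrending row by filling the three trending-only columns with the dummy.

```python
from dataclasses import dataclass
from datetime import datetime

DUMMY = -1  # hypothetical placeholder; the page does not say which dummy value was used


@dataclass(frozen=True)
class EditRow:
    """One row of the Trending table (column names paraphrased from the list above)."""
    title: str
    page_id: int
    is_redirect: bool
    pageview_hour: datetime        # page view timestamp, date and hour
    predicted_pageview: float
    actual_pageview: int
    trending_hours: int            # consecutive hours the article has been trending
    surprisedness: float           # % increase of actual over predicted views
    revision: int
    revision_timestamp: datetime   # date, hour, minute and second
    user_type: str                 # "registered", "bot" or "anonymous"
    username: str
    editcount: int                 # user's edits up to revision_timestamp
    is_new_user: bool


def mixed_table_row(**columns) -> EditRow:
    """Build a Trending-and-nontrending row: the trending-only columns get DUMMY."""
    return EditRow(predicted_pageview=DUMMY, trending_hours=DUMMY,
                   surprisedness=DUMMY, **columns)
```

Sharing one record type between the two tables keeps them directly comparable, which is what the "same columns" description suggests.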

Future work

I will analyze whether new editors who joined Wikipedia by editing trending articles show distinctive behavior, by looking at page view counts from before 2010 and at their contribution histories up to now. Although the number of new editors does not differ depending on whether their first article was trending, the style of their later contributions might.

The Toolserver keeps old page view counts in tswiki:User-store. This dataset would let us explore how editors who participated in trending articles differed over the following year, in terms of activity (retention), areas of contribution, etc.


References

  1. Wikistats aggregates the page views of Wikipedia articles on an hourly basis for recent months.
  2. For computational efficiency, I examined only the articles contained in Domas's page view counts, and randomly discarded 99.99% of them.
  3. Some 'new' IP users may actually have long editing experience but cannot be identified as 'old' editors because their IP addresses change.