Research:Metrics/survival(t)

From Meta, a Wikimedia project coordination wiki

Rationale

\mathrm{survival}\left(t\right) is a boolean metric that is true when an editor participates in Wikipedia for more than t days. The time span t is measured from a reference event t_0, which can be the time of joining Wikipedia or any other relevant milestone, for example the first click on the “Edit” button. The full notation for this metric is therefore \mathrm{survival}\left(t_0, t\right). When t_0 is the joining time, we omit it and simply write \mathrm{survival}\left(t\right).
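
The definition above can be written out explicitly. The following is one possible formalization, not an official one: the symbol t_\mathrm{edit} (the timestamp of a single edit by the editor) is borrowed from the Pros/Cons section, and the existential quantifier over the editor's edits is an assumption about how "participates for more than t days" is meant.

```latex
\mathrm{survival}\left(t_0, t\right) =
\begin{cases}
  \mathrm{true}  & \text{if the editor has at least one edit with } t_\mathrm{edit} - t_0 > t \\
  \mathrm{false} & \text{otherwise}
\end{cases}
```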

This metric has commonly been used to measure the rate of retention of new editors when examining the "decline" of participation.[1][2][3]

Value of parameter t

Ortega analyzed the span of activity of editors in several Wikipedias and found that the median span of activity was 100 days, meaning that 50% of all editors had a retention period no longer than 100 days.[4] However, he also found that the risk of withdrawal from activity is highest within the first 12 days.[5] Thus a value of t = 12 days could be a good choice for identifying editors who successfully pass the early phase of high dropout rates.

Pros/Cons

Pros:

  • Easy to compute: just check whether there are any edits with timestamp t_\mathrm{edit} > t_0 + t
  • Binary (True/False) nature makes it easy to understand.
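
To illustrate how little is needed to compute the metric, here is a minimal Python sketch. The function name `survival`, its parameters, and the sample editor are hypothetical and chosen only for illustration; the sketch assumes the edit timestamps are already available as `datetime` objects.

```python
from datetime import datetime, timedelta


def survival(edit_timestamps, t0, t_days):
    """Return True if any edit occurs more than t_days after the reference event t0."""
    threshold = t0 + timedelta(days=t_days)
    return any(ts > threshold for ts in edit_timestamps)


# Hypothetical editor who joined on 1 January 2012 and made three edits.
joined = datetime(2012, 1, 1)
edits = [datetime(2012, 1, 1, 10), datetime(2012, 1, 5), datetime(2012, 1, 20)]

print(survival(edits, joined, 12))  # True: the 20 January edit is past day 12
print(survival(edits, joined, 30))  # False: no edit after 31 January
```

An editor with no edits at all is reported as not surviving, because `any` over an empty sequence is false.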

Cons:

  • Can only be measured retrospectively, and is not robust against wikibreaks and other long periods of inactivity.
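
Both drawbacks can be seen in a small Python sketch. The helper `survival_as_of` and its `observed_at` parameter are hypothetical: they model the fact that, at any given moment, only edits made up to that moment are visible. The sample editor takes a wikibreak of several months.

```python
from datetime import datetime, timedelta


def survival_as_of(edit_timestamps, t0, t_days, observed_at):
    """Evaluate survival(t0, t) using only the edits visible at observed_at.

    Returns None while the threshold t0 + t has not yet passed,
    because the metric cannot be measured before then.
    """
    threshold = t0 + timedelta(days=t_days)
    if observed_at <= threshold:
        return None
    return any(threshold < ts <= observed_at for ts in edit_timestamps)


# Hypothetical editor: two early edits, then a long wikibreak, then one edit in July.
joined = datetime(2012, 1, 1)
edits = [datetime(2012, 1, 1), datetime(2012, 1, 3), datetime(2012, 7, 15)]

early = survival_as_of(edits, joined, 12, datetime(2012, 1, 10))   # None: too early to tell
mid = survival_as_of(edits, joined, 12, datetime(2012, 3, 1))      # False: looks like a dropout
late = survival_as_of(edits, joined, 12, datetime(2012, 12, 31))   # True: the editor came back
print(early, mid, late)
```

The same editor is classified differently depending on when the measurement is taken, so a value of False is only provisional.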

Technical dependencies

Relevant tables to query:

  • Revision

alternatively:

For most users, the timestamp of account registration (either the local or the global one, see the CentralAuth extension) can be taken as the joining time, even though this assumption does not account for any time the user may have spent lurking or editing anonymously from an IP address before registering the account.
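
As a rough sketch of how registration and revision timestamps could be combined, the Python example below uses an in-memory SQLite database as a stand-in for the wiki database. The two tables are heavily simplified and hypothetical: the column names `user_registration`, `rev_user`, and `rev_timestamp` are modeled on MediaWiki's schema, but real schemas differ between MediaWiki versions, so treat every name here as an assumption. Timestamps are assumed to be 14-digit `YYYYMMDDHHMMSS` strings.

```python
import sqlite3
from datetime import datetime, timedelta

MW_FORMAT = "%Y%m%d%H%M%S"  # assumed 14-digit timestamp format

# Simplified, hypothetical stand-ins for the user and revision tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (user_id INTEGER PRIMARY KEY, user_registration TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_timestamp TEXT);
    INSERT INTO "user" VALUES
        (1, '20120101000000'), (2, '20120101000000'), (3, '20120101000000');
    INSERT INTO revision VALUES
        (10, 1, '20120102120000'),
        (11, 1, '20120201090000'),
        (12, 2, '20120105080000');
""")

# Fixed-width timestamps sort lexicographically, so MAX() yields the latest edit.
QUERY = """
    SELECT u.user_id, u.user_registration, MAX(r.rev_timestamp)
    FROM "user" AS u
    LEFT JOIN revision AS r ON r.rev_user = u.user_id
    GROUP BY u.user_id, u.user_registration
    ORDER BY u.user_id
"""


def survival_from_tables(connection, t_days):
    """Map each user_id to survival(t), taking registration as the joining time t_0."""
    result = {}
    for user_id, registered, last_edit in connection.execute(QUERY):
        threshold = datetime.strptime(registered, MW_FORMAT) + timedelta(days=t_days)
        result[user_id] = (
            last_edit is not None
            and datetime.strptime(last_edit, MW_FORMAT) > threshold
        )
    return result


result = survival_from_tables(conn, 12)
print(result)  # {1: True, 2: False, 3: False}
```

Only the latest edit per user is needed, since the metric asks whether any edit falls past the threshold. The left join keeps users without any revisions, who are reported as not surviving.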

Communication

The term “survival” originates from the medical literature on clinical trials (see en:survival analysis). There, the parameter t indicates the span of time free of follow-up events. In our case, we can only measure editor retention by looking at patterns of contributions, so our definition of “survival” depends on the occurrence of a contribution event past the threshold t.

References

  1. Aaron Halfaker (2011). First edit session, a technical document produced for the 2011 Wikimedia Summer of Research.
  2. Aaron Halfaker, R. Stuart Geiger, Jonathan Morgan & John Riedl (in press). The Rise and Decline of an Open Collaboration System: How Wikipedia's reaction to sudden popularity is causing its decline, American Behavioral Scientist.
  3. Ortega, F. Wikipedia: A Quantitative Analysis PhD Thesis, Universidad Rey Juan Carlos, 2009. See p. 118.
  4. Ibidem, see p. 120.