Research:Metrics/Brainstorming

This page in a nutshell: This is the original brainstorming document on which the metrics report is based. It is kept for archival purposes only; please refer to the main document.

Engagement and retention metrics

Dario: goals for the first iteration

Scope: what is covered and what is not (for example, reader engagement metrics are left for later)
Aim: distil up to 5 (?) key user engagement metrics, focusing on newly registered users.
- engagement (volume of raw contributions)
- retention (editor survival, edit rate decay)
- work quality/productivity (revert rate, text survival metrics, quality rating)
- type of contribution/diversity (taxonomy, size of contributions, cross-namespace contributions)
For each candidate metric:
- Rationale (What does this metric measure? How does it work? How do we know it is appropriate?)
- Pros/Cons of using it as opposed to other metrics
- Technical dependencies (How complex is it to extract these metrics? Do we need supplementary data sources on top of the data available in the MW database?)
- Optional: communication (the overhead of explaining what these metrics stand for to a lay audience)
- Optional: refs in the literature

Motivation & Justification

(Ryan F.) Having well-defined metrics can help us:

Measure the effect over time of new feature implementations and of experiments
Assign value to product features and experiments in the backlog. This will aid prioritization of features as well as experiments.
Brainstorm new feature ideas and experiments
Understand which metrics correlate most strongly with the growth or decline of the community (the health of Wikipedia). There is an implicit premise here that we will be able to "define" the strength of the community via a set of metrics.

Broadly, what are we trying to measure?

Rather than defining directly quantifiable metrics, it may help to first come up with some higher-level descriptions that can help us brainstorm new low-level metrics and provide some structure to our metric definitions. Some of these metric "categories" may be:

editor retention
editor contributions (edits - active editors, different categorization based on edit counts and patterns)
editor contributions (other - click through rates, account registration, article feedback etc.)
edit quality (edit survivability)

Independent variables against which to measure metrics:

time unit (simplest and most common)
edit count (range)
article category
registration date

Infrastructure

data sources
- slave dbs
- squid logs
building and storing the metrics
- python - we'll want to have an extensible code base here for building metrics
- cron jobs
how will we report these metrics
- dashboards
- data visualization
  - d3.js (time-series)
- report card (using David S.'s libraries?)
- served on apache
where will this live?
- stat1 - metrics generation
- stat2 - hosted reporting and visualization
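The notes above ask for an extensible Python code base for building metrics. One possible shape, sketched here under the assumption that each metric is a function of a single user's edit timestamps, is a registry that new metrics join through a decorator, so a cron job only ever calls one entry point. The names METRICS, metric, compute_all and span_days are hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, List

# Hypothetical registry: metric name -> function of one user's edit timestamps.
MetricFn = Callable[[List[datetime]], float]
METRICS: Dict[str, MetricFn] = {}


def metric(name: str) -> Callable[[MetricFn], MetricFn]:
    """Decorator that registers a metric function under `name`."""
    def register(fn: MetricFn) -> MetricFn:
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already registered")
        METRICS[name] = fn
        return fn
    return register


@metric("edit_count")
def edit_count(timestamps: List[datetime]) -> float:
    """Cumulative number of edits."""
    return float(len(timestamps))


@metric("span_days")
def span_days(timestamps: List[datetime]) -> float:
    """Days between a user's first and last edit (0 for fewer than two edits)."""
    if len(timestamps) < 2:
        return 0.0
    return (max(timestamps) - min(timestamps)).total_seconds() / 86400


def compute_all(timestamps: List[datetime]) -> Dict[str, float]:
    """Run every registered metric over one user's edit history."""
    return {name: fn(timestamps) for name, fn in METRICS.items()}
```

Adding a metric then means writing one decorated function; the cron jobs and the reporting layer do not need to change.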

Resource allocation

- Aaron, Giovanni, Dario, Ryan F.

Nomenclature (from least to most active user classes)

(may need to include readers, see below)

Registered user
Active user: registered user who has *visited* an edit page at least once ^[1]^[2], aka “live user” ^[3]
- note that "user" may be ambiguous, since in some contexts it refers to readers
Users who performed at least one edit a day ^[4]^[5]
- what should we call this class? contributing editor? contributor? editing user? ^[3]
New wikipedian: editor who has performed 10 edits ^[6]
Active editor: > 5 edits/month ^[6]
Very active editor: > 100 edits/month ^[6]

Metrics

Global

Number of users by month/week/day:

Nr. of registrations ^[3]
Nr. of active users ^[6]
Nr. of editing users (see above) ^[3]
Nr. of new wikipedians ^[6]
Nr. of active editors ^[6]
Nr. of very active editors ^[6]
Nr. of sessions
Nr. of edits per session
Nr. of "notifiable" users with a verified email address ^[7]
Nr. of active readers, identified by the volume of feedback submitters with a unique identifier (user_id or IP address). We cannot count unique readers unless we rely on anonymous tokens: we have persistent anon tokens for AFT4, but not for AFT5.
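Several of the monthly counts above follow directly from the nomenclature thresholds (active editor: > 5 edits/month; very active editor: > 100 edits/month; new wikipedian: 10 edits). A minimal sketch, assuming the input is one (user_id, timestamp) pair per edit already filtered to the users and namespaces of interest, and assuming a new wikipedian is counted in the month of their 10th edit; the function and constant names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

ACTIVE_THRESHOLD = 5         # active editor: more than 5 edits in the month
VERY_ACTIVE_THRESHOLD = 100  # very active editor: more than 100 edits in the month
NEW_WIKIPEDIAN_EDITS = 10    # new wikipedian: has made 10 edits


def monthly_editor_counts(
    edits: Iterable[Tuple[int, datetime]],
) -> Dict[str, Dict[str, int]]:
    """Per calendar month, count active editors, very active editors and
    new wikipedians from (user_id, timestamp) pairs, one pair per edit."""
    per_month: Dict[str, Counter] = defaultdict(Counter)
    per_user: Dict[int, List[datetime]] = defaultdict(list)
    for user_id, ts in edits:
        per_month[ts.strftime("%Y-%m")][user_id] += 1
        per_user[user_id].append(ts)

    # Assumption: a new wikipedian is counted in the month of their 10th edit.
    new_wikipedians = Counter(
        sorted(times)[NEW_WIKIPEDIAN_EDITS - 1].strftime("%Y-%m")
        for times in per_user.values()
        if len(times) >= NEW_WIKIPEDIAN_EDITS
    )
    return {
        month: {
            "active": sum(1 for n in counts.values() if n > ACTIVE_THRESHOLD),
            "very_active": sum(1 for n in counts.values() if n > VERY_ACTIVE_THRESHOLD),
            "new_wikipedians": new_wikipedians[month],
        }
        for month, counts in sorted(per_month.items())
    }
```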

Readers (We haven't thought through reader engagement metrics, but we will have to at some point. This might also be relevant for proto-users, power readers, or whatever we want to call them.)

New users

(I recommend we have a separate set of retention metrics for newly registered users who may not have completed their first edit.)

Time to first edit click (how long after registration does a user become active?)
Time to first edit (how long after registration does a user complete his/her first edit?) ^[8]
- both are instances of the time to milestone metrics family, see ^[9]
Number of non-edit modifications to the site (watchlisting, preferences) as tracked by user.user_touched.
- needs to be sampled at regular intervals (possibly by monitoring replicated UPDATEs via database triggers)
Email authentication (has the user inserted an email address at registration time? has the user authenticated it?)
Time to email authentication (how long after registration does a user authenticate his/her email address?)
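As a sketch of how time to first edit could be extracted from the MediaWiki tables, the query below joins user.user_registration with each user's earliest revision.rev_timestamp; MediaWiki stores both as 14-digit YYYYMMDDHHMMSS strings. It runs against a toy in-memory SQLite database that mimics only the columns it needs (rev_user being the editor column of the revision table at the time); the production slave databases are MySQL, and the function name and sample rows are hypothetical.

```python
import sqlite3
from datetime import datetime
from typing import Dict, Optional

MW_TS = "%Y%m%d%H%M%S"  # MediaWiki timestamp format

QUERY = """
SELECT u.user_id, u.user_registration, MIN(r.rev_timestamp)
FROM user AS u
LEFT JOIN revision AS r ON r.rev_user = u.user_id
GROUP BY u.user_id, u.user_registration
ORDER BY u.user_id
"""


def time_to_first_edit(conn: sqlite3.Connection) -> Dict[int, Optional[float]]:
    """Hours from registration to first edit per user.

    None if the user never edited, or if user_registration is NULL
    (as it is for some very old accounts).
    """
    result: Dict[int, Optional[float]] = {}
    for user_id, registered, first_edit in conn.execute(QUERY):
        if first_edit is None or registered is None:
            result[user_id] = None
            continue
        delta = datetime.strptime(first_edit, MW_TS) - datetime.strptime(registered, MW_TS)
        result[user_id] = delta.total_seconds() / 3600
    return result


# Toy data standing in for the replicated user and revision tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_registration TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_timestamp TEXT);
INSERT INTO user VALUES (1, '20120301120000'), (2, '20120301120000');
INSERT INTO revision VALUES (10, 1, '20120301150000'), (11, 1, '20120302090000');
""")
print(time_to_first_edit(conn))  # {1: 3.0, 2: None}
```

Time to first edit click would follow the same pattern with edit_page_tracking.ept_timestamp in place of the revision table.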

Editors

(I imagine we want to restrict these metrics to registered users, at least in a first phase? A subset may also apply to anonymous editors whose persistent activity we can obtain via click-tracking logs.)

Cumulative edit count ^[6]^[1]^[10]
Daily edit count, aka editing activity ^[6]^[4]
- activity is the inverse of the average time elapsed between two consecutive edits
- we should also include edit rate per session; Aaron has worked on this
- we need to clarify how we deal with editors using power-editing tools
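The definition of activity as the inverse of the average time between consecutive edits can be sketched as follows, expressed in edits per day. This is an illustration only; the function name and the choice of days as the unit are assumptions.

```python
from datetime import datetime
from statistics import mean
from typing import List, Optional


def editing_activity(timestamps: List[datetime]) -> Optional[float]:
    """Edits per day, as the inverse of the mean gap between consecutive edits.

    Returns None with fewer than two edits (no gap to average) or when all
    edits share one timestamp (zero mean gap).
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    gaps_days = [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]
    avg_gap = mean(gaps_days)
    return None if avg_gap == 0 else 1 / avg_gap
```

Note that the mean of consecutive gaps telescopes to (last edit - first edit) / (number of edits - 1), so this measure depends only on the first and last edit and the edit count; bursty and evenly spread histories with the same endpoints get the same value, which is one argument for also looking at per-session rates.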
Edit delta
- differences in editing behaviour before and after an event (e.g. micro-task, edit feedback, barnstar)
- a related metric was measured in the latest round of Huggle and Twinkle experiments (https://meta.wikimedia.org/wiki/Template_A/B_testing/Results)
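One simple way to operationalize edit delta is to compare edit counts in equal windows before and after the event. The sketch below assumes a symmetric 7-day window and half-open intervals; both are hypothetical choices, not the definition used in the Huggle and Twinkle experiments.

```python
from datetime import datetime, timedelta
from typing import List


def edit_delta(timestamps: List[datetime], event: datetime,
               window: timedelta = timedelta(days=7)) -> int:
    """Edits in [event, event + window) minus edits in [event - window, event)."""
    before = sum(1 for t in timestamps if event - window <= t < event)
    after = sum(1 for t in timestamps if event <= t < event + window)
    return after - before
```

A positive value means the user edited more after the event (e.g. receiving a barnstar) than before it; for group comparisons the same delta would be computed for a control group without the event.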

Retention

See Ryan's suggestion of a k-n retention metric: editors making at least k edit(s) after a minimum of n days ^[9].
Time to milestone, see ^[9]
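A minimal sketch of both metrics for a single user, assuming the n days are counted from registration and that a milestone is the user's m-th edit; the post-edit feedback analysis ^[9] may define either differently, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def k_n_retained(registered: datetime, timestamps: List[datetime],
                 k: int, n: int) -> bool:
    """True if the user made at least k edits n or more days after registering.

    Assumption: the n days are counted from registration.
    """
    cutoff = registered + timedelta(days=n)
    return sum(1 for t in timestamps if t >= cutoff) >= k


def time_to_milestone(registered: datetime, timestamps: List[datetime],
                      milestone: int) -> Optional[timedelta]:
    """Time from registration to the user's `milestone`-th edit (None if not reached)."""
    ordered = sorted(timestamps)
    if len(ordered) < milestone:
        return None
    return ordered[milestone - 1] - registered
```

In practice k-n retention should only be evaluated for users registered at least n days before the end of the available data; otherwise recent registrants are counted as not retained simply because not enough time has passed.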
Productivity (or maybe "quality" to avoid confusion with edit
1. Revert rate within a given time window
2. Binary variable (based on criteria such as: at least one revision that didn't get reverted in the first week, see ^[11])
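Both productivity variants can be computed once each revision is annotated with the time it was reverted, if ever. Revert detection itself is out of scope here and assumed to be done upstream; "in the first week" is read as "within a week of the revision being saved", which is only one possible reading. The Revision type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Revision:
    saved_at: datetime
    reverted_at: Optional[datetime] = None  # None if never reverted


def reverted_within(rev: Revision, window: timedelta) -> bool:
    """Whether the revision was reverted no later than `window` after saving."""
    return rev.reverted_at is not None and rev.reverted_at - rev.saved_at <= window


def revert_rate(revs: List[Revision], window: timedelta) -> Optional[float]:
    """Share of a user's revisions reverted within `window` (None if no edits)."""
    if not revs:
        return None
    return sum(reverted_within(r, window) for r in revs) / len(revs)


def is_productive(revs: List[Revision], window: timedelta = timedelta(days=7)) -> bool:
    """Binary productivity: at least one revision not reverted within `window`."""
    return any(not reverted_within(r, window) for r in revs)
```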
We need to distinguish between metrics that are measured over a limited period of time (i.e. an observation window) and metrics that are not. k-n retention seems to fall into the former category.
Average editing rate per session
- How is a session defined?
Average session length (in minutes)
- Ditto
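Since only edit timestamps are recorded, one common approach is to split a user's edits into sessions wherever the gap between consecutive edits exceeds an inactivity cutoff. The 60-minute cutoff below is a hypothetical placeholder, not a validated value, and the function names are illustrative.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical inactivity cutoff: a gap longer than this starts a new session.
CUTOFF = timedelta(minutes=60)


def sessions(timestamps: List[datetime],
             cutoff: timedelta = CUTOFF) -> List[List[datetime]]:
    """Group edit timestamps into sessions split at gaps longer than `cutoff`."""
    result: List[List[datetime]] = []
    for t in sorted(timestamps):
        if result and t - result[-1][-1] <= cutoff:
            result[-1].append(t)
        else:
            result.append([t])
    return result


def session_stats(timestamps: List[datetime],
                  cutoff: timedelta = CUTOFF) -> Optional[Dict[str, float]]:
    """Session count, mean edits per session and mean session length (minutes)."""
    found = sessions(timestamps, cutoff)
    if not found:
        return None
    return {
        "sessions": len(found),
        "avg_edits_per_session": mean([len(s) for s in found]),
        # A one-edit session has length 0: only edit save times are observed.
        "avg_length_minutes": mean([(s[-1] - s[0]).total_seconds() / 60 for s in found]),
    }
```

Session length measured this way underestimates time spent, because the work before the first save of each session is invisible in the revision table.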
Edit metadata:
- Tool used (normal interface, power-editing tool, etc., API)
- Session id (if required?)
- User agent
- The usual metadata (Namespace, etc)
- Edit summary (amount of text inserted, removed, etc.)

Groups (cohorts or treatments)

(These metrics should include all of the global metrics above, at least those that can be measured at cohort or treatment-group level. Note that at some point we may also want to consider WikiProject-level metrics, or reuse and adapt any metrics that Global Dev is developing for outreach events or Global Education to measure group-level productivity.)

Cohort retention ^[6]: percentage of editors who joined in a given month and are still active (= made at least one edit) at the time of measurement (see individual retention above)
- we should have more sophisticated group metrics than means of individual metrics, as many of these metrics will not be normally distributed
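A sketch of cohort retention, assuming "still active" means at least one edit in a 30-day window before the measurement date (the window length is a hypothetical choice). For skewed per-user metrics, one alternative to the mean is to report the median with a percentile bootstrap confidence interval, shown in the second function; this is a swapped-in standard technique, not one specified by the source, and all names are hypothetical.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Tuple


def cohort_retention(users: Dict[int, Tuple[datetime, List[datetime]]],
                     measured_at: datetime,
                     activity_window: timedelta = timedelta(days=30)) -> Dict[str, float]:
    """Percentage of each monthly registration cohort with at least one edit in
    [measured_at - activity_window, measured_at). `users` maps user_id to
    (registration time, edit timestamps)."""
    joined: Dict[str, int] = defaultdict(int)
    active: Dict[str, int] = defaultdict(int)
    for registered, edits in users.values():
        month = registered.strftime("%Y-%m")
        joined[month] += 1
        if any(measured_at - activity_window <= t < measured_at for t in edits):
            active[month] += 1
    return {m: 100 * active[m] / joined[m] for m in sorted(joined)}


def bootstrap_median_ci(values: List[float], reps: int = 2000,
                        alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Median of `values` with a percentile bootstrap (1 - alpha) interval."""
    rng = random.Random(seed)
    n = len(values)
    boot = sorted(median(rng.choices(values, k=n)) for _ in range(reps))
    lo_idx = int(reps * alpha / 2)
    hi_idx = reps - lo_idx - 1
    return median(values), boot[lo_idx], boot[hi_idx]
```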

Articles

(This may not be within the scope of editor engagement, but we may have to include some article-level metrics for some of our analyses)

Macro - Categorization

Data sources

revision timestamps (revision.rev_timestamp)
daily contributions (user_daily_contribs.contribs)
- data only from 2011 (?)
- does it capture actions other than revisions?
user_touched: “the last time a user made a change on the site, including logins, changes to pages (any namespace), watchlistings, and preference changes” ^[12]
- does not update upon logins based on cookies
- does not measure contributions only: is it useful when combined with other data?
first click on the edit button (edit_page_tracking.ept_timestamp)
- data only available from July 2011
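Because user.user_touched keeps only the latest value, the regular sampling suggested under New users amounts to diffing periodic snapshots of the column. A minimal sketch, assuming each snapshot is a mapping from user_id to user_touched read from the user table at one point in time; the function name is hypothetical.

```python
from typing import Dict, Set

# One snapshot = {user_id: user_touched} read from the user table at a given time.
Snapshot = Dict[int, str]


def touched_between(earlier: Snapshot, later: Snapshot) -> Set[int]:
    """Users whose user_touched changed between two snapshots.

    Only shows *that* a user did something in the interval, not how many
    times or what; users missing from `earlier` (e.g. new registrations)
    are included. Edits must be subtracted using the revision table to
    isolate non-edit modifications.
    """
    return {uid for uid, ts in later.items() if earlier.get(uid) != ts}
```

The sampling interval sets the resolution: two actions by the same user within one interval collapse into a single change.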

References & Notes

[1] Observational study on MoodBar users.
[2] Data collected by the mw:Extension:EditPageTracking extension.
[3] New user registrations dashboard.
[4] Editor lifecycle study.
[5] Wikipedia: a quantitative analysis, http://libresoft.es/publications/thesis-jfelipe
[6] Editor trends study.
[7] Feedback Dashboard notifications dashboard.
[8] Lag between registration and first edit.
[9] Post-Edit feedback experiment, overall metrics.
[10] Experimental Study of Informal Rewards in Peer Production.
[11] Converting readers into editors: New results from Article Feedback v5.
[12] Mediawiki manual, user table.
