Research:Metrics

From Meta, a Wikimedia project coordination wiki

Aims and scope

The purpose of this document is to:

  • define a canonical set of metrics used by the Wikimedia Foundation to measure the impact of editor engagement projects
  • standardize the definition of these metrics so they can be applied unambiguously across projects and experiments
  • implement code to compute these metrics for any arbitrary set of users
  • cross-reference metrics and the projects that use them.

Scope

The scope of this document is limited to user-level metrics. All the metrics listed on this page focus primarily on measuring the activity of new registered users. Metrics for readers or experienced editors fall outside the scope of the current proposal, even though some of the present metrics may apply more broadly to any user with a persistent identifier. Revision-level, article-level and project-level metrics also fall outside the scope of this proposal.

User-level metrics are computed for individual users, but they are typically used in aggregate to compare different user groups or cohorts against each other. Aggregates for boolean metrics are typically expressed as proportions; aggregates for non-boolean metrics are typically expressed as means, medians or other properties of their distribution within a cohort. Unless otherwise stated, user tags and metrics are defined on a per-project basis, i.e. they do not take into account user contributions across multiple projects.
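The aggregation of user-level metrics into cohort-level figures can be sketched as follows. This is a minimal illustration, not part of any existing code base: the function names `proportion` and `summarize` and the sample cohorts are hypothetical.

```python
from statistics import mean, median


def proportion(flags):
    """Aggregate a boolean metric over a cohort as the share of True values."""
    flags = list(flags)
    if not flags:
        raise ValueError("cohort is empty")
    return sum(1 for flag in flags if flag) / len(flags)


def summarize(values):
    """Aggregate a non-boolean metric over a cohort as its mean and median."""
    values = list(values)
    if not values:
        raise ValueError("cohort is empty")
    return {"mean": mean(values), "median": median(values)}


# Hypothetical per-user values for two cohorts (illustrative data only).
control_made_edit = [True, False, False, True]
treatment_made_edit = [True, True, False, True]
control_edit_counts = [0, 3, 1, 12]

print(proportion(control_made_edit))    # 0.5
print(proportion(treatment_made_edit))  # 0.75
print(summarize(control_edit_counts))
```

Comparing the two proportions is the typical way a boolean metric is reported for an experiment, while skewed counts such as edit counts are often better described by the median than by the mean.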

Overview

This document consists of 4 sections:

user classes
user classes define populations of users sharing some properties. User classes can be used for reporting purposes (for example: over the last month the active editor population increased by 3%), as the target of an experiment (for example: experiment A increased the proportion of live accounts, i.e. users who clicked at least once on the edit button, by 10%) or as groups of users that need to be excluded from the analysis (for example: bots and blocked users were excluded from the sample of participants in experiment B).
user metrics
metrics define dimensions along which the activity and participation of a user or group of users can be assessed (for example: an editor's lifetime edit count)
user tags
tags are boolean attributes that determine whether or not a user has a given property, undergoes a specific treatment or belongs to a given experimental condition (for example: editor 123 was reverted). Some of these tags can be used to define user classes (for example: is a bot). Some of these tags are computed tags, which can be obtained by fixing one or more parameters in a metric definition. For example: "has 1+ edits in the main namespace of a project within the first 24 hours of account registration" is a tag that is true or false for any given registered user, regardless of when it is generated. Other tags are assigned as part of a treatment or experiment.
funnel metrics
this section defines key metrics and terminology used when analyzing a conversion funnel.

User classes

The purpose of this list is to centralize the various definitions that have been used internally and externally to describe categories of Wikipedians, before we start pruning the list. As a general rule, tags or treatments that refer to specific features should go in the tags section below. All definitions need to be sharpened to become unambiguous and translatable into a formal definition.

User classes

| Group | Class | Definition |
|---|---|---|
| Generic | Unique visitor | needs definition |
| Generic | Unique visitor (mobile) | needs definition |
| Generic | Unique token | A user who is assigned an anonymous MediaWiki token persisting across browser sessions. |
| Registered | Registered user | A user with a username and a unique user_id registered with a Wikimedia project. |
| Registered | Attached user | A globally registered user signing into a new project but having a pre-existing user account associated with another "home" wiki. |
| Registered | Live user | A registered user who clicks at least once on the edit button (whether or not the edit is actually completed) on the main namespace. |
| Editing | Editor | A user with 1+ lifetime edits on the main namespace of a given project. |
| Editing | Editor (mobile) | A user with 1+ lifetime mobile edits on the main namespace of a given project (requires revtagging). |
| Editing | Active editor | A user with 5+ edits in the main namespace of a given project over the last 30 days. |
| Editing | Very active editor | A user with 100+ edits in the main namespace of a given project over the last 30 days. |
| Editing | Anonymous editor | A user identified by a public IP address stored with a MediaWiki revision with 1+ lifetime edits on the main namespace. |
| Special | Media uploader | needs definition (Registered user with 1+ Commons upload). |
| Special | Media uploader (mobile) | needs definition (Registered user with 1+ Commons upload from a mobile device, requires revtagging). |
| Special | Power editor | needs definition (Registered user editing with power maintenance tools, such as Twinkle or Huggle, requires revtagging). |
| Special | Bot | needs definition (Registered user performing scripted activity, requires usertagging). |
| Special | Blocked user | needs definition. |
| Special | Banned user | needs definition. |

Deprecated user classes

| Group | Class | Definition |
|---|---|---|
| Editing | Newbie editor | 0-10 lifetime edits (This class has occasionally been used in the past, but it should be deprecated as it spans two very different populations of users, editing and non-editing). |
| Editing | New Wikipedian | 10+ lifetime edits (This class is not used for measuring currently active editors and can be downgraded to a simple tag if needed). |
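As an illustration of how the editing classes could be translated into a formal definition, the sketch below assigns the "editor", "active editor" and "very active editor" classes from a list of main-namespace edit timestamps. The function name `classify_editor` and its input format are assumptions made for this example; only the thresholds (1+ lifetime edits, 5+ and 100+ edits over the last 30 days) come from the definitions above.

```python
from datetime import datetime, timedelta


def classify_editor(ns0_edit_times, now, window_days=30):
    """Return the editing classes that apply to one user on one project.

    `ns0_edit_times` holds the timestamps of the user's main-namespace edits.
    """
    past_edits = [ts for ts in ns0_edit_times if ts <= now]
    window_start = now - timedelta(days=window_days)
    recent = sum(1 for ts in past_edits if ts >= window_start)

    classes = set()
    if len(past_edits) >= 1:
        classes.add("editor")              # 1+ lifetime edits
    if recent >= 5:
        classes.add("active editor")       # 5+ edits over the last 30 days
    if recent >= 100:
        classes.add("very active editor")  # 100+ edits over the last 30 days
    return classes


# Hypothetical user: five edits in the last week, one edit 40 days ago.
now = datetime(2013, 1, 31)
edits = [now - timedelta(days=d) for d in (1, 2, 3, 4, 5, 40)]
print(classify_editor(edits, now))
```

Note that the classes are nested in this sketch: every very active editor is also an active editor and an editor.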

User metrics

User metrics can be characterized as functions that take as input a user_id and a number of additional parameters and return a numeric or boolean value for that user.

Retention

Retention metrics allow us to measure whether an editor is active, and to what extent, during or after a given timespan.

survival(t)
Boolean measure of editor retention. Editors are considered "retained" or "surviving" if they continue to edit after a time t has elapsed since a reference event (typically: account registration time).
threshold(t,n)
Boolean measure of an editor's level of activity. This metric measures whether an editor reaches some threshold n of activity (e.g. edits, words added, pages created) within time t of a given reference event (typically: account registration time).
live_account(t)
Boolean measure of whether an account is considered live. This metric measures whether the first click on the edit button on any article occurs within a time t since registration for a given account.
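The three retention metrics above can be sketched as plain functions of a user's registration time and event timestamps. This is a minimal reading of the definitions, using account registration as the reference event and edits as the unit of activity; the function signatures are assumptions for illustration and may differ from any actual implementation.

```python
from datetime import datetime, timedelta


def survival(registration, edit_times, t):
    """True if the user made at least one edit later than `t` after registration."""
    return any(ts - registration > t for ts in edit_times)


def threshold(registration, edit_times, t, n):
    """True if the user made at least `n` edits within `t` of registration."""
    within = sum(1 for ts in edit_times
                 if timedelta(0) <= ts - registration <= t)
    return within >= n


def live_account(registration, edit_click_times, t):
    """True if the first click on the edit button came within `t` of registration."""
    if not edit_click_times:
        return False
    return min(edit_click_times) - registration <= t


# Hypothetical user: one edit two hours after registering, one ten days later.
registered = datetime(2013, 1, 1)
edits = [registered + timedelta(hours=2), registered + timedelta(days=10)]

print(survival(registered, edits, timedelta(days=7)))
print(threshold(registered, edits, timedelta(days=1), n=1))
print(live_account(registered, [registered + timedelta(minutes=30)], timedelta(hours=1)))
```

Each function returns a boolean for a single user, so cohort-level results are obtained by taking the proportion of users for whom the value is true.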

Volume of contribution

Volume of contribution metrics allow us to measure the quantity of an editor's wiki work.

edit count(t)
The total number of edits an editor has performed in a period of time t since registration.
edit rate
The rate at which an editor saves revisions during a given timespan.
time to threshold
The amount of time it takes an editor to reach a certain number of edits since registration.
edit sessions
A period of time during which an editor is making a sustained series of revisions before leaving.
bytes added
The number of bytes added, removed or changed by a registered user during a given timespan.
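The edit-based volume metrics above can be sketched from a list of edit timestamps. The function names and signatures are hypothetical. In particular, the definition of an edit session does not specify how long a pause ends a session, so the one-hour inactivity cutoff used here is purely an assumption for the sake of the example.

```python
from datetime import datetime, timedelta


def edit_count(registration, edit_times, t):
    """Number of edits saved within `t` of registration."""
    return sum(1 for ts in edit_times
               if timedelta(0) <= ts - registration <= t)


def edit_rate(edit_times, start, end):
    """Edits per day saved in the timespan [start, end)."""
    days = (end - start) / timedelta(days=1)
    if days <= 0:
        raise ValueError("end must be later than start")
    return sum(1 for ts in edit_times if start <= ts < end) / days


def time_to_threshold(registration, edit_times, n):
    """Time from registration to the n-th edit, or None if never reached."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(ts for ts in edit_times if ts >= registration)
    return ordered[n - 1] - registration if len(ordered) >= n else None


def edit_sessions(edit_times, cutoff=timedelta(hours=1)):
    """Group edits into sessions; a gap longer than `cutoff` starts a new one.

    The one-hour default is an assumption, not a documented value.
    """
    sessions = []
    for ts in sorted(edit_times):
        if sessions and ts - sessions[-1][-1] <= cutoff:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Hypothetical user with edits 10, 20, 200, 210 and 3000 minutes after registering.
registered = datetime(2013, 1, 1)
edits = [registered + timedelta(minutes=m) for m in (10, 20, 200, 210, 3000)]

print(edit_count(registered, edits, timedelta(days=1)))          # 4
print(time_to_threshold(registered, edits, 3))                   # 3:20:00
print([len(session) for session in edit_sessions(edits)])        # [2, 2, 1]
```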

Quality of contribution

Quality of contribution metrics allow us to measure the quality of an editor's wiki work.

revert rate
The proportion of revisions made by an editor that were reverted by another editor within a given time.
content persistence
A measure of the survival (over time or through revisions) of added content, used to estimate quality under the assumption that content which lasts longer does so because of its quality.
block
A boolean measure indicating that a user has been blocked.
ban
A boolean measure indicating that a user has been banned.
warning
A boolean measure indicating whether a user has received any kind of warning.
qualitative assessment
The quality of an editor's contributions as determined by human raters.
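Of the quality metrics above, revert rate is the most direct to compute. The sketch below assumes a hypothetical `Revision` record that stores when a revision was saved and, if it was reverted, when and by whom; detecting reverts in the first place is outside the scope of this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Revision:
    """Hypothetical record of one revision and its revert, if any."""
    saved_at: datetime
    reverted_at: Optional[datetime] = None
    reverted_by: Optional[int] = None  # user_id of the reverting editor


def revert_rate(user_id, revisions, t):
    """Share of the user's revisions reverted by someone else within `t`.

    Returns None when the user has no revisions.
    """
    revisions = list(revisions)
    if not revisions:
        return None
    reverted = sum(
        1 for rev in revisions
        if rev.reverted_at is not None
        and rev.reverted_by != user_id          # self-reverts do not count
        and rev.reverted_at - rev.saved_at <= t
    )
    return reverted / len(revisions)


# Hypothetical revisions by user 7.
t0 = datetime(2013, 1, 1)
revisions = [
    Revision(t0, t0 + timedelta(hours=1), reverted_by=99),  # counted
    Revision(t0, t0 + timedelta(hours=1), reverted_by=7),   # self-revert
    Revision(t0, t0 + timedelta(days=3), reverted_by=99),   # too late
    Revision(t0),                                           # never reverted
]
print(revert_rate(7, revisions, timedelta(days=1)))  # 0.25
```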

Type of contribution

Type of contribution metrics allow us to measure the diversity of an editor's wiki work.

scale of change
The amount of a page's content changed by an edit.
namespace of edits
In Wikipedia, namespaces represent different areas of activity (article contributions, article discussions, user communication, policy or meta-level discussions); therefore, edits to different namespaces represent different types of work.
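One simple way to summarize the namespace of a user's edits is the share of edits made in each namespace. The sketch below assumes each edit is represented by its MediaWiki namespace number (0 for the main namespace, 1 for article talk, 3 for user talk); the function name `namespace_shares` is hypothetical.

```python
from collections import Counter


def namespace_shares(edit_namespaces):
    """Map each namespace number to the share of the user's edits made there."""
    counts = Counter(edit_namespaces)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ns: count / total for ns, count in counts.items()}


# Hypothetical user: two article edits, one article talk edit, one user talk edit.
print(namespace_shares([0, 0, 1, 3]))  # {0: 0.5, 1: 0.25, 3: 0.25}
```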


User Metrics API (UMAPI)

Start page of UMAPI.

The metrics API is a project that aims to extract the above metrics in a predictable, consistent, and reproducible way. The code base can be found here and is also pushed to a Gerrit project. The project is currently deployed on the Wikimedia stats cluster and will soon be replicated to the Wikimedia GitHub account. Documentation for the source is also hosted on the stats cluster here. The domain of the API is Wikipedia projects; the current focus is on English Wikipedia, and coverage is being expanded.

Read more on mediawiki.org.

User tags

Tags are boolean flags that can be assigned or computed for any registered user identified via a unique user_id. Tags can be combined to define arbitrary cohorts, either for comparison or for project-level reporting.

Computed tags

Computed tags are derived by fixing one or more parameters of a user metric. These tags allow us to rapidly assess what proportion of a given cohort meets specific criteria (for example: how many users from experimental condition X completed an edit within the first 24 hours). They can also be applied as filters in a cohort definition (for example: select all participants in experiment Y who clicked at least once on the edit button within the first hour of account registration). The following is a non-exhaustive list of computed tags that can be derived from the above metrics.

_editor
True if a user has completed 1+ lifetime edits per project across all namespaces to date
_mobile_editor
True if a user has completed 1+ lifetime edits on a mobile device per project across all namespaces to date
_ns0_editor
True if a user has completed 1+ lifetime edits in a project's main namespace (0) to date
_1h_ns0_editor
True if a user has completed 1+ edits in a project's main namespace (0) within 1 hour since account registration
_1d_ns0_editor
True if a user has completed 1+ edits in a project's main namespace (0) within 1 day since account registration
_7d_ns0_editor
True if a user has completed 1+ edits in a project's main namespace (0) within 1 week since account registration
_ns0_reverted
True if a user has completed 1+ edits in a project's main namespace (0) and at least one of these edits got reverted to date
_1h_ns0_reverted
True if a user has completed 1+ edits in a project's main namespace (0) and at least one of these edits got reverted within the first hour since account registration
_1d_ns0_reverted
True if a user has completed 1+ edits in a project's main namespace (0) and at least one of these edits got reverted within the first day since account registration
_7d_ns0_reverted
True if a user has completed 1+ edits in a project's main namespace (0) and at least one of these edits got reverted within the first week since account registration
_ns0_creator
True if a user has created 1+ pages per project in the main namespace to date (whether or not the page was deleted)
_deleted
True if a user has created 1+ pages per project in the main namespace and at least one page was deleted to date
_warned
True if a user has received any warning to date
_blocked
True if a user has received a block to date
_talks
True if the user has posted any message on another editor's talk page to date
_talked_to
True if the user has received any message from another editor on their talk page to date
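The idea of deriving computed tags by fixing a metric's parameters can be sketched with `functools.partial`. The metric function `ns0_editor`, the user record layout and the helper names below are hypothetical; only the tag names and their time windows come from the list above.

```python
from datetime import datetime, timedelta
from functools import partial


def ns0_editor(user, within=None):
    """Metric: has the user completed 1+ main-namespace edits?

    `within` caps the time since registration; None means "to date".
    """
    edits = user["ns0_edit_times"]
    if within is None:
        return len(edits) >= 1
    registered = user["registration"]
    return any(timedelta(0) <= ts - registered <= within for ts in edits)


# Each computed tag is the same metric with its time parameter fixed.
COMPUTED_TAGS = {
    "_ns0_editor": partial(ns0_editor, within=None),
    "_1h_ns0_editor": partial(ns0_editor, within=timedelta(hours=1)),
    "_1d_ns0_editor": partial(ns0_editor, within=timedelta(days=1)),
    "_7d_ns0_editor": partial(ns0_editor, within=timedelta(days=7)),
}


def tag_proportion(cohort, tag):
    """Share of users in a cohort for whom the tag is true."""
    cohort = list(cohort)
    if not cohort:
        raise ValueError("cohort is empty")
    return sum(1 for user in cohort if COMPUTED_TAGS[tag](user)) / len(cohort)


def filter_cohort(cohort, tag):
    """Keep only the users for whom the tag is true."""
    return [user for user in cohort if COMPUTED_TAGS[tag](user)]


# Hypothetical cohort of three users registered at the same time.
registered = datetime(2013, 1, 1)
cohort = [
    {"user_id": 1, "registration": registered,
     "ns0_edit_times": [registered + timedelta(minutes=20)]},
    {"user_id": 2, "registration": registered,
     "ns0_edit_times": [registered + timedelta(days=3)]},
    {"user_id": 3, "registration": registered, "ns0_edit_times": []},
]
print(tag_proportion(cohort, "_1d_ns0_editor"))
print([user["user_id"] for user in filter_cohort(cohort, "_7d_ns0_editor")])  # [1, 2]
```

The same pattern covers both uses described above: `tag_proportion` gives the share of a cohort meeting a criterion, and `filter_cohort` applies the tag as a filter in a cohort definition.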

Assigned tags

Assigned tags cannot be computed from user metrics; they are simply assigned to users depending on the treatments they receive. Account registration campaigns or buckets in experimental treatments (such as MoodBar, WikiLove, PageCuration etc.) are all examples of assigned tags.

Funnel metrics

This section defines standard terms used for analyzing funnels in the context of product analytics, editor engagement experiments and A/B tests.

A diagram defining basic funnel metrics and terms
impression (IMP)
An impression is an event logged each time a given element (a form, a banner) or a whole page is visibly displayed to a user.
click-through rate (CTR)
The click-through rate is defined as the number of clicks on a clickable element on a given node in a funnel divided by the number of impressions of the same element.
Example: the number of clicks on a call-to-action divided by the number of impressions is defined as the click-through rate of the call-to-action.
bounce rate (BNC)
The bounce rate for a given node in a funnel is the complement of the click-through rate, i.e. the proportion of total impressions of a given element that are not followed by clicks or visits to the following node in the funnel.
completion rate (CMP)
The completion rate of a (sub)funnel is defined as the number of completion events divided by the number of impressions of the immediate entry point of the (sub)funnel.
Example: The number of successfully saved edits divided by the number of impressions of the edit screen of an article can be defined as the completion rate for the edit funnel.
conversion rate (CNV)
The conversion rate of a funnel is defined as the completion rate calculated on the entire funnel (as opposed to a specific segment of the funnel), i.e. the number of completion events divided by the number of impressions of the initial entry point of the funnel.
Example: The number of successfully saved edits divided by the number of impressions of an article can be defined as the conversion rate for the edit funnel.
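The funnel metrics defined above reduce to simple ratios of event counts. The sketch below works through the edit funnel example with made-up counts; the function names and the numbers are illustrative assumptions.

```python
def _ratio(events, impressions):
    """Divide an event count by an impression count, rejecting empty funnels."""
    if impressions <= 0:
        raise ValueError("at least one impression is required")
    return events / impressions


def click_through_rate(clicks, impressions):
    """CTR: clicks on an element divided by impressions of that element."""
    return _ratio(clicks, impressions)


def bounce_rate(clicks, impressions):
    """BNC: complement of the click-through rate."""
    return 1 - click_through_rate(clicks, impressions)


def completion_rate(completions, entry_impressions):
    """CMP: completions divided by impressions of the (sub)funnel entry point."""
    return _ratio(completions, entry_impressions)


def conversion_rate(completions, initial_impressions):
    """CNV: completion rate computed over the entire funnel."""
    return _ratio(completions, initial_impressions)


# Hypothetical raw counts for an edit funnel.
article_impressions = 1000     # article page displayed
edit_clicks = 50               # clicks on the edit button
edit_screen_impressions = 50   # edit screen displayed
saved_edits = 20               # successfully saved edits

print(click_through_rate(edit_clicks, article_impressions))   # 0.05
print(completion_rate(saved_edits, edit_screen_impressions))  # 0.4
print(conversion_rate(saved_edits, article_impressions))      # 0.02
```

With these counts the bounce rate of the article page is the remaining share of impressions, i.e. 1 minus the click-through rate.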

Raw vs unique funnel metrics

Unless otherwise specified, the above metrics are calculated using raw events (i.e. without deduplicating multiple events generated by the same user token). Funnel metrics can also be computed for unique tokens, by discounting multiple events associated with the same token and removing orphaned events (events with no corresponding parent event with a matching token). In that case, we refer to these metrics as unique impressions, unique click-through rate etc.
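The difference between raw and unique metrics can be sketched on a small event log. The event format (a list of `(token, kind)` pairs) and the function names are assumptions made for this example.

```python
def raw_click_through_rate(events):
    """Raw CTR: every click and impression event is counted."""
    impressions = sum(1 for _, kind in events if kind == "impression")
    clicks = sum(1 for _, kind in events if kind == "click")
    if impressions == 0:
        raise ValueError("no impressions logged")
    return clicks / impressions


def unique_click_through_rate(events):
    """Unique CTR: one impression and one click at most per token.

    Clicks whose token has no impression are orphaned and dropped.
    """
    impression_tokens = {token for token, kind in events if kind == "impression"}
    click_tokens = {token for token, kind in events if kind == "click"}
    if not impression_tokens:
        raise ValueError("no impressions logged")
    matched_clicks = click_tokens & impression_tokens
    return len(matched_clicks) / len(impression_tokens)


# Hypothetical log: token "a" repeats events, "c" has an orphaned click.
events = [
    ("a", "impression"), ("a", "impression"), ("a", "click"), ("a", "click"),
    ("b", "impression"),
    ("c", "click"),
    ("d", "impression"), ("d", "click"),
]
print(raw_click_through_rate(events))     # 1.0
print(unique_click_through_rate(events))
```

In this log the raw rate is inflated by repeated and orphaned clicks, while the unique rate counts two clicking tokens out of three tokens with an impression.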

Time-dependent funnel metrics

Funnel metrics can be calculated through the end of a test or capped at a given timespan since a reference event in the funnel, in which case they need to be explicitly defined with a time parameter. For example: 1h unique CTR represents the number of deduplicated clicks that happen within an hour of the impression, divided by the total number of unique impressions with a matching token.
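A time-capped metric such as the 1h unique CTR can be sketched as follows. The event format (`(token, kind, timestamp)` triples) and the function name are hypothetical, and the choice of each token's first impression as the reference event is an assumption of this example.

```python
from datetime import datetime, timedelta


def capped_unique_ctr(events, cap):
    """Unique CTR counting only clicks within `cap` of the token's first impression."""
    first_impression = {}
    for token, kind, ts in events:
        if kind == "impression":
            if token not in first_impression or ts < first_impression[token]:
                first_impression[token] = ts
    if not first_impression:
        raise ValueError("no impressions logged")

    clicked = set()
    for token, kind, ts in events:
        # Clicks without a matching impression are orphaned and ignored.
        if kind == "click" and token in first_impression:
            if timedelta(0) <= ts - first_impression[token] <= cap:
                clicked.add(token)
    return len(clicked) / len(first_impression)


# Hypothetical log: "a" clicks after 10 minutes, "b" after 2 hours, "c" never.
t0 = datetime(2013, 1, 1, 12, 0)
events = [
    ("a", "impression", t0), ("a", "click", t0 + timedelta(minutes=10)),
    ("b", "impression", t0), ("b", "click", t0 + timedelta(hours=2)),
    ("c", "impression", t0),
]
print(capped_unique_ctr(events, timedelta(hours=1)))  # 1h unique CTR
print(capped_unique_ctr(events, timedelta(hours=3)))  # 3h unique CTR
```

Here only token "a" counts towards the 1h unique CTR, while both "a" and "b" count once the cap is extended to three hours.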

Additional material