Research:Content consumption metrics
This page is currently a draft. More information pertaining to this may be available on the talk page.
This page provides high-level documentation of the Wikimedia content consumption metrics: how we measure and examine behaviour related to reading Wikimedia sites and consuming our content. It is currently a placeholder while we build these metrics, and will be expanded as we do so.
Metrics and areas
Page views are requests for Wikimedia assets that can be counted as a single, human-driven request for a self-contained piece of text-based content. What this means in practice, and how we mesh our infrastructure with what we're trying to measure, is still being defined.
Generally speaking, the process consists of taking all requests, applying a set of generalised filters that limit them to "single requests for self-contained pieces of text-based content", and then tagging each request according to its source: a web crawler or other automated library, the Wikimedia apps, the Zero programme, the mobile website, or else the desktop website.
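The filter-then-tag process described above can be sketched in Python as follows. This is only an illustration of the overall shape, not the agreed definition: the `Request` fields, the user-agent markers, the `zero_carrier` field and the `.m.` host check are all hypothetical stand-ins for heuristics that are still under discussion, and the tag priority simply follows the order in which the sources are listed above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Request:
    """A simplified request record; every field here is a hypothetical stand-in."""
    host: str
    status: int
    content_type: str
    user_agent: str
    zero_carrier: str = ""  # assumed marker for Zero traffic; empty if absent


# Placeholder heuristics (assumptions, not the real rules).
AUTOMATED_MARKERS = ("bot", "crawler", "spider")
APP_MARKER = "exampleapp"  # hypothetical user-agent token for the apps


def is_content_request(req: Request) -> bool:
    """Generalised filter: keep successful requests for text-based content."""
    return req.status == 200 and req.content_type.startswith("text/html")


def tag_request(req: Request) -> str:
    """Assign exactly one source tag, checked in priority order."""
    ua = req.user_agent.lower()
    if any(marker in ua for marker in AUTOMATED_MARKERS):
        return "automated"
    if APP_MARKER in ua:
        return "app"
    if req.zero_carrier:
        return "zero"
    if ".m." in req.host:
        return "mobile web"
    return "desktop"


def tag_page_views(requests: Iterable[Request]) -> Iterator[Tuple[Request, str]]:
    """Filter all requests down to page-view candidates, then tag each one."""
    for req in requests:
        if is_content_request(req):
            yield req, tag_request(req)
```

Keeping the filter and the tagger as separate functions means the heuristics at each stage can be revised independently as the discussion settles.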
This documentation is currently a draft: we are actively discussing which heuristics are used at each stage of the filtering and tagging process (feedback and community thoughts are appreciated!). After we have reached a satisfactory outcome, the focus will move to implementing the definition, testing it in production, and providing the output to interested Foundation and community parties.
A key metric to track is unique clients: the number of distinct devices visiting our properties in a given time period. Depending on the implementation, this also has implications for how we implement session analysis metrics.
Tracking unique clients raises significant privacy concerns. As such, our current work consists of documenting the use cases for unique client tracking, along with the various possible ways of implementing it that we can think of and their advantages and disadvantages (privacy included in that calculation). Once we have done that and settled on an approach we are comfortable with, we'll hold a discussion on the subject.