Research:Unique Devices/Use Cases
Unique devices as a high-level metric
The number of "unique users" or "unique clients" hitting a site is a common high-level metric in web analytics. Interest in having it available for Wikimedia sites has been expressed by the Executive Director and various Department heads, along with people in Product Development.
This must allow us to answer the questions:
- How many unique [mobile web/mobile app/desktop] clients saw a specific project in a day/week/month?
- How many unique [mobile web/mobile app/desktop] clients saw any Wikimedia project in a day/week/month?
- How many unique [mobile web/mobile app/desktop] clients saw any project within a Wikimedia domain (e.g., "any Wikisource") in a day/week/month?
A token that would allow us to do this has to:
- Be usable through the app, the mobile site and the desktop site;
- Last for at least 32 days, to allow for "month" questions;
- Cover all users.
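As a rough illustration of how such a token would answer the questions above, the following sketch counts distinct tokens per project, access method and day. The record layout, the token values and the function name `unique_clients_per_day` are assumptions made for this example; they do not describe an existing Wikimedia pipeline.

```python
from collections import defaultdict
from datetime import date


def unique_clients_per_day(requests):
    """Count distinct tokens per (project, access_method, day).

    `requests` is an iterable of (token, project, access_method, day)
    tuples; this layout is hypothetical and chosen only for the example.
    """
    seen = defaultdict(set)
    for token, project, access_method, day in requests:
        seen[(project, access_method, day)].add(token)
    return {key: len(tokens) for key, tokens in seen.items()}


# Made-up sample data: token "t1" hits the same project twice in one day.
requests = [
    ("t1", "en.wikipedia", "desktop", date(2015, 3, 1)),
    ("t1", "en.wikipedia", "desktop", date(2015, 3, 1)),
    ("t2", "en.wikipedia", "desktop", date(2015, 3, 1)),
    ("t1", "en.wikisource", "mobile web", date(2015, 3, 1)),
]
counts = unique_clients_per_day(requests)
```

Weekly or monthly figures, or figures for "any Wikimedia project", would follow the same pattern with a coarser grouping key, which is why the token has to persist across the whole period being measured.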
Within Growth, Mobile, Fundraising, etc., there are recurring needs to run controlled experiments on the site. The major requirements for an experiment are:
- randomization: users are randomly assigned to a treatment at the start of every test
- persistence: users receive the same treatment for the duration of the experiment
A very elegant way to achieve these requirements is to assign users to treatments by hashing a unique token along with an experiment-specific salt (see PlanOut as an example). For simple experiments, it is often possible to meet these requirements by using a new set of cookies for each new type of experiment (the downside is a lot of unnecessarily repeated effort). The system Fundraising currently uses does not satisfy either of these requirements, and a solution using cookies looks very difficult to achieve.
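A minimal sketch of salted-hash assignment, assuming a string token and a per-experiment salt. The function name `assign_treatment`, the salt value and the choice of SHA-256 are illustrative assumptions; this is not PlanOut's API or the method used by any Wikimedia team.

```python
import hashlib


def assign_treatment(token, experiment_salt, treatments):
    """Deterministically map a client token to one of `treatments`.

    Hashing the salt together with the token gives each experiment an
    independent, effectively random split (randomization), while the same
    token always lands in the same arm of a given experiment (persistence).
    """
    digest = hashlib.sha256(f"{experiment_salt}:{token}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to an arm index.
    bucket = int.from_bytes(digest[:8], "big") % len(treatments)
    return treatments[bucket]


arms = ["control", "treatment"]
first = assign_treatment("example-token", "banner-test-1", arms)   # hypothetical salt
second = assign_treatment("example-token", "banner-test-1", arms)  # same inputs, same arm
```

Because the assignment is recomputed from the token on every request, no per-experiment cookie or lookup table is needed; starting a new experiment only requires choosing a new salt.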
If we only care about persistence for the duration of a browser session, we could use the browser session cookie. The downside is that there already exist use cases in the mobile team that require the experiment to last for 90 days.
How often do people visit Wikipedia?
Currently we run high intensity month-long banner campaigns. To spread the workload more evenly across the year and ensure a more continuous revenue stream, we would like to move away from this model. Ideally, there would be low intensity campaigns running throughout the year. Currently, when a campaign runs in a country, a banner is shown on virtually every page view. This is acceptable when campaigns are short and occur only once a year. When campaigns run all year, we need to minimize the number of banners we show. Instead of showing a banner on every page view, we could, for example, show a banner on every nth view or every n weeks. In order to determine how often to display banners we need to understand how often clients use the site. Useful statistics include:
- the distribution over number of pages viewed and sessions per day/week/month/year
- the distribution over times between sessions
- the distribution of times between new page requests within sessions
- all of the above broken down by device, project, language, country
- all of the above at different points/windows of time to be able to see trends/changes in distributions
The proposed cookie for storing qualified last visit times does not help answer any of the above questions. Using unique tokens would allow us to answer them.
For every banner impression a client sees, we would like to know the sequence of banner impressions that preceded it. Without this, we must treat all banner impressions as identical and without history. We cannot answer very simple questions like:
- How many banners do people see during a campaign?
- How does the probability of donating change as a function of the number of previous impressions over the last n days?
- Is one sequence of banners better than another?
- When we ran an A/B test, were groups A and B comparable in terms of their history of impressions before we applied the treatment?
Again, the proposed cookie for storing qualified last visit times does not help answer any of the above questions. Using unique tokens would allow us to answer them; however, this information could also be gathered without unique tokens. A proposal for doing so is described here. The general idea is that every impression sends back information about some history of the impressions that came before it. Since impressions come from Central Notice and impression histories are not unique, no private client data is stored. However, the solution requires a very large cookie and considerable changes to Central Notice. There are also concerns related to caching and increased server load. It is unsatisfactory that a large amount of redundant information gets sent back to our servers on every single pageview.
A large number of consumption metrics relate to or depend on session analysis; these include:
- How many pages do people tend to view in a single session?
- How long do people spend on each page?
- How long do sessions last?
To answer any of these questions, we need to be able to divide user activity into "sessions": unbroken periods of site access. This requires a unique identifying token allowing us to extract the timestamps of requests, grouped by the client that made those requests. As such, it depends on unique clients. The requirements for a unique client token that would be acceptable for session analysis are that the token:
- Be passed with all requests, or all "pageviews";
- Last at least 32 days, to allow for month-long answers to the session analysis questions;
- Cover a sufficiently representative population of clients to allow us to draw general conclusions about reader behaviour, even when broken down to [just mobile devices]/[just mobile site hits]/[just hits from the United States].
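The grouping described above can be sketched as follows. The 30-minute inactivity threshold, the `(token, timestamp)` record layout and the name `sessionize` are assumptions made for illustration; the source does not specify how a session boundary is defined.

```python
from collections import defaultdict

# Assumed inactivity threshold: a gap longer than this starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(requests, gap=SESSION_GAP_SECONDS):
    """Group (token, timestamp) pairs into per-client sessions.

    Timestamps are seconds since an arbitrary epoch. Returns a dict that
    maps each token to a list of sessions, each a sorted list of timestamps.
    """
    by_client = defaultdict(list)
    for token, ts in requests:
        by_client[token].append(ts)

    sessions = {}
    for token, stamps in by_client.items():
        stamps.sort()
        current = [stamps[0]]
        client_sessions = [current]
        for ts in stamps[1:]:
            if ts - current[-1] > gap:
                # Too long since the previous request: open a new session.
                current = [ts]
                client_sessions.append(current)
            else:
                current.append(ts)
        sessions[token] = client_sessions
    return sessions


# Made-up sample data: client "a" returns after a long pause.
requests = [("a", 0), ("a", 60), ("a", 5000), ("b", 10)]
sessions = sessionize(requests)
pages_per_session = [len(s) for cs in sessions.values() for s in cs]
session_lengths = [s[-1] - s[0] for cs in sessions.values() for s in cs]
```

Pages per session, session length and time between sessions all fall out of the per-client timestamp lists, which is why the token must accompany every pageview for the full analysis window.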