Research:Unique devices

A key content consumption metric is unique devices: how many distinct devices visit our web properties in a given time period. Depending on the implementation, this also has implications for how we implement session analysis metrics.

Obviously this raises fairly big privacy concerns. The Analytics team counts unique devices per project per day and month in a way that does not uniquely identify, fingerprint or otherwise track users.

Definition

The number of "unique users" hitting a site is a common high-level metric in web analytics. In our case, given that we do not ask users to log in, we count unique devices as a proxy for counting users.

Differences between unique devices and unique users

Since you do not need to be logged in to use Wikipedia, we are restricted to counting uniques using HTTP cookies. This means that if a user uses both their mobile phone and their desktop to access Wikipedia, we count two "devices" (mobile and desktop). There is no way to count unique users using cookies if we do not ask our users to log in, regardless of the method.

Why the name unique devices

We played with the idea of publishing our numbers under the label "unique clients". The main objection to "unique clients" is that the term reads differently to IT people and to non-IT users. Technical people would think of "client" as in "client-server", mostly as a software instance; even a single app could create several clients in different threads. Non-technical people would think of a "client" as a person, e.g. a person visiting a store. That is what makes "client" so tricky: it means different things to different people, and the nuance would get lost even if we were to say "we mean 'client' as 'piece of software', and one can have several clients even on one device".

Our idea in publishing these numbers under the "unique devices" label is that everyone will immediately grasp that one human can have several devices, even if they probably don't grasp or remember the further breakdown into several software instances per device.

Can we count unique users instead of devices?

The answer is no, regardless of method. We could only do that if we asked everyone to log in to use wiki projects. That said, Erik Zachte theorized that computing Unique Devices at hourly granularity might give a closer estimate of Unique Users, because people are unlikely to switch devices to access wiki projects within the same hour.

Why are we doing this?

We used to depend on ComScore to produce this data, which came with some problems: neither the methodology for producing the data nor the results were transparent, and the data came with some restrictions around its usage. Additionally, ComScore data is far fuzzier for mobile devices unless we install tracking beacons on our sites that stream data back to ComScore.[1] Developing our own way of tracking this allows us to be certain about the meaning behind the numbers, track how the approach holds up in practice, and share the high-level numbers not just narrowly within the WMF (as we can now) but with the wider community.

Unique Visitor data from comScore (September 2014)

How do we count unique devices?

We count unique devices in a very privacy-conscious way: it does not involve any cookie by which your browser history can be tracked. A simple explanation can be found in our Wikimedia blog post on the topic.

Technical details

Technical details of the last access cookie implementation are available on wikitech.
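
As a rough illustration of the idea only, the following sketch assumes that the last access cookie carries nothing but the date of the device's previous visit, and that a request counts toward the daily total when it arrives without the cookie or with a date earlier than today. The names `counts_as_new_unique`, `Device` and `simulate` are hypothetical, and the sketch ignores clients that refuse cookies; the wikitech page remains the authoritative description.

```python
from datetime import date
from typing import Optional


def counts_as_new_unique(last_access: Optional[date], today: date) -> bool:
    """Decide whether a request adds one to today's unique-device count.

    last_access is the date carried by the (assumed) last access cookie,
    or None when the request arrives without that cookie.
    """
    return last_access is None or last_access < today


class Device:
    """Hypothetical stand-in for a browser; it stores only a date, no identifier."""

    def __init__(self) -> None:
        self.last_access: Optional[date] = None

    def visit(self, today: date) -> bool:
        counted = counts_as_new_unique(self.last_access, today)
        # Assumption: the cookie is refreshed to today's date on every response.
        self.last_access = today
        return counted


def simulate() -> int:
    """One person using a phone and a desktop on the same day."""
    today = date(2016, 5, 2)
    phone, desktop = Device(), Device()
    desktop.last_access = date(2016, 5, 1)  # the desktop also visited yesterday
    visits = [phone, desktop, phone, phone, desktop]
    return sum(device.visit(today) for device in visits)
```

In this simulation five requests from one person yield a count of two, one per device, which matches the distinction between devices and users described above. Nothing in the stored date identifies the device itself.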

Caveats

With this methodology we can count unique devices, but we cannot use the last access cookie to tag users belonging to different buckets of an A/B test, for example.

Dataset

Data for Unique Devices in Wikimedia projects is available in the following forms:

Downloads

In downloadable form here: http://dumps.wikimedia.org/other/unique_devices/

Databases

For users with access to Hive, the data is available in two database tables called last_access_uniques_daily and last_access_uniques_monthly. See wikitech:Data Platform/Data Lake/Traffic/Unique Devices for the table schema. These tables also include per-country data, which is not exposed publicly due to privacy concerns.

API

Data can also be queried programmatically. See the Wikimedia Analytics API documentation.
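
A minimal sketch of such a query is shown below. The URL layout (project, access site, granularity, start and end dates under `metrics/unique-devices`) and the response fields `items`, `timestamp` and `devices` are assumptions based on the public REST API and should be checked against the API documentation; the helper names and the sample numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

# Assumed root of the unique-devices endpoint; verify against the API docs.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/unique-devices"


def unique_devices_url(project: str, access_site: str, granularity: str,
                       start: str, end: str) -> str:
    """Build a request URL; start and end are dates written as YYYYMMDD."""
    if granularity not in ("daily", "monthly"):
        raise ValueError("granularity must be 'daily' or 'monthly'")
    return "/".join([API_ROOT, project, access_site, granularity, start, end])


def devices_by_timestamp(payload: Dict) -> List[Tuple[str, int]]:
    """Extract (timestamp, devices) pairs from a decoded JSON response."""
    return [(item["timestamp"], item["devices"])
            for item in payload.get("items", [])]


# Hand-written sample in the assumed response shape; the numbers are invented.
SAMPLE_RESPONSE = {
    "items": [
        {"project": "fr.wikipedia.org", "timestamp": "20160101", "devices": 100},
        {"project": "fr.wikipedia.org", "timestamp": "20160201", "devices": 120},
    ]
}

url = unique_devices_url("fr.wikipedia.org", "all-sites", "monthly",
                         "20160101", "20160301")
pairs = devices_by_timestamp(SAMPLE_RESPONSE)
```

To run this against the live service, the URL could be fetched with `urllib.request.urlopen` and decoded with `json.load` before being passed to `devices_by_timestamp`.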

Dashboard

Unique devices are part of Wikistats2; see, for example, the monthly uniques for French Wikipedia. (Another visualization is available as part of the older Vital Signs dashboard, for example for daily uniques; click "add metric" to change to monthly. See also the May 2016 announcement.)

More detail regarding use cases for unique devices

Research:Unique_devices/Use_Cases

Other implementations to get data for unique devices

Research:Unique_devices/Other_Possible_Implementations

See also

Notes

  1. We are not doing this, because it would be a tremendous ethical breach.