A key content consumption metric is unique devices: the number of distinct devices visiting our web properties in a given time period. Depending on the implementation, this also has implications for how we implement session analysis metrics.
Counting devices obviously raises significant privacy concerns. The Analytics team counts unique devices per project per day and per month in a way that does not uniquely identify, fingerprint, or otherwise track users.
- 1 Definition
- 2 How do we count unique devices?
- 3 Technical details
- 4 Dataset
- 5 More Detail Regarding Use Cases for Unique devices
- 6 Other implementations to get data for unique devices
- 7 See also
- 8 Notes
Definition
The number of "unique users" visiting a site is a common high-level metric in web analytics. In our case, since we do not ask users to log in, we count unique devices as a proxy for users.
Differences between unique devices and unique users
Since you do not need to be logged in to use Wikipedia, we are restricted to counting uniques with HTTP cookies. This means that if a user accesses Wikipedia from both a mobile phone and a desktop computer, we count two devices. Unless we ask our users to log in, no method, cookie-based or otherwise, can count unique users.
Why the name unique devices
We considered publishing our numbers under the label "unique clients". The main objection to 'unique clients' is that the term means different things to IT people and to non-IT users. Technical people think of 'client' as in 'client-server', mostly as a software instance; a single app could even create several clients in different threads. Non-technical people, however, think of a 'client' as a person, e.g. someone visiting a store. That is what makes 'client' so tricky: it means different things to different people, and the nuance gets lost. Even if we say "we mean 'client' as a piece of software, and one can have several clients even on one device", that nuance is likely to be lost.
By publishing these numbers under the 'unique devices' label, we expect everyone to immediately grasp that one person can have several devices, even if they do not grasp or remember the further breakdown into several software instances per device.
Can we count unique users instead of devices?
The answer is no, regardless of method: we could only do that if we asked everyone to log in to use the wiki projects. That said, Erik Zachte theorized that computing unique devices at hourly granularity might give a closer estimate of unique users, because people are unlikely to switch devices to access the wiki projects within the same hour.
Why are we doing this?
We used to depend on ComScore to produce this data, which came with some problems: neither the methodology for producing the data nor the results were transparent, and the data came with restrictions on its use. Additionally, ComScore data was far fuzzier for mobile devices unless we installed tracking beacons on our sites that streamed data back to ComScore. Developing our own way of counting lets us be certain about the meaning behind the numbers, track how the approach holds up in practice, and share the high-level numbers not just narrowly within the WMF (as we could with ComScore data) but with the wider community.
How do we count unique devices?
We count unique devices in a very privacy-conscious way: the method does not use any cookie through which your browsing history could be tracked. A simple explanation can be found in our Wikimedia blog post on the topic.
Technical details
The last access cookie implementation is documented in detail on wikitech.
With this methodology we can count unique devices, but we cannot use the last access cookie to tag users belonging to different buckets of an A/B test, for example.
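The idea behind a last access cookie can be illustrated with a toy model. This is a minimal sketch for illustration only, not the production pipeline documented on wikitech. It assumes that each request carries at most one piece of information, the date of the device's previous visit read from the cookie, and that the server overwrites that date with the current date after every request. Under that assumption, a request counts toward a period's unique devices only when its cookie predates the start of the period, so each returning device is counted once per day or month without any identifier being stored. The function names and the sample requests are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional


def period_start(day: date, granularity: str) -> date:
    """Return the first day of the daily or monthly period containing `day`."""
    if granularity == "daily":
        return day
    if granularity == "monthly":
        return day.replace(day=1)
    raise ValueError(f"unknown granularity: {granularity!r}")


def count_unique_devices(
    cookie_dates: Iterable[Optional[date]],
    today: date,
    granularity: str = "daily",
) -> int:
    """Count requests that are a returning device's first visit in this period.

    Each element is the date read from one request's last access cookie,
    or None if the request carried no cookie. Toy-model assumption: the
    server overwrites the cookie with `today` after every request, so only
    the first request of a period can carry an older date.
    """
    start = period_start(today, granularity)
    # A cookie dated before the period start means "seen before, but not yet
    # in this period". Cookie-less requests are skipped in this simplification.
    return sum(1 for seen in cookie_dates if seen is not None and seen < start)


# Hypothetical requests arriving on 10 February 2016.
today = date(2016, 2, 10)
requests = [
    date(2016, 2, 9),   # returning device, last seen yesterday
    date(2016, 2, 10),  # later request today from a device already seen today
    None,               # no cookie: first visit, or cookies disabled
    date(2016, 1, 20),  # returning device, last seen last month
]
print(count_unique_devices(requests, today, "daily"))    # 2
print(count_unique_devices(requests, today, "monthly"))  # 1
```

In this toy model, neither the cookie-less request nor a device on its very first visit is counted, so a real implementation has to account for such devices separately. The model also shows why the cookie is of no use for A/B test bucketing: it holds only a date, shared by every device seen that day, rather than any per-device value.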
Dataset
Data for unique devices in Wikimedia projects is available in the following forms:
- In downloadable form: http://dumps.wikimedia.org/other/unique_devices/
- For users with access to Hive, as two database tables, one of which is last_access_uniques_monthly. See wikitech:Analytics/Data/Unique Devices for the table schema. These tables also include per-country data, which is not exposed publicly due to privacy concerns.
- Queried programmatically: https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices#Quick_Start
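For programmatic access, a small client might look like the sketch below. It assumes the public Wikimedia REST API path `/metrics/unique-devices/{project}/{access-site}/{granularity}/{start}/{end}`, with access-site values such as `all-sites`, `desktop-site` or `mobile-site`, granularity `daily` or `monthly`, dates as `YYYYMMDD` strings, and a JSON response whose `items` carry `timestamp` and `devices` fields. These are assumptions to be checked against the Quick Start page, which is the authoritative reference; the function names and the User-Agent string are hypothetical placeholders.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base path of the unique-devices endpoint; confirm it on the Quick Start page.
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/unique-devices"


def unique_devices_url(project: str, access_site: str, granularity: str,
                       start: str, end: str) -> str:
    """Build a request URL; start and end are assumed to be YYYYMMDD strings."""
    parts = [project, access_site, granularity, start, end]
    # Percent-encode each path segment so no value can inject extra "/" separators.
    return API_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def parse_devices(payload: dict) -> dict:
    """Map each item's timestamp to its 'devices' value (assumed field names)."""
    return {item["timestamp"]: item["devices"] for item in payload.get("items", [])}


def fetch_unique_devices(project: str, access_site: str = "all-sites",
                         granularity: str = "daily",
                         start: str = "20160101", end: str = "20160131") -> dict:
    """Fetch one time series over the network and return {timestamp: devices}."""
    url = unique_devices_url(project, access_site, granularity, start, end)
    # Wikimedia asks API clients to send a descriptive User-Agent; this one is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "unique-devices-example/0.1 (placeholder contact)"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_devices(json.load(response))
```

For example, `fetch_unique_devices("en.wikipedia.org", "mobile-site", "monthly", "20160101", "20160331")` would request monthly mobile figures for English Wikipedia. URL building and response parsing are kept separate from the network call so that each can be checked offline.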
More Detail Regarding Use Cases for Unique devices
Other implementations to get data for unique devices
- Which we are not doing, because it would be a tremendous ethical breach