Edge Uniques
The Wikimedia Foundation will implement a first-party cookie named wmf-uniq to enable A/B testing and add protections against distributed denial-of-service (DDoS) attacks while preserving user privacy. This solution will process identifiers only at our private CDN edge servers and will not create user profiles of individual readers' browsing histories or patterns that could be linked to a specific person over time.
We will use this cookie to better understand how many people visit the Wikimedia sites, to learn through A/B testing how to make our wikis work better for our readers and communities, and to improve our ability to protect against denial-of-service attacks by helping staff identify legitimate human traffic. The Wikimedia Foundation will implement this cookie in a privacy-preserving way, designed to collect the minimal amount of data possible and ensure all uses of the cookie meet or exceed the privacy requirements outlined in our privacy policy and cookie use guidelines. Visitors to Wikimedia websites will be able to clear the cookie at any time.
Key points about the cookie
- The Wikimedia Foundation will add a first-party browser cookie (wmf-uniq) to Wikipedia and other Wikimedia sites for A/B testing, visitor trend analysis, and DDoS protection, with reader privacy preservation as a key design goal.
- The wmf-uniq cookies will only be processed at our private CDN edge servers, which are the first point of contact for all internet traffic visiting our sites. They will never be stored in traffic logs or databases.
- Readers can block or clear these cookies without affecting their reading or editing experience.
Background
As the internet changes rapidly, Wikimedia Foundation staff have explored new approaches to usability testing, analytics improvements, and protection against distributed denial-of-service attacks in order to fulfill our mission. Our sites receive billions of requests per day from readers all over the world in over three hundred languages. This creates some unique challenges for both understanding our visitors’ behavior and protecting ourselves from attacks that flood our websites with bad traffic. To address these challenges, we have designed a first-party cookie (named wmf-uniq) and a process that reads, verifies, and discards a copy of the cookie "at the edge" of our computing systems – meaning the first point where visitor traffic enters our network. This process limits the time that this uniquely identifying information about a logged-out user is present in our system to seconds, and typically milliseconds. Because we are not storing this cookie in our traffic logs or databases, we cannot create user profiles that could be used to track readers' usage behavior over time. This solution will provide a standardized, privacy-preserving framework that staff and volunteer developers can use when implementing new features, bots, tools, and gadgets that require continuity or analytics.
Our challenges and solutions
The Edge Uniques solution helps us in the following ways:
1. Improve user experience through A/B testing
Wikimedia editors work hard to create what is arguably the most important educational resource of our time. We must design experiences that present information effectively to many different kinds of readers. When we identify promising opportunities to improve the reader experience, we want to use controlled experiments known as A/B tests to evaluate our ideas. These usability tests are designed to expose one group of readers to a modified version of an experience while a control group continues using the unchanged version. This controlled experiment allows us to measure precisely how specific changes affect reader behavior and their overall experience.
Consider an example feature idea: showing recommendations for more articles to read. We need to verify these recommendations work effectively for different readers – from desktop to mobile users, across languages including right-to-left scripts – while making sure they make the reading experience better rather than worse.
We have considered other approaches for A/B testing, but have found that they are not sufficiently safe and accurate for our needs:
- Experiments based on IP addresses proved unreliable with today’s internet usage patterns, as readers often switch between mobile data, office wifi, and home networks.
- JavaScript session IDs couldn't support "first-paint" experiments that run as soon as the webpage starts loading. They are also not as privacy-preserving as we need.
In the absence of A/B testing, we have been using approaches that are less informative than controlled experiments: gradual rollouts spanning multiple months, which produce unreliable data due to uncontrollable external factors, or small-scale user studies that fail to represent our global readership.
The Edge Uniques solution will solve these problems by allowing us to create unique "experiment enrollment" identifiers that exist only for one experiment and do not contain any personally identifying or user profiling information.
2. Enable protection against DDoS attacks
When we face distributed denial-of-service (DDoS) attacks that threaten to make Wikipedia unavailable, our servers are flooded with requests from bots that attempt to overload the system so much that readers can’t access the website. In order to combat these attacks, the most readily available identifier we can work with today is the IP address. We often limit or block IP addresses involved in attacks, but this creates problems because the same IP address can be shared by many users: university campus networks, public wifi, mobile networks where many users share the same IP, and similar scenarios. This makes it difficult to limit attacks while avoiding impact on real users.
The Edge Uniques solution will improve this by providing a more reliable way to identify legitimate visitors than an IP address alone. The first-party cookies will include historical information about how frequently a browser has visited our sites over time, but not what pages were visited. This history is difficult for bot attackers to fake, as it requires consistent interaction with our sites over time. This allows us to distinguish between genuine readers and sudden attack traffic, even when they come from the same IP address or network. During an attack, this additional context will help us maintain site availability while minimizing disruptions to real humans.
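As a rough illustration of why a visit history can be hard to fake, the sketch below signs a small history record with a server-side key so that it can be verified statelessly at the edge. The payload fields (`first_seen_week`, `active_weeks`), the cookie encoding, the key handling, and the `min_active_weeks` threshold are all assumptions made for this example; the actual wmf-uniq format is not described on this page.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def sign_history(history: dict, key: bytes) -> str:
    """Encode a visit-frequency record and append an HMAC tag (hypothetical format)."""
    body = base64.urlsafe_b64encode(
        json.dumps(history, sort_keys=True).encode()
    ).decode()
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"


def verify_history(cookie_value: str, key: bytes) -> Optional[dict]:
    """Return the history if the tag is valid, otherwise None. No lookup table is needed."""
    body, _, tag = cookie_value.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not body or not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


def looks_established(history: Optional[dict], min_active_weeks: int = 3) -> bool:
    """Assumed heuristic: a browser seen in several distinct weeks is less likely a burst bot."""
    return history is not None and history.get("active_weeks", 0) >= min_active_weeks
```

In this sketch the record holds only how often a browser has been seen, not which pages it requested, and a client without the key cannot mint a record claiming a long history.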
3. Understand our visitor trends
In order to plan how to improve Wikimedia products, we need to accurately count how many visits our wikis receive on different types of devices and in different geographies (among other dimensions).
We currently use heuristic methods based on date-stamped cookies (the last access solution) to estimate the number of unique devices that visit Wikimedia wikis. With Edge Uniques, we will be able to count readership more precisely without recording any raw unique identifiers that could lead back to individual readers. We will be able to use methods based on constructing mergeable HyperLogLog sketches for periods of time, allowing us to better understand visitor patterns while maintaining user privacy.
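To make "mergeable HyperLogLog sketches" more concrete, here is a minimal, unoptimized HyperLogLog in Python. It is a teaching sketch only: the register count (`p = 12`), the choice of SHA-256 as the hash, and the class and method names are assumptions for this example, not the production design. The key property it shows is that a sketch stores only small counters, never the identifiers, and that sketches for two time periods can be combined by taking a register-wise maximum.

```python
import hashlib
import math


class HyperLogLog:
    """Tiny HyperLogLog: estimates distinct counts without storing the items."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                    # number of bits used to pick a register
        self.m = 1 << p               # number of registers (must be >= 128 here)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Take a 64-bit hash of the item; the item itself is discarded.
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Merging is a register-wise maximum, so per-period sketches combine losslessly.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range correction
        return raw
```

For example, a sketch built for Monday and another for Tuesday can be merged to estimate the number of distinct visitors across both days, with visitors who appeared on both days counted once.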
How Edge Uniques will work
Note: This is a general explanation for a wider audience. You can also read a detailed technical description of how this works.
When a reader visits any Wikimedia project, their browser will receive a secure, first-party wmf-uniq cookie. This identifier will be processed only by the Wikimedia private CDN edge servers – the same servers that already handle all reader connections. These servers will never store the raw identifier or forward it to other systems. The design makes use of a stateless protocol to make sure that, even at the edge, the cookie values and their originating IP addresses will never be stored in traffic logs or databases.
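The "read at the edge, never forward" step can be pictured with a small Python sketch. The function name `strip_edge_cookie` and the dictionary-based header model are hypothetical and exist only for illustration; the real edge servers are not implemented this way as far as this page describes.

```python
from __future__ import annotations

COOKIE_NAME = "wmf-uniq"  # cookie name taken from the text above


def strip_edge_cookie(headers: dict[str, str]) -> tuple[str | None, dict[str, str]]:
    """Return (raw cookie value, headers that are safe to forward or log)."""
    kept: list[str] = []
    raw_value = None
    for pair in headers.get("Cookie", "").split(";"):
        pair = pair.strip()
        name, _, value = pair.partition("=")
        if name == COOKIE_NAME:
            raw_value = value      # held in memory at the edge only
        elif pair:
            kept.append(pair)      # every other cookie passes through unchanged
    forwarded = {k: v for k, v in headers.items() if k != "Cookie"}
    if kept:
        forwarded["Cookie"] = "; ".join(kept)
    return raw_value, forwarded
```

In this model, only the second return value would ever reach application servers or logging, while the raw value is used briefly at the edge and then dropped.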
Privacy by design
Data minimization is a principle used by Wikimedia developers, both volunteer and Wikimedia Foundation staff, to guide our work on Wikipedia and all Wikimedia projects. We publish our policies related to data collection, which support our aim to minimize data collection.
Edge Uniques will apply these principles in a deliberately privacy-preserving design that goes beyond standard industry practices. After a detailed design phase, we have established an approach to privacy protection which includes:

Edge processing only
- Identifiers are processed only by our private CDN edge servers.
- Raw identifiers are removed by our private CDN edge servers before the request is forwarded to any other system.
- No raw identifiers are stored in traffic logs, databases, or any system.
- Processing happens in seconds, not days or months.
Strict separation
- Cookies are secure and first-party only.
- JavaScript cannot access these cookies.
- No persistent user profiles are created.
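The cookie properties listed above map onto standard `Set-Cookie` attributes: `Secure` restricts the cookie to HTTPS, and `HttpOnly` hides it from JavaScript. The sketch below builds such a header with Python's standard library. The `SameSite=Lax` setting, the one-year lifetime, and the function name `build_set_cookie` are assumptions for illustration, since this page does not specify them.

```python
from http.cookies import SimpleCookie


def build_set_cookie(value: str, max_age_days: int = 365) -> str:
    """Build a Set-Cookie header value with restrictive attributes."""
    cookie = SimpleCookie()
    cookie["wmf-uniq"] = value
    morsel = cookie["wmf-uniq"]
    morsel["secure"] = True        # only sent over HTTPS
    morsel["httponly"] = True      # not readable via document.cookie
    morsel["samesite"] = "Lax"     # assumed: limits sending on cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_days * 86400   # assumed lifetime
    return morsel.OutputString()
```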
Minimal data collection
- We collect only specific data necessary for A/B tests.
- A/B tests include only a small percentage of readers.
- We record only interactions relevant to what we're testing.
- Each test uses a different derivation of the wmf-uniq cookie.
- At any time, users can clear cookies or use a private browsing mode that blocks cookies.
First use case: A/B testing
The first applied use case for the wmf-uniq cookie will be to enable reliable A/B testing that helps us improve the Wikipedia reader experience as part of our FY 24/25 Annual Plan. By running these experiments across our wide base of users, we can measure how new features in our reading experience affect crucial indicators such as reader retention, content growth, accessibility, and overall user engagement. This experimentation process also supports the Wikimedia movement’s commitment to evidence-based decision-making.
For A/B testing, a small percentage of readers (typically 1–5%) will be randomly selected to participate in an experiment. Once a reader is placed in an experiment, the edge server will derive an "experiment enrollment" identifier using a one-way function. These new derivative "experiment enrollment" identifiers will allow us to know that an action was taken and that it was done by a unique reader, but we will not know who the reader was since no personally identifying information will be recorded. Every A/B test will use a different derivation of the wmf-uniq cookie, so it will not be possible to correlate user behaviors between different tests or create a user profile.
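One common way to implement this kind of enrollment is with a keyed one-way function such as HMAC. The Python sketch below is an illustration under that assumption: the page says only that "a one-way function" is used, so the choice of HMAC-SHA-256, the key handling, the 50/50 group split, and the names `enroll` and `_fraction` are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def _fraction(key: bytes, message: str) -> float:
    """Map a message to a pseudo-random number in [0, 1) using a keyed hash."""
    digest = hmac.new(key, message.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


def enroll(uniq: str, experiment: str, key: bytes,
           sample_rate: float = 0.05) -> Optional[Tuple[str, str]]:
    """Return None if the reader is not sampled, else (group, enrollment_id)."""
    # Step 1: only a small share of readers enters the experiment.
    if _fraction(key, f"sample:{experiment}:{uniq}") >= sample_rate:
        return None
    # Step 2: split enrolled readers into control and treatment (assumed 50/50).
    in_treatment = _fraction(key, f"group:{experiment}:{uniq}") < 0.5
    group = "treatment" if in_treatment else "control"
    # Step 3: derive a per-experiment identifier; it cannot be reversed to the
    # cookie value and differs for every experiment name.
    enrollment_id = hmac.new(
        key, f"id:{experiment}:{uniq}".encode(), hashlib.sha256
    ).hexdigest()[:32]
    return group, enrollment_id
```

Because the experiment name is mixed into every derivation, the same reader gets unrelated identifiers in different experiments, which matches the goal of preventing correlation between tests.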
Because we don't want to collect anything that isn't necessary, we will only collect the information related to what we are testing. We will not track everything a reader does during an experiment, only how they react to the features that we will test. We’ll be able to assess questions like:
- Did readers successfully use the new feature?
- Did they return to our wikis within seven days?
- When they returned, did they successfully use the new feature again?
This sort of A/B testing will be a rigorous tool in our existing suite of evaluation methods, including focus groups, mockup discussions, prototype feedback, and small-group testing. Together, these approaches ensure our software changes deliver meaningful improvements to the user experience.
Next steps
- Now that we have completed the technical design phase, we are beginning implementation, with code development viewable in a public repository. You can follow progress on the work in Phabricator. This cookie is not yet active on Wikimedia sites.
- We will set up community calls (currently aiming for April 29, more information to come on this page) for anyone who wants to talk to us to learn more about this project. You can also discuss this work with us on the talk page.