IP Editing: Privacy Enhancement and Abuse Mitigation
Note: This project proposal is in early draft stage. Your comments are very welcome on the talk page.
The world is increasingly technically advanced and privacy conscious. Users are more aware than ever of the collection and use of their personal information, and how its misuse may lead to harassment or abuse. Many websites, including the Wikimedia projects, are continually working to re-evaluate and enhance protections for user privacy. As part of this effort, the Wikimedia Foundation is embarking upon a technical improvement to the projects, and we’d like your input.
MediaWiki stores and publishes the IP addresses of unregistered contributors (as part of their signature, in page history and in logs), making them visible to anyone visiting our sites. Publication of these IP addresses risks compromising the safety and anonymity of these users, and in some cases may even put people at risk of government persecution. It ought to be possible to provide increased privacy protection for unregistered contributors by obscuring their IP addresses when they contribute to the projects.
Inarguably, Wikimedia projects have a very good reason for storing and publishing IP addresses: they play a critical role in keeping vandalism and harassment off our wikis. It's very important that contributors, admins and functionaries have tools that can identify and block vandals, sockpuppets, editors with conflicts of interest and other bad actors.
The Wikimedia Foundation believes that, working with checkusers, stewards and vandal-fighters, it's possible to figure out a way to protect our users’ privacy while keeping our anti-vandalism tools working on par with how they work now. Hence, it has decided to work on shielding IP addresses from our wikis. This includes restricting the number of people who can see other users' IP addresses, and reducing the amount of time IP addresses are stored in our databases and logs. It is important to note that a critical part of this work will be to ensure that our wikis still have access to the same (or better) level of anti-vandalism tooling and are not at risk of facing abuse.
The Wikimedia Foundation does not currently have any definite plans for how to achieve this dual goal of better protecting user privacy while also giving contributors effective tools to protect our wikis from vandalism and abuse. It is vital that developers work in partnership with checkusers, stewards and other contributors to identify the necessary changes to Wikimedia projects' tools and procedures. While there have been some discussions about the technical and security implications of this project, the Wikimedia Foundation is waiting to hear from Wikimedia communities about the social implications before it decides on the next steps for this project.
This is a very challenging problem and that’s why it's been put off over the years. But in light of evolving data-privacy standards on the internet, the Wikimedia Foundation thinks it's now time to tackle this problem.
An IP address can be used to find out a user’s geographical location, institution and other personally identifiable information, depending on how the IP address was assigned and by whom. This can sometimes mean that an IP address can be used to pinpoint exactly who made an edit and from where, especially when the pool of editors in a geographic area is small.
Concerns around exposing IP addresses on our projects have been raised repeatedly by our communities, and the Wikimedia movement as a whole has been talking about how to solve this problem for at least fifteen years. Here’s a (non-exhaustive) list of some of the previous discussions that have happened around this topic:
- Mailing list threads:
- Bugtracker discussions:
See also: Research:IP masking impact report.
IP addresses are valuable as a semi-reliable partial identifier that is not easily manipulated by the associated user. Depending on provider and device configuration, IP address information is not always accurate or precise, and deep technical knowledge and fluency are needed to make the best use of it, though administrators are not currently required to demonstrate such fluency to have access. This technical information is used to support additional information (referred to as “behavioural knowledge”) where possible, and the information taken from IP addresses significantly impacts the course of administrative action taken.
On the social side, the issue of whether to allow unregistered users to edit has been a subject of extensive debate. So far, the outcome has erred on the side of allowing unregistered users to edit. The debate is generally framed around a desire to halt vandalism, versus preserving the ability for pseudo-anonymous editing and lowering the barrier to edit. There is a perception of bias against unregistered users because of their association with vandalism, which also appears as algorithmic bias in tools such as ORES. Additionally, there are major communication issues when trying to talk to unregistered users, largely due to the lack of notifications, and because there is no guarantee that the same person will be reading the messages sent to that IP talk page.
IP masking will significantly impact administrator workflows and may increase the burden on CheckUsers in the short term. If or when IP addresses are masked, we should expect our administrators' ability to manage vandalism to be greatly hindered. This can be mitigated by providing tools with equivalent or greater functionality, but we should expect a transitional period marked by reduced administrator efficacy. In order to provide proper tool support for our administrators’ work, we must be careful to preserve or provide alternatives to the following functions currently fulfilled by IP information:
- Block efficacy and collateral estimation
- Some way of surfacing similarities or patterns among unregistered users, such as geographic similarity or a shared institution (e.g. whether edits are coming from a high school or university)
- The ability to target specific groups of unregistered users, such as vandals jumping IPs within a specific range
- Location or institution-specific actions (not necessarily blocks); for example, the ability to determine if edits are made from an open proxy, or public location like a school or public library.
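The first and third functions in the list above (estimating the collateral of a block, and targeting vandals who jump between addresses in a range) rest on simple range arithmetic over IP addresses. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `range_block_summary` and the shape of the edit records (a dict with an `"ip"` key) are assumptions made for illustration, not part of MediaWiki, and the addresses come from ranges reserved for documentation.

```python
import ipaddress


def range_block_summary(cidr, recent_edits):
    """Estimate what a block on `cidr` would cover.

    `recent_edits` is a hypothetical list of dicts with an "ip" key;
    the real data source is not specified in this proposal.
    """
    network = ipaddress.ip_network(cidr)
    # Keep only the edits whose address falls inside the range.
    affected = [
        edit for edit in recent_edits
        if ipaddress.ip_address(edit["ip"]) in network
    ]
    return {
        "range_size": network.num_addresses,   # addresses the block would cover
        "affected_edits": len(affected),       # recent edits from inside the range
        "affected_ips": sorted({edit["ip"] for edit in affected}),
    }


edits = [
    {"ip": "192.0.2.10"},
    {"ip": "192.0.2.200"},
    {"ip": "198.51.100.7"},
]
summary = range_block_summary("192.0.2.0/24", edits)
```

Any replacement tooling would need to offer an equivalent of this calculation to people who can no longer see the raw addresses, for example by returning only the counts and not the `affected_ips` list.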
Depending on how we handle temporary accounts or identifiers for unregistered users, we may be able to improve communication with unregistered users. Underlying discussions and concerns around unregistered editing, anonymous vandalism, and bias against unregistered users are unlikely to change significantly if we mask IPs, provided we maintain the ability to edit projects while logged out.
This project is currently in very early phases of discussions and we don’t have a concrete plan for it yet. We hope to have a lot of active community involvement as we brainstorm different ideas for what would work best.
The general idea is that edits will be recorded using an automatically-generated, unique, human-readable identifier instead of the IP address when an edit is made by an unregistered user. This identifier will stay consistent over a session and possibly longer, depending on implementation details. We will keep a mapping of the identifier to the IP address in the database for a limited time period and surface it as required for our functionaries.
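To make the general idea concrete, here is a minimal in-memory sketch of an identifier-to-IP mapping that expires after a retention period. Everything in it is an assumption for illustration: the class name `MaskStore`, the sequential "Anonymous N" naming, and the 90-day default are hypothetical, since the proposal does not yet fix a naming scheme, a storage design, or a retention length.

```python
import time

# Hypothetical retention period; the proposal only says "a limited time period".
DEFAULT_RETENTION_SECONDS = 90 * 24 * 3600


class MaskStore:
    """Maps generated identifiers to IP addresses for a limited time."""

    def __init__(self, retention=DEFAULT_RETENTION_SECONDS):
        self.retention = retention
        self._entries = {}   # identifier -> (ip, created_at)
        self._counter = 0

    def new_mask(self, ip, now=None):
        """Create a new human-readable identifier and remember its IP."""
        now = time.time() if now is None else now
        self._counter += 1
        mask = f"Anonymous {self._counter}"
        self._entries[mask] = (ip, now)
        return mask

    def lookup_ip(self, mask, now=None):
        """Return the IP behind `mask`, or None once the mapping has expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(mask)
        if entry is None:
            return None
        ip, created_at = entry
        if now - created_at > self.retention:
            # Past the retention period: forget the IP permanently.
            del self._entries[mask]
            return None
        return ip


store = MaskStore(retention=100)
mask = store.new_mask("192.0.2.1", now=0)
```

In this sketch the identifier itself outlives the mapping: after expiry the edit is still attributed to the same name, but the IP can no longer be recovered. Restricting who may call `lookup_ip` is left out and would be a separate permission check.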
Q: Will users with advanced permissions such as CheckUsers, Admins, Stewards still have access to IP addresses after this project is complete?
- A: We don’t yet have a definitive answer to this question. Ideally, IP addresses should be exposed to as few people as possible (including WMF staff). We hope to restrict IP address exposure to only those users who need to see it.
Q: How would anti-vandalism tools work without IP addresses?
- A: There are some potential ideas for achieving this goal. For one, we may be able to surface to functionaries other pertinent information about the user, in place of the IP address, that provides the same amount of information. In addition, in sockpuppet investigations it may be possible to automatically verify whether two separate user accounts link to the same IP without exposing the IP itself. It’s also possible that anti-vandalism tools will continue to use IP addresses, but will have restricted access. We will need to work closely with the community to find the optimal solutions.
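One way the "same IP without exposing the IP" check could work is to compare keyed fingerprints of the addresses instead of the addresses themselves. The sketch below uses HMAC-SHA256 from Python's standard library. It is only an illustration of the technique, not a description of any planned implementation; the function names and the placeholder key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def ip_fingerprint(ip, key):
    """Return a keyed fingerprint of an IP address.

    The address is normalised first so that equivalent spellings
    (e.g. upper- and lower-case IPv6) produce the same fingerprint.
    """
    canonical = str(ipaddress.ip_address(ip))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def same_ip(fingerprint_a, fingerprint_b):
    """Compare two fingerprints in constant time."""
    return hmac.compare_digest(fingerprint_a, fingerprint_b)


# Placeholder key for the example; a real key would be kept secret server-side.
key = b"example-secret-key"
fp_a = ip_fingerprint("2001:DB8::1", key)
fp_b = ip_fingerprint("2001:db8::1", key)
fp_c = ip_fingerprint("192.0.2.1", key)
```

The key matters: the IPv4 address space is small enough that an unkeyed hash could be reversed by hashing every possible address, so the fingerprints are only as private as the key. Exact-match fingerprints also cannot answer range questions, such as whether two accounts share a /24.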
Q: If we don’t see IP addresses, what would we see instead when edits are made by unregistered users?
- A: Instead of IP addresses, users will be able to see a unique, automatically-generated, human-readable username. This can look something like “Anonymous 12345”, for example.
Q: Will a new username be generated for every unregistered edit?
- A: No. We intend to implement some method to make the generated usernames at least partially persistent, for example, by associating them with a cookie, the user’s IP address, or both.
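As a rough sketch of the "cookie, IP address, or both" option, the example below prefers a cookie token when the browser presents a known one and falls back to the IP address otherwise. The class `MaskedNameAssigner`, its method, and the sequential naming are hypothetical and chosen only for illustration.

```python
import secrets


class MaskedNameAssigner:
    """Assigns partially persistent generated usernames to unregistered editors."""

    def __init__(self):
        self._by_cookie = {}   # cookie token -> generated username
        self._by_ip = {}       # IP address -> generated username
        self._next_number = 1

    def name_for(self, ip, cookie_token=None):
        """Return (username, cookie_token) for an unregistered edit."""
        # 1. A known cookie wins, even if the editor's IP has changed.
        if cookie_token in self._by_cookie:
            return self._by_cookie[cookie_token], cookie_token
        # 2. Otherwise reuse the name last associated with this IP, if any.
        name = self._by_ip.get(ip)
        if name is None:
            name = f"Anonymous {self._next_number}"
            self._next_number += 1
            self._by_ip[ip] = name
        # 3. Issue a fresh cookie token so later edits can be matched.
        token = secrets.token_urlsafe(16)
        self._by_cookie[token] = name
        return name, token


assigner = MaskedNameAssigner()
first_name, first_token = assigner.name_for("192.0.2.1")
# Same browser, new IP address: the cookie keeps the name stable.
moved_name, moved_token = assigner.name_for("198.51.100.9", first_token)
# Same IP, cookies cleared: the IP fallback keeps the name stable.
cleared_name, _ = assigner.name_for("192.0.2.1")
# A different IP with no cookie gets a new name.
other_name, _ = assigner.name_for("203.0.113.5")
```

The IP fallback has a trade-off worth noting: different people editing from a shared address would receive the same name until a cookie distinguishes them, which is the same ambiguity that IP talk pages have today.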
Q: Will you also be removing existing IP addresses from the wikis as part of this project?
- A: We will not be hiding any existing IP addresses in history, logs or signatures for this project. It will only affect future edits made after this project has been launched.
Q: Is this project the result of a particular law being passed?
- A: No. Data privacy standards are evolving in many countries and regions around the world, along with user expectations. We have always worked hard to protect user privacy, and we continue to learn from and apply best practices based on new standards and expectations. This project is the next step in our own evolution.
Q: What is the timeline on this project?
- A: As mentioned above, we will not be making any firm decisions about this project until we have gathered input from the communities. We'd like to figure out sensible early steps that a development team could work on soon, so we can get started on what we think will be a long project, but we're not hurrying to meet a particular deadline.
Q: How do I get involved?
- A: We would love to hear if you have ideas or feedback about the project! We would especially like to hear if you have any workflows or processes that might be impacted by this project. You can drop your thoughts on the talk page or fill out this form and we’ll reach out to you. Some of us will be at Wikimania and would love to meet you there as well.