IP Editing: Privacy Enhancement and Abuse Mitigation/IP Info feature

From Meta, a Wikimedia project coordination wiki
Rough sketch of what IP Info feature might look like once built

This project is a step towards improving on-wiki support for the anti-vandalism task forces on our projects. We realize that many on-wiki anti-vandalism workflows rely heavily on information revealed by IP addresses. This information can inform the way an editor interacts with an unregistered user. At the moment, retrieving and understanding this information is not an easy task. Our purpose for this project is to make it quicker and easier for our admins, anti-vandal fighters and power users to access information about IP addresses. We hope this project will be very useful as we move forward on the IP Masking project.

Status updates

17 Nov, 2020

The project is currently under backend development: we are looking into sources to pull IP information from, what kind of information we should display, and to whom. We have a mockup ready for your feedback and would love to hear your thoughts on the talk page.

10 May, 2020

  • We have done an initial technical investigation into this project. Follow along on phab:T248525.
  • We are currently looking into the various services that provide information about IP addresses. Follow along on phab:T251933.

Background

How is IP address information useful to our communities?

Anti-vandalism

See also: Research:Patrolling on Wikipedia/Report

Single-address blocks bar a single IP address from editing the site, or specific pages in the case of partial blocks, for a specified duration. MediaWiki also allows administrators to block IP ranges, which is helpful for dynamic IPs or for covering a small range frequently used for vandalism. Administrators are expected to check the coverage of ranges they intend to block in order to assess collateral damage.
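The coverage check described above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration only, not MediaWiki's own block logic; the function names `range_coverage` and `is_covered` are hypothetical, and the addresses come from reserved documentation ranges.

```python
import ipaddress


def range_coverage(cidr: str) -> int:
    """Return how many addresses a CIDR range covers."""
    # strict=False accepts input whose host bits are set, e.g. "192.0.2.77/24"
    return ipaddress.ip_network(cidr, strict=False).num_addresses


def is_covered(ip: str, cidr: str) -> bool:
    """Return True if the given address falls inside the CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


if __name__ == "__main__":
    print(range_coverage("192.0.2.0/24"))
    print(is_covered("192.0.2.77", "192.0.2.0/24"))
    print(is_covered("198.51.100.1", "192.0.2.0/24"))
```

The address count gives a rough upper bound on how many users a range block could affect; it says nothing about how many of those addresses are actually used for editing.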

Certain types of range or single-IP block are handled differently or tagged with templates depending on the type of address involved. For example, if an IP address engaging in vandalism is registered to an educational institution, administrators take special note and apply templates such as en:w:Template:School block and en:w:Template:Shared IP edu. This is especially important given that educators may assign editing work on Wikipedia as part of the curriculum, and if the institution was previously blocked, the templates provide instructions for contacting the administrators to get around it. Other such templates include en:w:Template:Shared IP address (public), for IP addresses determined to be public. This template is not necessarily used for blocks and can be used pre-emptively, to clear up potential confusion at receiving messages not meant for the user or to point to features only available to registered users. These templates are not unique to English Wikipedia, and equivalents can be found on many different projects.

The IP blocking workflow of administrators currently relies on some IP information, usually the registered organization, geographic location, and ASN. This information generally comes from third-party IP information providers, with no standard service and, therefore, varying degrees of accuracy and reliability. The information matters because it shapes the response: for example, an edit from an IP address registered to a residential ISP should be handled differently to an edit from an IP registered to a government organization.

IP addresses are also used in AbuseFilter in conjunction with other settings and targets to make very specific blocks, so as to minimize disrupting the experience of regular users.

IP information is also used in CheckUser, especially when dealing with cases of alternate account abuse (also known as sockpuppeting). Access to this tool is severely limited since it allows access to potentially-identifying information tied to accounts, which usually do not have their IP addresses exposed.

Anonymity and anonymous editing

There have not been major or definitive studies of the effects of unregistered editing on our projects, though there have been previous attempts. Generally speaking, community research has focused on links between anonymity and vandalism. We do know that fairly large portions of constructive edits are made by unregistered users. A 2013 study on anonymous editor volume and impact noted that about 100,000 anonymous editors made roughly a third of the edits counted in that month. This finding was reinforced by a 2016 study on edit productivity, which showed that unregistered users (there called anonymous editors) "contribute substantially to overall productivity". Anecdotally, administrators on different projects have also noted that unregistered users can make up a substantial and constructive portion of the editor base.

Practically speaking, while no project has disallowed all unregistered user edits as a matter of course, unregistered users are generally restricted in what types of contributions they can make as compared to registered users. For example, unregistered users cannot start new articles or upload files on most of our projects. Furthermore, unregistered users’ lack of a stable social identity makes it difficult for them to communicate and fully participate in their project’s community in several ways. In other words, because there is no way to guarantee that the person behind a given IP address will be the same every time, communication with unregistered users comes with in-built obstacles.

Research

Research on Wikipedia sometimes uses IP addresses, as exposed in edit histories, to gain aggregated information about the editing practices of users in a given geographic area. Researchers generally only use aggregate information from IPs.

The problem

Currently, when our editors want to look up information about an IP address, they sometimes need to refer to external, proprietary websites. Often they need to consult more than one website to cross-check the data or to get all the different pieces of information they need in order to do their work. As a result, an editor often spends a great deal of time and energy looking up the data they want to see. We heard about these issues in great depth when we asked users about their workflows on the project talk page.

Proposed solution

The core idea is to incorporate this data into the Wikimedia wikis so that editors can find all the information they need in-house, without having to visit external websites. This would include surfacing information like:

  • High-level location information about an IP address
  • Owner of the IP address
  • Whether the IP address is known to be behind a proxy or Tor node
  • Whether the IP address is considered malicious by other websites
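As a rough illustration, the four pieces of information listed above could be carried in a single record. The sketch below is an assumption for discussion purposes, not the feature's actual data model: the class name `IPInfo` and all of its field names are hypothetical, and every field is optional because a data provider may not know every value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IPInfo:
    """Hypothetical record of what could be surfaced about one IP address."""
    address: str
    location: Optional[str] = None            # high-level only, e.g. a country
    owner: Optional[str] = None               # registered organization or ISP
    is_proxy_or_tor: Optional[bool] = None    # None means "not known"
    flagged_malicious: Optional[bool] = None  # as reported by other websites

    def summary(self) -> str:
        """Render the record on one line, marking missing values as unknown."""
        def show(value) -> str:
            return "unknown" if value is None else str(value)

        return (
            f"{self.address} | location: {show(self.location)} | "
            f"owner: {show(self.owner)} | proxy/Tor: {show(self.is_proxy_or_tor)} | "
            f"flagged: {show(self.flagged_malicious)}"
        )


if __name__ == "__main__":
    info = IPInfo("192.0.2.1", location="Exampleland", is_proxy_or_tor=False)
    print(info.summary())
```

Keeping "unknown" distinct from "no" matters here: an address that has not been checked against a proxy list should not be displayed as if it were known to be clean.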

Mockup

Here's a tentative mockup for the feature. We are currently planning to place the information box containing IP address information on the Contributions page of the IP address. We are also planning to break down the information that's visible to users based on their permissions. All autoconfirmed users and above would be able to access this feature, while more sensitive information would be accessible only to users with advanced permissions like Admins, Checkusers etc.
[Image: IP Info mockup]
As you look at this mockup, I'd invite you to think about the following:

  • When do you seek more information about IP addresses?
  • What information is important for you to know?
  • Where do you need to see this information?
  • How do you use this information? What actions do you take based on this information?

Please leave your thoughts on the talk page. It will be very valuable as we plan our work.
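To make the permission-based breakdown described in this section easier to discuss, here is a minimal sketch of how fields could be filtered by viewer permission. Everything in it is an assumption: which fields belong to which tier has not been decided, the tier names and their strict ordering are a simplification of real user groups, and `visible_fields`, `TIER_RANK` and `FIELD_MIN_TIER` are hypothetical names.

```python
# Hypothetical tiers, ordered from least to most privileged.
# Real user groups are not strictly ordered; this is a simplification.
TIER_RANK = {"unregistered": 0, "autoconfirmed": 1, "admin": 2, "checkuser": 3}

# Hypothetical mapping from each field to the lowest tier allowed to see it.
FIELD_MIN_TIER = {
    "location": "autoconfirmed",
    "owner": "autoconfirmed",
    "is_proxy_or_tor": "admin",
    "flagged_malicious": "admin",
}


def visible_fields(info: dict, viewer_tier: str) -> dict:
    """Return only the fields the viewer's tier is allowed to see.

    Fields without an entry in FIELD_MIN_TIER are hidden from everyone,
    so a newly added field is never exposed by accident.
    """
    rank = TIER_RANK[viewer_tier]
    return {
        name: value
        for name, value in info.items()
        if name in FIELD_MIN_TIER and TIER_RANK[FIELD_MIN_TIER[name]] <= rank
    }


if __name__ == "__main__":
    info = {
        "location": "Exampleland",
        "owner": "Example ISP",
        "is_proxy_or_tor": False,
        "flagged_malicious": False,
    }
    for tier in TIER_RANK:
        print(tier, sorted(visible_fields(info, tier)))
```

Hiding unmapped fields by default is a deliberate choice in this sketch, in line with the privacy concerns this page raises about exposing IP data to more users than before.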

Benefits and risks

Benefits

  • Easier patrolling: This would eliminate the need for users to copy-paste IP addresses into external tools and extract the information they need, reducing manual work.
  • Faster patrolling: It will save editors’ time by making the information they need readily available in the interface.
  • Higher reliability: The WMF can contract with websites that offer highly reliable datasets that are regularly updated and translated. Since this project will be Foundation-maintained, it will probably be much more reliable than some of the websites our users currently depend on.
  • Lower technical barriers: It would make it easier for new admins and checkusers to join without needing a very good understanding of how to extract information from IP addresses. This could lead to more minority users in power roles over the long term.

Risks

  • Privacy risk: Not everyone on the internet is aware of what an IP address reveals. Unregistered users therefore often make edits without knowing they are leaving a fingerprint that can be used to track them, and many editors do not know this either. The result is unintentional privacy for unregistered users (security through obscurity). Depending on who gets to see the information exposed by this feature, there is a real risk of more users seeing the data than before.

Open questions

  • TBD

Design and implementation

TBD