Talk:IP Editing: Privacy Enhancement and Abuse Mitigation/Archives/2019-09

From Meta, a Wikimedia project coordination wiki

Who would benefit from that?

As I've mentioned in the cswiki discussion on this proposal, there are only three significant groups of users who would benefit from IP masking. These are:

  1. registered users failing or forgetting to log in
  2. occasional contributors unwilling to register
  3. IP vandals

I don't see any reason to encourage any user from these groups to continue contributing anonymously. The first group can be handled through oversight and should be more aware of logging in, users from the second group are editing Wikipedia at their own risk (knowing that their IP would appear), and the third group shouldn't edit Wikimedia projects at all.

— Draceane talkcontrib. 13:59, 3 September 2019 (UTC)

About the oversight thing, the wmf are hoping this will cut down on that because 95% of their oversight requests are this kind of thing. The second one, some people have banner blindness and I suggest solving that with a tickbox (i.e. something like "I acknowledge pressing Submit will show my IP"). Agree about the third one, idk why the wmf wants to spread their ass wide to vandals but I don't like it. Computer Fizz (talk) 03:12, 4 September 2019 (UTC)
A tickbox would actually solve both the 1st and the 2nd groups. By WMF, who don't directly handle oversighting, do you mean the Stewards, or the WMF just wanting to need fewer people with OS rights? Nosebagbear (talk) 08:31, 4 September 2019 (UTC)
I mean, they said it'll cut down on that, which you can hope for without having oversight yourself. It could either mean the local oversighters (for large wikis) or the stewards. Computer Fizz (talk) 16:56, 4 September 2019 (UTC)
@Computer Fizz:. Are you saying this will cut down on the anti-vandalism that can be done without the oversight and/or checkuser power? If so, I agree, but it doesn't really seem clear what you are saying. — Arthur Rubin T C (en: U, T) 00:30, 5 September 2019 (UTC)

Note from Legal

Hi all. I’ve seen some of the discussion on this page focus on the impact of publishing IP addresses of unregistered contributors on user privacy, so I’m providing some additional thoughts from the Foundation Legal team’s perspective.

Part of our job is to track regulatory, policy, and societal trends that impact the projects, movement, and Wikimedia users. We’ve watched as people around the world have started expressing more general concern about online privacy and safety, and we receive questions every day about privacy and our data handling practices — including questions about editor IP addresses on wiki pages. I’m excited that we’re looking at our use of IP addresses with fresh eyes, and working with the community to consider new approaches to enhance the privacy of unregistered users.

Our commitment to user privacy includes continuously reevaluating and improving our privacy practices. Reducing public use of IP addresses would be a significant step towards improving our protection of unregistered users. Of course, we are also committed to providing contributors with the tools to fight harassment and vandalism. We want to find a way to do both — well. As the Anti-Harassment Tools team continues this conversation and starts designing and refining their ideas, the Legal team will act as a sounding board, and share our perspective — including here on-wiki (as we are able). Thanks to everyone who has contributed to this important discussion!

--TSebro (WMF) (talk) 23:47, 4 September 2019 (UTC)

Morning coffee commentary

There have been a few comments to the effect of "well we'll just stop fighting vandalism then". It doesn't seem particularly likely that that's liable to happen. What seems more likely is that if this is implemented poorly, it's liable to embolden a popular movement to ban anonymous editing entirely. Now, the Foundation might take exception to this, but apply a little bit of common sense here folks. We don't really need the Foundation's approval; we can already do this with existing functionality. What that looks like is dramatically lowering the bar for semi-protection across projects, and indefinitely semi-protecting massive swaths of pages.

So let's be clear on exactly what's at stake here. I don't think any of us want to be the trigger-man in semi-protecting entire projects, but we can do it. The only thing that's required here is a local consensus that any vandalism whatsoever warrants indef semi, and if you take away our scalpels and leave us only with hammers, then hammers it is, and a whole lot of problems are going to start looking like nails.

Lest someone try to paint me as a singularly focused vocal vandal fighter, if anybody wants to have a cross-wiki pissing contest about content creation then I'll step right up and we can compare resumes. So please take it seriously when I say on behalf of our content creators, that we'll be damned before we'll pour a hundred hours into a product only to see it smashed by some kid with a smart phone. GMGtalk 13:53, 8 September 2019 (UTC)

It doesn't take a lot of imagination to revert to pre-1996 ways of doing things. Nemo 19:02, 10 September 2019 (UTC)

Consider the effect on researchers using Wikimedia data/metadata

Although not included in NKohli (WMF)'s detailed half-time summary, a number of people have pointed to the potential impact this proposal might have on people outside of the Wikimedia community who have relied on public IP data and/or the specific way it is published and who have created things of value. Many of the examples I've seen have been things like w:United States Congressional staff edits to Wikipedia.

I wanted to urge the folks involved to carefully consider the impact of any change on researchers who rely on Wikimedia data. Although it's not always visible to participants, Wikimedia projects serve as the single most important laboratory for social scientific, computing, and informatics research in the world. There are literally thousands of papers published about Wikipedia and that use Wikipedia data. A major change to the way that contributions are attributed will likely affect many external researchers' ability to learn from Wikipedia and compare data collected before and after any change. It could make it difficult or impossible to replicate previously published studies in the future.

Without a concrete proposal, it's hard to know what the impact would be. Based on some of the suggestions floated in the proposal, I can easily imagine that a change could mean that:

  • researchers become limited in their ability to compare numbers of non-registered contributors before/after the proposal is implemented;
  • researchers cannot allocate/attribute contributions to individuals/users in ways that are consistent or clearly explainable;
  • researchers cannot study geographic concentration of contributions (e.g., urban/rural divides; global inequality in participation, etc).
  • ...and so on.

I believe that these examples, and countless others we've not imagined yet, represent real costs to the broader value that Wikipedia provides to the world through its utility as a source of research data. To be clear, I'm absolutely not categorically opposed to incurring these costs. Protecting contributor privacy and mitigating/reducing harassment is clearly very important too! It is clear from the discussion on this page that seeing any version of this proposal through is going to be an exercise in making difficult tradeoffs. I only want to make sure the team designing this system knows what the researchers outside the Foundation will be giving up.

I know that the WMF team shepherding this proposal has been in touch with people from the WMF research team. From what I've seen on this page, this has so far been focused on identifying research that would inform this proposal rather than on understanding the effect of the proposal on future research.

My suggestion here is for the folks in WMF working on this to connect with LZia (WMF) and others on her team to ensure that any proposal is informed by a solid sense of what the effects will be on future research both inside and outside WMF. Maybe consider running proposals by the wiki-research-l list? I'm happy to help out. —mako 23:56, 10 September 2019 (UTC)

@Benjamin Mako Hill: Thanks for looping me in. I had a chat with part of our team (Research) about it as well as NKohli (WMF), and I understand you had a separate conversation as well. The assessment of our team aligns with yours in that we believe it's very important to formalize/acknowledge the impact on the research community, especially in light of the fact that the data has been used extensively by them. (I'm also with you that the point is to make sure the effect is considered in any decision making.) I've offered our team's support to NKohli (WMF) et al. if they have questions about this part of the assessment. --LZia (WMF) (talk) 21:39, 12 September 2019 (UTC)

How is this going to affect people editing from Shared IP addresses?

Take for example the corporate proxy for Advance Auto Parts, seen here. There's a mixed bag of good edits and stupidity, probably from employees at stores all over the United States connecting through one IP address at corporate. Would all of these edits go through the same unique identifier? Currently we have templates like Shared IP corp and Shared IP edu to remind people not to bite the head off of newbies from these IPs over things that other people have done from the same IP (not that anyone on English Wikipedia pays any attention to that; the edu one has become more of a "block this IP longer" tag to admins). What happens when we have no way of knowing what the IP belongs to? Thinking about it, it might be a good thing for admins to no longer know when editors are coming from K-12 schools, both for the contributor's privacy and for Wikimedia's sake to eliminate bias, but will the editing community know to apply common sense that these unique identifiers might represent thousands of people rather than just one person? As a side note to those unfamiliar with me (which is probably most because I'm nothing special), I'm not in high school anymore; the user name comes from my alma mater where I was over ten years ago when I became part of the Wikimedia community. PCHS-NJROTC (talk) 13:35, 9 September 2019 (UTC)

@PCHS-NJROTC: We are thinking about having a way to surface information about the underlying IP (such as location, organization/school etc) without exposing the IP address. That will provide better privacy to unregistered editors than we have right now. We are still thinking about how this would be implemented and what information will be surfaced and how. Thanks for your question! -- NKohli (WMF) (talk) 23:49, 11 September 2019 (UTC)
@NKohli (WMF): With all due respect, what is even the point of this if you're just going to tell people what school someone is editing from? That would actually reduce privacy by making it easier for people with less technical knowledge to know where our editors work, go to school, etc, because people who have absolutely no knowledge of IP addresses or how to run a WHOIS query would then be able to see "Anonymous User from Charlotte County Public Schools" (for example) wrote something on Wikipedia, making it even easier for our contributors to be doxxed or harassed for their actions. I mean, imagine the scenario where a middle school student contributes negative but factual and well referenced information on the article of a large corporation and some company exec who knows little about computers suddenly sees that and pays someone to identify that kid and "make him pay" for it. That's already theoretically a possibility with IPs being exposed, and we already make it easier for that to happen by tagging schools with Shared IP edu, but we don't need to make it even easier by having our editors' schools, workplaces, or geographic locations appear in the edit history like that. I don't like this idea at all. PCHS-NJROTC (talk) 13:40, 12 September 2019 (UTC)
@PCHS-NJROTC: We will probably not be exposing the name of the school but rather make it clear that it is a school and provide some sense of location. Please do keep in mind that these are just ideas at the moment, and what is feasible from a technical point of view might change things. We are also probably not going to be recording the school/workplace/location identifiers in the edit history but rather have a way for users to be able to see that if they need to. It doesn't take a lot of technical skill to do a whois, with the huge number of tools available on the internet now. You could put an IP address in a search engine and it will give you a lot of information. There are a lot of things still to be figured out. But trust me, we are not going to make it easier for users to be harassed. Thank you. -- NKohli (WMF) (talk) 22:25, 16 September 2019 (UTC)
@NKohli (WMF): If you are going to do this, I think you should seize the opportunity to eliminate an on-going bias at least on the English Wikipedia by NOT identifying the anonymous users as being schools. I hate to be blunt, but I have backed off of participating in recent changes patrol because of the outright stupidity I would observe from other vandal patrolmen/women when dealing with shared IPs, including schools and places that are not schools like big corporations and federal government agencies. The philosophy of an open project where anyone can contribute to human knowledge would benefit if admins were not able to identify educational institutions (or places they incorrectly think are educational institutions) and block them with the bias that educational institutions including universities are nothing but a source of vandalism. Thoughts? PCHS-NJROTC (talk) 01:17, 24 September 2019 (UTC)

Redact the IP addresses if it's an issue

I feel like this is a solution in search of a problem; I don't think unregistered users are particularly worried about their IP addresses being public information, or else they would have said something in the past 17 years or so in which we haven't redacted IP addresses. If it is an issue, however, might I suggest we go back to the old UseModWiki/Phase II days where IP addresses were redacted, like Users 216.7.146.xxx or 24.120.22.xxx; the full IP addresses could be left to CheckUsers.

Or just use the phrase "Anonymous Coward." (Or bring back domain names from the UseMod era. :P) John M Wolfson (talk) 20:33, 25 September 2019 (UTC)

So this isn't an issue for say someone editing English Wikipedia from the US, but if someone edits a specific subject in Nynorsk Norwegian from a redacted IP address that indicates they're probably living in Estonia, the addition of the geographical information coming from even part of the IP address could be enough to identify them personally, yet as a patroller I don't get the specific identification that a masked ID that's as unique as the IP would give. What would the benefits be of doing this instead – am I missing something? /Johan (WMF) (talk) 08:45, 26 September 2019 (UTC)
I don't quite know how IPv4 addresses work, but I'm sure that having the last chunk of it redacted will prevent it from narrowing down a specific individual. (Also, I'm not sure if Geolocation services work with only a partial IP.) As said above, however, I don't think this is that big a deal. John M Wolfson (talk) 03:41, 27 September 2019 (UTC)
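
As a rough illustration of the redaction discussed in this thread (a sketch only; `redact_ipv4` and `candidates_behind` are hypothetical names, not part of MediaWiki or UseModWiki), masking the last octet of an IPv4 address hides which of 256 addresses made the edit, while the first three octets stay visible. That is the trade-off raised above: the visible part may still hint at a network or region, yet it no longer distinguishes one address from its neighbours.

```python
import ipaddress


def redact_ipv4(address: str) -> str:
    """Return the address with its last octet replaced by 'xxx'.

    Raises ValueError if the input is not a valid IPv4 address.
    """
    ip = ipaddress.IPv4Address(address)  # validates the input
    octets = str(ip).split(".")
    return ".".join(octets[:3]) + ".xxx"


def candidates_behind(address: str) -> int:
    """Count how many full addresses share the same redacted form (a /24)."""
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return network.num_addresses


if __name__ == "__main__":
    print(redact_ipv4("216.7.146.10"))
    print(candidates_behind("216.7.146.10"))
```

The example addresses are chosen to match the redacted forms quoted in the thread; the last octets are made up.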

Working Group wants IP masking for everyone but Checkusers, few suggestions how, little consideration of issues and no communication

The Community Health working group has raised the same suggestions - Proposals can be found here. The IP-Masking proposal is number 3 under Q1 (the others don't apply to this).

This is their 2nd set of proposals - they completely failed to communicate with us the first time, putting them well behind NKohli, and it's a far worse set of ideas with little on how they'd do it.

I'd really rather we go there to state the concerns, and I'd appreciate NKohli contacting them more directly to point out the huge number of legitimate issues raised, and why even limiting it to Admins, let alone CUs (as they wish) wouldn't work. Nosebagbear (talk) 13:46, 22 September 2019 (UTC)

Pinging @NKohli (WMF): as I forgot to actually do so above. Also pinging @LZia (WMF): as this would be another group to reach out to. Nosebagbear (talk) 18:07, 22 September 2019 (UTC)
@Nosebagbear: Thanks for pointing this out to me! I had not seen the working group recommendations. I will reach out to them and sync up on the idea. I'm not sure they are aware of this project proposal either. -- NKohli (WMF) (talk) 21:04, 25 September 2019 (UTC)
They said "Admin", but I don't recall signing a non-disclosure agreement as an en.Wikipedia admin, so they meant "Checkuser". — Arthur Rubin T C (en: U, T) 20:51, 22 September 2019 (UTC)
@James Heilman: -- Any comments? Shall we expect any kind of communication from your fellow WG members; I am yet to see anyone other than you and a couple of others across all WGs who have even minimally engaged. Winged Blades of Godric (talk) 11:26, 25 September 2019 (UTC)
User:Winged Blades of Godric you will need to use my username for the ping to work :-) Just seeing this now. With respect to masking IPs, IMO all admins and maybe another new group (so that this can be given to non admins) should be able to see it. We will simply need a simple way for people to agree to the NDAs. Doc James (talk · contribs · email) 06:36, 18 October 2019 (UTC)

Very Good potential

If the alternatives are implemented, I think the system will work much more efficiently. I hope that things, as I understand them, have the following potential functionality:

  1. CU's could be made publicly between registered and unregistered users, since no personal information will be revealed. A lot of disruption can be handled better this way IMO.
  2. If the generated usernames are more persistent than dynamic IPs, a lot of silly & "unprofessional" vandalism won't happen. "Professionals" will always find the way around.
  3. If a computer-based username is generated, school or institutional vandalism won't prevent constructive edits from the same IP. It would be nice for these usernames to have a pattern, e.g. XXXXXXX PC-YYYY
  4. Bias against IP edits will decline
  5. IPv6 is more confusing to the human eye than a readable auto-created username, and much harder for unassisted pattern recognition, provided that the auto-creating username system will assign some kind of ISP patterns.
  6. A faster global contributions tool (as fast as CentralAuth)


I think it's worth a try. —Ah3kal (talk) 02:30, 26 September 2019 (UTC)

@Ah3kal: Your comment makes sense, however, I fail to see how the masked IP info will help us in finding socks, if sock_1 is editing from masked_IP_1 and sock_2 is editing from masked_IP_2 then only the real underlying IPs will show that they are in close IP proximity (say within a /24), which is still invisible. The only positive match may be if sock_2 happens to use masked_IP_1 which we then recognize as maybe being the same editor (just purely based on that info).
But spam coming from masked_IP_1, masked_IP_2, and masked_IP_3 now would make me block the three masks, whereas if I know that the three underlying IPs are totally unrelated (as is possible now) I would just plainly blacklist the spam (no need to block 3 complete ranges to stop the spammer). However, if I know that it is just IPs in a /24 spamming, I might consider just blocking the range. --Dirk Beetstra T C (en: U, T) 11:52, 26 September 2019 (UTC)
Beetstra It is very much possible to build a tool that tells you if the accounts are editing from a close-range. In fact we could even make it easy to find accounts that are editing from close proximity (IP wise) to a given account. These are things that could be handled by the computer without users having to worry about the IPs and knowing what IPs mean and how they work to understand which ranges to block. -- NKohli (WMF) (talk) 22:25, 4 October 2019 (UTC)
@NKohli (WMF): such tools should have been available for years already. I would even go as far as saying that if WMF had taken up those problems and properly connected with what the editing community actually needs, we would not have ended up with this totally absurd proposal. WMF is approaching things from the wrong side. (I thought I was commenting on Talk:Office actions/Community consultation on partial and temporary office actions/09 2019). You would have gathered much, much more support for this idea if those types of problems were finally solved before suggesting this. —Dirk Beetstra T C (en: U, T) 06:12, 5 October 2019 (UTC)
There are several quite promising ways to identify socks, but it is pretty hard to get anything done because it is an uphill battle against users that fear any form of changes. We can do network timing analysis, we can do route analysis, we can fingerprint the source, we can fingerprint the targets, we can do timing analysis on the edits, and probably a whole bunch of methods I don't know about. And not to forget, the most obvious one, starting to use verified accounts. — Jeblad 11:30, 9 October 2019 (UTC)
To me, something as simple as an AbuseFilter from which only the CheckUsers can see the results, and that besides the on-wiki fingerprint filters also has the capability to check IP ranges that the user is using and similar data that CheckUsers can see, would be great. I don't even know whether the CheckUser extension has received a major upgrade over the last 10 years, or whether it has been neglected like some other extensions. --Dirk Beetstra T C (en: U, T) 11:58, 9 October 2019 (UTC)
It hasn't seen a lot of development (but we're working on that right now). /Johan (WMF) (talk) 12:00, 9 October 2019 (UTC)
@Johan (WMF): mw:Anti-Harassment Tools?? What about the 10 year old bugs? --Dirk Beetstra T C (en: U, T) 12:42, 9 October 2019 (UTC)
We're looking at generally working on these tools, not just some CU improvements. But as mentioned above, we don't have a plan for exactly what yet, so we've just picked one place to start in order to not let development lie untouched while we figure out the needs and workflows across various wikis. This is not a short project that'll be done by the end of the year. /Johan (WMF) (talk) 13:22, 9 October 2019 (UTC)
Also, we don't want to be secretly working towards a goal without telling people what we're aiming at. /Johan (WMF) (talk) 10:47, 10 October 2019 (UTC)
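
The close-range check discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: the mapping from masked identifiers to underlying IPs (`mask_to_ip`) is hypothetical and would only exist server-side, the function names are invented here, and nothing below reflects how CheckUser or any WMF tool actually works. The idea is that software compares the hidden addresses and reports only which masks fall within the same range (for example a /24), without showing the addresses themselves.

```python
import ipaddress
from itertools import combinations


def same_range(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True if ip_b falls inside the prefix_len-bit network containing ip_a."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network


def close_pairs(mask_to_ip: dict, prefix_len: int = 24) -> list:
    """Return pairs of masked identifiers whose hidden IPs share a range.

    Only the masked names are returned; the underlying IPs never leave
    this function.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(mask_to_ip), 2)
        if same_range(mask_to_ip[a], mask_to_ip[b], prefix_len)
    ]


if __name__ == "__main__":
    # Hypothetical data using documentation-only address ranges.
    hidden = {
        "masked_IP_1": "198.51.100.7",
        "masked_IP_2": "198.51.100.200",
        "masked_IP_3": "203.0.113.5",
    }
    print(close_pairs(hidden))
```

With data like this, a patroller would learn that masked_IP_1 and masked_IP_2 are neighbours (a range block might be considered) while masked_IP_3 is unrelated, which is the distinction asked for above.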

Basically we want to have a persistent ID associated with a single individual / entity. IP addresses do this poorly as they change / are so easy to change. If this proposal results in a more persistent link between IDs and individuals, then we may be better off than we are currently from a vandalism point of view. Doc James (talk · contribs · email) 04:24, 22 October 2019 (UTC)

User:Doc James raises an excellent point in that IP addresses and IP ranges do jack to identify an individual in 2019. I am a CheckUser on another wiki, and it is impossible to stop someone editing from AT&T Mobility, T-Mobile USA, Cellco Partnership DBA Verizon Wireless, and Sprint (insert city name here) POP without causing serious collateral damage, and abuse reports to at least the former two do screw-all due to carrier-grade NATing (T-Mobile actually sent me a personal reply stating that they could not identify a very persistent and nasty troll I complained about due to "network infrastructure"). CheckUser provides the OS, browser, and sometimes device model used by editors, which can be handy for connecting sneaky sockpuppets created by someone with an uncommon phone or OS (that's come in handy twice for me), but this information is useless when an abuser is using Safari on an iPhone 7 or Mozilla Firefox on a Windows 10 PC, for example. Seeing first hand how easy it is to evade blocks and bans with so many networks at the average internet users' disposal, long-term blocks that cause collateral damage cause me to shake my head. Sysops on the English Wikipedia will softblock ranges representing hundreds of thousands of people only for the person they are trying to stop to go to Starbucks, Dunkin Donuts, or even a hospital to continue the carnage. There has to be a better way. PCHS-NJROTC (talk) 02:53, 19 December 2019 (UTC)