Talk:IP Editing: Privacy Enhancement and Abuse Mitigation/Archives/2023-04

From Meta, a Wikimedia project coordination wiki

Abuse filter behaviors on temporary accounts

I've got these questions related to abuse filters' interaction with temporary accounts:

  1. Will unregistered users get temporary accounts when they trigger abuse filters but do not actually make edits to pages?
  2. Will users who have access to the IP addresses of temporary accounts be able to check the IPs of unregistered users who trigger abuse filters?

The current IP Info tool is not available if the IP has no edits, and deleted edits and abuse log entries are not counted. If we don't even have access to their IP in these cases, our anti-vandalism practice will have to change. Tiger (talk) 13:57, 9 April 2023 (UTC)

Tracked in Phabricator:
Task T334623
Today is my lucky day, because yours is the second fascinating question I've seen so far. (The first was about Special:Contact/stewards, which collects IP addresses so the Stewards can unblock them.)
I understand that a username gets "reserved" earlier in the editing process. (As a side effect of this reservation system, we won't actually see completely sequential numbers in usernames in the logs; some usernames will be lost due to abandoned edits.) I don't know if the team has talked about whether the account is officially "created" just before or just after Special:AbuseFilter runs, but there would be a username available in theory to put in the logs.
I've filed two tasks in Phab for this, and will ask the product manager about it. Whatamidoing (WMF) (talk) 20:40, 12 April 2023 (UTC)

Cross-wiki contributions

Hello! Will temporary accounts be global across Wikimedia projects or will they be per-project? Will we be able to continue using tools like https://xtools.wmflabs.org/globalcontribs or https://guc.toolforge.org/? Thank you. MarioGom (talk) 07:51, 7 April 2023 (UTC)

@MarioGom, they will be global (eventually). Whether specific tools will automatically just work depends upon the exact way that they were coded, but if they break, there's no reason to expect these tools to be impossible to fix (or even difficult – it should just be a matter of changing the parts of the code that currently say "IP editor" so that they instead say "temporary user"). Whatamidoing (WMF) (talk) 16:29, 12 April 2023 (UTC)
1. "Temporary user" or unique hash? The first one is absolutely not suitable. Tools like Huggle should still be able to easily identify a series of edits with one author.
2. Will the API be prepared for the changes before (!) they're made? For example, will all methods return a hash instead of an IP? Will it be possible to use rollback (see the user parameter)?
3. Will streams (EventStreams, RCFeed, IRC) be prepared (hash, etc.) for the changes before they're made?
4. Will the Foundation take responsibility for fixing the code of the most important tools (guc, etc.), either before IP masking starts or shortly after?
Each of these points is very important. They should be solved before, not after. Iluvatar (talk) 12:14, 25 April 2023 (UTC)
@Iluvatar, I'm not sure what distinction you are making between "temporary user" and "unique hash". Today, all of an editor's contributions are attributed in the page history to 127.0.0.1 (until the IP address changes). In the future, all of the editor's contributions will be attributed in the page history to User:12345 (until the cookie expires). Whatamidoing (WMF) (talk) 17:00, 25 April 2023 (UTC)
@Whatamidoing (WMF): My understanding (from your reply to @MarioGom above) is that the name of the temporary account is determined per wiki and session, so TempUser123@enwiki is not TempUser123@eswiki, although eventually (when?) the identifier may be global. Is that correct? If so, how can we expect tools such as guc, xtools or stewardry to work as of now? Is the WMF aware of the massive havoc this is going to cause for crosswiki anti-abuse efforts, with lots of tools already having significant technical debt due to lack of maintainers? Can we please stop this, or not roll out IP Masking unless MediaWiki can guarantee global temp. account IDs from the start? Thank you, —MarcoAurelio (talk) 11:53, 27 April 2023 (UTC)
@MarcoAurelio, "TempUser123" will eventually be global, and there will never be a TempUser123@enwiki that is different from TempUser123@eswiki.
The team doesn't want to switch every wiki on the same day. It'd be better to start with one small wiki (I plan to ask my friends at htwiki to be the first, unless someone wants to recommend a different one) and see what breaks there, and fix that, than to break all the wikis at the same time. Once everything's working okay at the first wiki, it can be added to a second one, and the team can make sure that cross-wiki is working before adding it to the next one.
During the rollout process, if you imagine someone who edits both htwiki and enwiki, that person would temporarily be "TempUser123" at htwiki and 127.0.0.1 at enwiki. If you're starting with "TempUser123", then you could easily reveal the IP and check it separately. If you're starting with the IP, then I'm not certain that it will be easy to find "TempUser123" (not certain = "I don't know either way", not "I expect it to be bad").
I don't know how long the rollout process will take. If everything went perfectly, I suppose that it could be a couple of months. However, I assume that something will break and there will be a multi-week pause while it gets fixed, and, of course, that might happen several times. Whatamidoing (WMF) (talk) 17:55, 27 April 2023 (UTC)
As long as there are no name collisions, it sounds reasonable to me. I don't expect that the WMF upgrades every possible 3rd party tool. Once this is rolled out in, let's say, testwiki, tooling developers can start upgrading their tools (if they wish). MarioGom (talk) 19:32, 27 April 2023 (UTC)
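For tool maintainers following this thread, the kind of change described above (teaching code that currently says "IP editor" to also recognise a "temporary user") might look roughly like the sketch below. The `TempUser<number>` pattern is only the placeholder name used in this discussion, not a decided format, and `classify_author` is a hypothetical helper, not part of any existing tool or of MediaWiki.

```python
import ipaddress
import re

# Assumption: placeholder pattern taken from the "TempUser123" example in this
# thread. The real naming scheme had not been decided at the time.
TEMP_NAME = re.compile(r"^TempUser\d+$")


def classify_author(name: str) -> str:
    """Return "ip", "temporary", or "registered" for a page-history author name."""
    try:
        # ip_address() accepts both IPv4 and IPv6 literals and raises
        # ValueError for anything else.
        ipaddress.ip_address(name)
        return "ip"
    except ValueError:
        pass
    if TEMP_NAME.match(name):
        return "temporary"
    return "registered"
```

During a staged rollout a tool would need to handle all three cases at once, since the same person could appear as an IP on one wiki and as a temporary account on another.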

What should it look like?

Early thoughts: go with the caret (^) or the year of "creation". The tilde ~ is used in usernames that were affected by the migration to global usernames, and it would be easy to misinterpret. Exclamation marks are used in communication on several projects to indicate "not" (as in !vote = not a vote). Question marks (?) are used in other ways and could be confusing in certain languages. It's hard to tell the difference between an mdash, an ndash, a hyphen, and a minus sign, and only the last one is commonly seen on most keyboards. It would be hard to communicate correctly which "user" was meant if the hiding function included a character that is hard to accurately put into a message on a talk page or noticeboard. Risker (talk) 03:10, 29 April 2023 (UTC)

Thanks for the feedback, @Risker. It is duly noted. I would also like to hear your feedback, if any, about the rest of the plan. NKohli (WMF) (talk) 12:54, 1 May 2023 (UTC)
I'd recommend special character + year + temporary username. So something like ~2023-12345. The special character should probably be added to the username disallow list once it's reserved for this, to prevent regular users from pretending to be temporary users. –Novem Linguae (talk) 23:24, 2 May 2023 (UTC)
Yes, whatever's chosen should be on the disallow list. Also, given the problems with *, perhaps that should go on the disallow list anyway (assuming it could be limited to disallowing it as the first character in future cases only). Whatamidoing (WMF) (talk) 05:04, 5 May 2023 (UTC)
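A rough sketch of the disallow-list idea from the two comments above: reject new registrations whose name starts with the reserved character, while leaving names that merely contain it alone. The `~` marker and the function name are assumptions for illustration only; this is not how MediaWiki's actual username checks are implemented.

```python
# Assumption: "~" is used here only because it appears in the ~2023-12345
# example above. No marker had been chosen at the time of this discussion.
RESERVED_PREFIXES = ("~",)


def is_allowed_for_registration(name: str, reserved=RESERVED_PREFIXES) -> bool:
    """Return False if a newly requested username starts with a reserved marker."""
    # Only the first character is checked, so names that merely contain the
    # marker elsewhere (such as old "name~wiki" accounts) are not rejected.
    return not name.strip().startswith(tuple(reserved))
```

Because the check would apply only at registration time, existing accounts that already begin with the marker would be unaffected, which matches the "future cases only" condition mentioned above.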
This was my initial thought as well. Just having the year first would make these temporary usernames practically indistinguishable from a normal username at first glance. I'm also against the use of an exclamation mark or question mark (aside from the encoding issue), as I don't think they stand out as much. Personally, I'd prefer the tilde. ~ Eejit43 (talk) 00:55, 9 May 2023 (UTC)
At the moment, it looks like there are almost 1,000 registered editors whose username begins with ~, and almost 500 registered editors whose username begins with 2022. (I checked last year, because people seem to use the current year, so the number with 2023 at the start is probably an undercount). Whatamidoing (WMF) (talk) 03:06, 13 May 2023 (UTC)
I actually think the ~ is a decent representation of the temporary nature of this. [username]~[wiki] was a temporary measure to make CentralAuth work properly on a global scale. It was meant so users could choose a new username. Otherwise I agree with Risker. If we think about scalability, the year makes sense, because we don't eventually want ~14897570843972 as that just looks like a keyboard mash. -- Amanda (she/her) 04:20, 5 May 2023 (UTC)
It's possible to do something sensible with formatting, so that ~14897570843972 becomes something like ~1489-7570-8439-72. Whatamidoing (WMF) (talk) 05:05, 5 May 2023 (UTC)
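As an illustration of the formatting suggestion above, here is a minimal sketch that splits a long numeric name into four-digit groups. The function name and the `~` marker are assumptions made for the example.

```python
def group_digits(temp_name: str, block: int = 4, marker: str = "~") -> str:
    """Insert a hyphen after every `block` digits that follow the marker."""
    if not temp_name.startswith(marker):
        raise ValueError("name does not start with the expected marker")
    digits = temp_name[len(marker):]
    if not digits.isdigit():
        raise ValueError("expected only digits after the marker")
    # Slice the digit string into consecutive blocks; the last one may be shorter.
    chunks = [digits[i:i + block] for i in range(0, len(digits), block)]
    return marker + "-".join(chunks)


print(group_digits("~14897570843972"))  # ~1489-7570-8439-72
```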
Whatamidoing, as you will see below, this is exactly what my nightmares are made of. :P In the context of the issue I've raised, this would be a masking of a masking of an IP...
I fully agree on making it stand out as much as possible, but I hadn't thought about how large that number may get in practice. All the text and mockups in IP Editing: Privacy Enhancement and Abuse Mitigation give the impression of a 4-5 digit number most of the time. I've talked about long strings vs short strings in my issue, but apparently it will be more like old long strings vs new long strings... — Klein Muçi (talk) 05:42, 9 May 2023 (UTC)
Seconded. Basically entirely agree with Risker and Amanda about year no matter what, just wanted to add that ~ has a historical connotation with the definition of "user," whether as an alias for $HOME on (some) (unix) subshells or how good ol' en:slashdot noted user accounts. ~ Amory (utc) 01:10, 9 May 2023 (UTC)

I'd like to see more details on what the range of IDs might look like. Certainly, with current data, you can estimate the number of anonymous users in a given time period. How many digits would be necessary to represent this many users? And how does the cache expiration impact the count? Also, assume that bots and vandals will always clear their cache, so you could have hundreds of IDs generated from a single IP in short order. This, by itself, is probably something that should be configurable as an abuse filter. Yes, there are legitimate reasons for it to happen (a school classroom behind a firewall, for example). But on a small wiki, it's more likely to be vandalism than legitimate use. Ultimately, where I'm headed with this is that it might be useful to include not just year, but also month and date, something like ~2023-0511-1234 or ~2023~0511~1234 (tildes across rather than hyphens). Additionally, I'd like to see them always have a four-digit block format. For example, ~2023~0511~0001 or ~2023-0511-1234-0001. The consistency will be much easier to recognize. -- Dave Braunschweig (talk) 02:23, 12 May 2023 (UTC)
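The "many IDs from one IP in short order" condition could be expressed roughly as below. This is only a sketch of the counting logic in Python, not AbuseFilter syntax; the event tuple layout, the one-hour window, and the threshold of five are all hypothetical values that a wiki would presumably want to configure.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def ips_with_many_temp_ids(events, window=timedelta(hours=1), threshold=5):
    """Return the IPs that produced at least `threshold` distinct temporary
    IDs within any span of length `window`.

    `events` is an iterable of (timestamp, ip, temp_id) tuples (hypothetical
    layout, chosen for this example).
    """
    by_ip = defaultdict(list)
    for ts, ip, temp_id in events:
        by_ip[ip].append((ts, temp_id))

    flagged = set()
    for ip, rows in by_ip.items():
        rows.sort()  # chronological order
        start = 0
        for end in range(len(rows)):
            # Shrink the window from the left until it spans at most `window`.
            while rows[end][0] - rows[start][0] > window:
                start += 1
            distinct = {tid for _, tid in rows[start:end + 1]}
            if len(distinct) >= threshold:
                flagged.add(ip)
                break
    return flagged
```

A shared address such as a school network would trip the same condition, so on its own this could only flag activity for review and could not tell vandalism from legitimate use.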

@Dave Braunschweig, the English Wikipedia gets contributions from about a quarter million unique IP addresses each month. Guessing that they're about half the traffic, and that some (but not most) edit in more than one month, I'd estimate that all the wikis combined are currently seeing something on the order of 10,000,000 unique IP addresses per year.
Whether that number will go up or down under the new scheme is a "known unknown". However, I'd guess that any change would be in the half-to-double range, not in the 10x range, so we should plan for 8 digits per year (plus four for the year, if you want to add that instead of eventually having nine or maybe ten digits in the future). Whatamidoing (WMF) (talk) 03:00, 13 May 2023 (UTC)
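The digit estimate above can be checked with a few lines of arithmetic. The 10,000,000 baseline is the rough guess from this comment, and the sketch assumes accounts are simply numbered 1 to N within a year.

```python
def digits_needed(accounts_per_year: int) -> int:
    """Digits required to number accounts 1..N without reusing a number."""
    if accounts_per_year < 1:
        raise ValueError("need at least one account")
    return len(str(accounts_per_year))


baseline = 10_000_000  # rough yearly estimate from the comment above
for label, factor in (("half", 0.5), ("same", 1), ("double", 2), ("10x", 10)):
    n = int(baseline * factor)
    print(f"{label:>6}: {n:>11,} accounts -> {digits_needed(n)} digits")
```

Under these assumptions, anywhere in the half-to-double range fits in 8 digits, and a ninth digit is only needed at around ten times the baseline.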
I suggest not using ~. It is too close to local usernames converted to SUL, and to cases of imports from foreign (non-Wikimedia) wiki projects where credit is given in a similar way.
[], #, @ are also problematic for reasons of similarity to other uses (links, pings style).
; looks pretty good to me, what do you think? There are also relatively few users who start with it. —מקף‎‏ (Hyphen) 09:00, 14 May 2023 (UTC)
; is a character like .,!? that splits sentences, so may not be a great username character. –Novem Linguae (talk) 09:32, 14 May 2023 (UTC)
Mmm, I got it.
Are we convinced that the correct way to approach the problem is through a fixed structure for pseudo-usernames? Maybe the inevitable solution is to create a new namespace?
If our approach right now is a new structure, maybe we should think about something that is not a prefix, but a wrapper, something like <~the-pseudo-usernames-here~> (this is just an illustration of what I meant by "wrapper", not necessarily a good proposal). —מקף‎‏ (Hyphen) 10:24, 14 May 2023 (UTC)
Personally that is what I like about using the tilde, and as AmandaNP said, it emphasizes the temporary nature of the username, which was the purpose of its usage when accounts were migrated to SUL. ~ Eejit43 (talk) 18:51, 14 May 2023 (UTC)

Also, assume that bots and vandals will always clear their cache, so you could have hundreds of IDs generated from a single IP in short order.

Many vandals probably will, but I think it’ll be far from all or even most. And even if they do, hundreds of IDs in a short time isn’t really realistic. Bots should never edit anonymously (as far as I know, the two most popular bot frameworks, Pywikibot and AWB, don’t even allow anonymous editing), so bots shouldn’t be a problem at all. —Tacsipacsi (talk) 10:34, 14 May 2023 (UTC)
I wonder if there are logged-out bots editing Wikidata. (I believe that's acceptable under their policies, though I might be wrong about that.) Whatamidoing (WMF) (talk) 05:00, 16 May 2023 (UTC)
Yeah, I like the idea of including human readable information and separating the numbers by dashes. So year would be nice, but I think month and day would make it somewhat longer than needed. Having dashes and a consistent format would definitely be good, and make it easy for users to transition from "123.456.789.011" to "~2023-0234-5678". Galobtter (talk) 07:45, 20 May 2023 (UTC)
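A minimal sketch of the year-plus-dashes format described above, producing names shaped like ~2023-0234-5678. The marker, the 8-digit serial width, and the function name are assumptions drawn from the examples in this thread, not a decided scheme.

```python
def format_temp_name(year: int, serial: int, marker: str = "~",
                     width: int = 8, block: int = 4) -> str:
    """Build a name like ~2023-0234-5678 from a year and a per-year serial."""
    if not 0 <= serial < 10 ** width:
        raise ValueError("serial does not fit in the configured width")
    # Zero-pad so every name has the same length and block layout.
    digits = f"{serial:0{width}d}"
    blocks = [digits[i:i + block] for i in range(0, width, block)]
    return f"{marker}{year:04d}-" + "-".join(blocks)


print(format_temp_name(2023, 2345678))  # ~2023-0234-5678
```

Zero-padding keeps every name the same length, which is the consistency point raised earlier in this section.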
The other thing I like about the year is that it will be helpful later, when accounts have started expiring. In 2027, we'll look at the article history and say "Oh, ~2023-nnnn-nnnn – yeah, there's no point in trying to contact that expired account..." Whatamidoing (WMF) (talk) 00:25, 23 May 2023 (UTC)
Very good point. This is a great argument for incorporating the year somewhere. –Novem Linguae (talk) 03:34, 23 May 2023 (UTC)
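To illustrate the point about the year being useful later, here is a sketch that reads the year back out of a name shaped like ~2023-nnnn-nnnn and guesses whether the account has lapsed. The name pattern and the one-year `max_age_years` default are hypothetical; nothing in this discussion fixes how long a temporary account lasts.

```python
import re
from typing import Optional

# Assumption: names look like ~YYYY-nnnn-nnnn, as in the examples above.
TEMP_NAME = re.compile(r"^~(\d{4})-\d{4}-\d{4}$")


def creation_year(name: str) -> Optional[int]:
    """Return the year embedded in a temporary name, or None if it does not match."""
    match = TEMP_NAME.match(name)
    return int(match.group(1)) if match else None


def is_probably_expired(name: str, current_year: int, max_age_years: int = 1) -> bool:
    """Guess whether a temporary account is too old to be worth contacting."""
    year = creation_year(name)
    if year is None:
        return False  # not a temporary name, so this check does not apply
    return current_year - year > max_age_years
```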