Talk:IP Editing: Privacy Enhancement and Abuse Mitigation/CheckUser Improvements

From Meta, a Wikimedia project coordination wiki

Feedback about the proposed improvements on the project page

What do you think of the preliminary check idea?

  • Seems neat but not too different from the existing Special:CentralAuth. When I want to look up multiple accounts I'm usually more interested in seeing if there's any overlap in technical data. I also generally only care about enwiki, since that's the only place I can take any action.

    What you have here seems better fit for a public-facing special page, maybe Special:CentralAuthMultiple, or something. Mostly of use to stewards, not local CUs. MusikAnimal talk 19:47, 4 October 2019 (UTC)

  • I agree with what MusikAnimal said. -- zzuuzz (talk) 05:37, 8 October 2019 (UTC)
  • @MusikAnimal and Zzuuzz:, the idea here was to provide the information Special:CentralAuth provides but in one view for all of the linked accounts instead of the Checkuser having to do the lookup for each account separately. We can definitely restrict the information to a single wiki (the wiki where the check is being run) and offer an option to view information for all wikis (for global cases). With that in mind, do you think this could be useful? -- NKohli (WMF) (talk) 12:23, 8 October 2019 (UTC)
  • @Ajraddatz: I don't have access to that special page but my understanding is that that interface, though it provides a similar level of information, is for when you need to lock multiple accounts. We are trying to surface all that information in CheckUser so basic information about the accounts can be viewed before deciding whether a check is warranted. Do you think it would help to tie in Special:MultiLock with CheckUser? -- NKohli (WMF) (talk) 19:00, 8 October 2019 (UTC)
  • 100% yes. As it is, when I have a list of accounts to multilock I need to manually add their names into separate CU windows. Any improvement to that workflow would be amazing. (pic of multilock interface) – Ajraddatz (talk) 19:13, 8 October 2019 (UTC)
  • @Ajraddatz: The Get users interface in CheckUser has an option to select multiple accounts and block them all. Is that similar to what you are looking for? Do you use it much? -- NKohli (WMF) (talk) 06:04, 9 October 2019 (UTC)
  • Comparing edit counts and registration dates could be helpful, but in my workflow I usually already know what accounts I want to check (and that there's strong enough evidence to warrant those checks). The hard part is identifying any crossover in technical data. So in my case I'd probably go straight to the second screen that you have listed. MusikAnimal talk 21:10, 8 October 2019 (UTC)
  • From what I read it might be helpful as a safety net to make sure that you are checking the right accounts. --Rschen7754 18:19, 8 October 2019 (UTC)
  • @Rschen7754: do you imagine this being an Are you sure you want to check the following users? kind of step? -- NKohli (WMF) (talk) 19:00, 8 October 2019 (UTC)
  • It could be, as I've heard stories of users accidentally being checked (and even though I rarely used CU when I had access, I managed to mistype an IP range). But I would hesitate to make it mandatory for everyone. --Rschen7754 04:42, 9 October 2019 (UTC)
  • The preliminary check page should be part of Special:CentralAuth (i.e. being allowed to supply multiple users) so that everyone can benefit from it. There is no sensitive information on the proposed mockup that justifies restricting it to checkusers. MER-C (talk) 13:00, 12 October 2019 (UTC)
  • I think this is a useful tool. But as on the other comments, it should be a public tool. That public tool should have a link to perform a check on the accounts, only visible for checkusers (like for checkusers there is a link to the checkuser tool on the contribution page). For the tool on a specific wiki it should only show that specific wiki, with a link to meta to do a crosswiki check. With that, the local tool should link to the local contributions, the crosswiki one should per wiki have a link to local contributions and have a link to guc. Regarding guc, I think that should then not be an 'external' tool anymore, but integrated as a special page on meta. Akoopal (talk) 16:13, 13 October 2019 (UTC)
  • @MER-C and Akoopal: I agree that it is all public information and could be made available to all users. Our foremost priority is to make improvements to CheckUser. Testing this feature with a small group of users will allow us to gather quick feedback and iterate faster. Regarding making guc a special page, that is definitely highly desirable but it would be a very big project in itself. We want to focus our efforts towards reducing the biggest pain points of using CheckUser first. -- NKohli (WMF) (talk) 05:44, 17 October 2019 (UTC)

How else could the CheckUser information be presented?

  • Display ASN and geolocation information next to IPs. (see also phab:T174553) Huji (talk) 00:50, 5 October 2019 (UTC)
  • @Huji: This is definitely something we have been thinking about. We would need to coordinate with the Operations and Security teams to fulfill that request. While we do that groundwork, we hope to meanwhile improve existing CheckUser data presentation. -- NKohli (WMF) (talk) 12:29, 8 October 2019 (UTC)
  • ...

What do you think of the information being displayed in the actual CheckUser step?

  • I assume this is a proposed makeover for the "Get IP addresses" view. This seems quite good; it would quickly tell me whether or not the accounts might be related, and whether I need to check individual IPs or ranges. I don't need a separate tab for each account I look up. Good stuff!

    For "Anything else?", we should include all the same information we have currently, such as block/lock status and user groups. Preferably "Activity" would show a full date range of editing activity, as it does currently. While I know this isn't a proper wireframe, I don't think a tabular format would work well given the amount of information and links we'd need to display. I personally find the status-quo to be readable enough. Note also we customize the links with w:MediaWiki:Checkuser-toollinks.

    I don't think this UI could replace "get users" (and obviously not "get edits"). We need to see individual IP users in addition to accounts, and the option to see everyone's edits. This helps to further conclude if there is a connection and whether hard blocks are appropriate. There are use cases to "get edits" by a single account or IP, too. MusikAnimal talk 23:00, 8 October 2019 (UTC)

  • Make accessible full HTTP headers. The WMF is automatically supplied that information, and it may be useful in addition to UA/location. MER-C (talk) 13:02, 12 October 2019 (UTC)
  • Thanks MER-C. I will be checking in with Legal and Ops about exposing that information. -- NKohli (WMF) (talk) 05:44, 17 October 2019 (UTC)

How can we improve the CheckUser logs to be more helpful with the above proposed improvements?

  • ...

What are we missing?

Note: We are not adding any new information to the tool, as a first step. We are looking to improve access to the information already being presented in the tool. We want to be able to deliver value quickly and iterate based on the feedback we receive.

  • Making phab:T146837 a reality would be a dream come true. Even just doing exact matches (no wildcards) would be of great help. MusikAnimal talk 19:50, 4 October 2019 (UTC)
  • As far as I can see, this is just replacing the initial steps - the 'preliminary check' is similar to the CentralAuth page, which is not usually important, and the 'Checkuser' section is just replacing the 'Get IP addresses' stage in the current tool. The latter is of course a very important starting point for looking at accounts, and I'm sure it can be improved, but is only ever part of the story. The all important bit is the 'Get edits' part, or 'ipcheck' as it's (perhaps) called here, which I don't see discussed at all. Maybe no changes, or something else is planned for it? That would be OK, but it's not clear from this page. The other important bit is the 'Get users' part of the current tool, which is indispensable for blocking large numbers of accounts - again not mentioned. The page does rather ominously mention 'showing all the necessary data in one interface', which I find difficult to imagine. So perhaps we are missing how this proposal fits into the rest of the picture? -- zzuuzz (talk) 05:37, 8 October 2019 (UTC)
  • I see the ipcheck refers to the ipcheck tool, which is sometimes useful but just one of many. What the 'Get edits' part currently does is add the context for what is being done at the time, and the precise timeline of events. I'm also having difficulty imagining what the proposed data will look like for a prolific multi-account socker using multiple addresses in a range at multiple times. These can be picked up relatively easily with the current 'get edits' tool. -- zzuuzz (talk) 13:43, 8 October 2019 (UTC)
  • @Zzuuzz: Thanks for the feedback. I'm curious to learn more about your workflow with CheckUser. In the several CheckUser interviews we did, we rarely saw Get edits being used. Get IP addresses and Get users were primarily used. When is it that you would use Get edits and what does it offer that the other views don't? The current view described on the page would list all the IPs used by all the accounts under question (and also identify other accounts using the same IPs). Could you give me a made up example of a case where you think the proposed views won't work? Thank you so much! :) -- NKohli (WMF) (talk) 16:14, 8 October 2019 (UTC)
  • OK, I can accept I might be a bit of an outlier. I hardly ever use 'Get users', except to block many accounts after having got the edits. I wonder if one difference might be that I am rarely looking to confirm whether one user is another, instead I am mainly looking for other sockpuppet accounts and especially the collateral for IP blocks. In the language of this page I will more often do most profiling after the checking. I'm also looking a lot at ranges with multiple people. You can't say that one user (that you've come across) is another just because they use the same IP and user agent - you need to look at what they're doing and the timeline of events (things like IP switching and browser changes are only really clear in this view). Additionally you see all the data in one view (filter hits, account creation, deleted edits, password resets, ...). So I think some of this might be lost. I also have to repeat though, being able to block 50 accounts from the CU results with a couple of clicks is a Great Thing. -- zzuuzz (talk) 20:28, 8 October 2019 (UTC)
  • I use "get edits" whenever I need a timeline. For example:
  • 2019-10-09 00:10 GermanyFan edits History of Germany
  • 2019-10-09 00:12 StarWarsFan01 edits Star Wars
  • 2019-10-09 00:13 StarWarsFan02 edits Star Wars
  • 2019-10-09 00:14 StarWarsFan03 edits Star Wars
  • 2019-10-09 00:14 GermanyFan edits History of Germany
  • 2019-10-09 00:15 StarWarsFan04 trips an edit filter on Star Wars
  • Everyone is on the same IP address and is indistinguishable from each other. However, GermanyFan has no interest in Star Wars whatsoever, and the various Star Wars fans don't care about the history of Germany. "Get edits" is useful in highlighting the Star Wars fans as potential socks and differentiating GermanyFan from them. Otherwise, I have to spend a lot of time in the Editor Interaction Analyser. Like Zzuuzz, I like being able to see a timeline of everything that has happened on an IP address or IP range. NinjaRobotPirate (talk) 10:57, 9 October 2019 (UTC)
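The made-up timeline above can be sketched in code. This is a minimal illustration, not how CheckUser stores or returns its data: the `EDITS` records and the helper names `users_by_page` and `shared_pages` are hypothetical. It groups the accounts seen on one IP by the page they touched, which is roughly the signal that separates GermanyFan from the Star Wars accounts.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records copied from the example above: (timestamp, user, page).
EDITS = [
    ("2019-10-09 00:10", "GermanyFan", "History of Germany"),
    ("2019-10-09 00:12", "StarWarsFan01", "Star Wars"),
    ("2019-10-09 00:13", "StarWarsFan02", "Star Wars"),
    ("2019-10-09 00:14", "StarWarsFan03", "Star Wars"),
    ("2019-10-09 00:14", "GermanyFan", "History of Germany"),
    ("2019-10-09 00:15", "StarWarsFan04", "Star Wars"),
]


def users_by_page(edits):
    """Map each page to the distinct users who touched it, in timeline order."""
    ordered = sorted(edits, key=lambda e: datetime.strptime(e[0], "%Y-%m-%d %H:%M"))
    pages = defaultdict(list)
    for _stamp, user, page in ordered:
        if user not in pages[page]:
            pages[page].append(user)
    return dict(pages)


def shared_pages(edits, min_users=2):
    """Pages touched by at least `min_users` distinct accounts on the same IP."""
    return {
        page: users
        for page, users in users_by_page(edits).items()
        if len(users) >= min_users
    }
```

Calling `shared_pages(EDITS)` leaves only the Star Wars page with its four accounts. A real tool would still need the full "get edits" context (summaries, log actions, user agents) before anyone draws a conclusion.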
  • The workflow for another type of typical check looks something like this:
  • Vandal A is an obvious LTA sockpuppet - instant block.
  • We probably want to block their IP address because we've blocked a lot of their accounts recently (Vandal B), and also block (and revert!) the other 30 accounts they've probably created that we don't know about. Get IP addresses for Vandal A - we find they are editing throughout a highly dynamic /39 IPv6 range.
  • Get edits for the range (or users, it doesn't really matter here). Examine the contributions and timelines for the range, to identify the other accounts.
  • This is a type of check that most CUs will have done and know how to do. But it's at this point I can't imagine what the proposed check or results will look like:
CheckUser
Username | Activity | IP address | User agent | Anything else?
Vandal A | August 12, 11:00 | fe80:0:0:1::1 (ipcheck) - 1 edit | Chrome 65, Windows 10 |
Vandal B | August 12, 11:10 | fe80:0:0:c::c (ipcheck) - 1 edit | Chrome 65, Windows 10 |
Linked accounts (below accounts were found associated with the IP addresses found above)
None?
  • Now what? Can we block this guy and his sockfarm or not? -- zzuuzz (talk) 14:40, 9 October 2019 (UTC)
  • @NinjaRobotPirate and Zzuuzz: This is extremely helpful, thank you! I understand the use case for Get edits much better now. Would it be helpful if the tool can generate the timeline for edits (similar to what NinjaRobotPirate made but with more info) based on the usernames and IPs that are input on the first screen? I understand right now the tool can only do it for one user/IP at a time. If that will be helpful, what kind of information would you expect from such a timeline? -- NKohli (WMF) (talk) 14:01, 11 October 2019 (UTC)
  • I didn't quite understand the usefulness of "get edits" at first, either. The tool provides much more information than what I posted, of course, but I was just trying to give an uncomplicated example (I wouldn't want it streamlined down to what I posted). If by "first screen" you mean the table you labeled "preliminary check", I don't really know. It sounds like it would be useful, but it's difficult for me to visualize exactly what the results would be. I'm used to the current way of doing things. I hated the CheckUser UI at first, but it's grown familiar. NinjaRobotPirate (talk) 15:43, 11 October 2019 (UTC)
  • @NinjaRobotPirate: No, the screen before that (don't have a mock yet), in step 1 where we take an input of all the usernames/IPs you want to look at. We can come up with a Get edits like interface for the edits by those users/IPs in the tool which can generate a timeline for you, similar to the one you posted. You wouldn't have to look up the Get edits for each user like you have to right now. What are the key things that you are looking for, when you use Get edits? I imagine if users are checking the same pages, that is something to make note of. What else do you do that can possibly be flagged by the tool automatically? (also @MusikAnimal: on this thread) -- NKohli (WMF) (talk) 17:04, 11 October 2019 (UTC)
  • "Get edits" can be used for both usernames and IP addresses/ranges. On usernames, it's occasionally useful to see all of someone's edits annotated with their system configuration. You can see this information elsewhere but not at a glance. "Get edits" can also be used on IP ranges to see all edits made by all accounts on that IP range. If you know someone uses a really generic system configuration, "get users" might not be as useful as "get edits". "Get users" can tell you who all the Firefox users are on a Verizon IP range, but sometimes you don't really care about that. You're more concerned with finding Verizon customers who edit Star Wars articles regardless of browser. "Get edits" will tell you that. It's also useful for finding edit summaries. For example, it's easy to find Verizon customers who habitually call people fascists in their edit summary in "get edits". "Get users" is great for finding sock puppets based on technical matches, and sometimes I use it to find behavioral matches, too. The problem is that I tend to open up a lot of tabs if I use it this way. For example, I'll sometimes open a new tab for each suspicious-looking editor and skim over their contributions. If I already know the sockmaster and I can skim over their edits fast enough, this can be substantially faster than "get edits". "Get edits" returns a lot of useless information, especially on busy IP ranges. If I want to see Firefox users on Windows 7 who edit Star Wars from Verizon IP ranges, "get edits" will list them. However, it also lists everybody else, and there's no way to filter out the millions of Chrome users, Windows 10 users, or MacOS users. At least with "get users" I don't have megabytes of useless information cluttering my screen, even if it doesn't tell me anything about who's editing Star Wars. I'm not really sure if any of this answers your questions... I can try to think more about this later. NinjaRobotPirate (talk) 02:27, 12 October 2019 (UTC)
  • I've always wanted some indication of the language, ie, the Accept-Language or perhaps Accept headers. And I also think not using the enwikiUserName (or equivalent) cookie is a totally missed opportunity. -- zzuuzz (talk) 05:37, 8 October 2019 (UTC)
  • I'm not sure I'm on the same page as my colleagues. I find I often do things differently. That said, I have some features I'd like to see in the current interface. They mainly consist of adding the ability to sort and to filter. I'll do the Get IP addresses first:
    • Sort by IP rather than chronologically, although it's fine to sort chronologically within IP.
    • Give me IP ranges without my having to copy them into the text box at the bottom.
  • Get users:
    • Filter out blocked users.
    • Filter out unblocked users.
    • Filter out unregistered users.
    • Filter out no-edit users.
    • Filter out selected users.
    • Give me the ability to get date ranges for each user/IP, as opposed to just the overall date range for all their edits (this is sort of a blend of Get edits and Get users but without quite the busyness of Get edits).
    • Give me the ability to get log entries for the IPs (again something that is shown in Get edits but not in Get users).
    • Give me the ability to see if two or more registered users are not only using the same range but individual IPs.
  • I may add more to this list. I'm going by memory, and checking may bring more to mind.--Bbb23 (talk) 00:38, 9 October 2019 (UTC)
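The sort and filter requests listed above can be sketched as follows. This is only an illustration under assumed data: the `Row` fields and the functions `sort_by_ip` and `filter_rows` are hypothetical and do not reflect the CheckUser extension's actual schema or code. The one real detail is that IP addresses should be sorted numerically rather than as strings, which Python's standard `ipaddress` module supports.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical result row; field names are assumptions for this sketch."""
    name: str         # account name, or the IP itself for unregistered editors
    ip: str
    last_seen: str    # ISO-style timestamp, so string order is chronological
    blocked: bool
    registered: bool
    edit_count: int


def sort_by_ip(rows):
    """Sort numerically by IP (IPv4 before IPv6), then chronologically within an IP."""
    def key(row):
        addr = ipaddress.ip_address(row.ip)
        return (addr.version, int(addr), row.last_seen)
    return sorted(rows, key=key)


def filter_rows(rows, hide_blocked=False, hide_unblocked=False,
                hide_unregistered=False, hide_no_edits=False, hide_names=()):
    """Drop rows matching any enabled filter; all filters default to off."""
    kept = []
    for row in rows:
        if hide_blocked and row.blocked:
            continue
        if hide_unblocked and not row.blocked:
            continue
        if hide_unregistered and not row.registered:
            continue
        if hide_no_edits and row.edit_count == 0:
            continue
        if row.name in hide_names:
            continue
        kept.append(row)
    return kept
```

Sorting on the integer value avoids the string-sort problem where 10.0.0.10 lands before 10.0.0.9.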
  • ...
  • Long term, a graph representation of the relationships between IPs and accounts may be helpful, especially for complex investigations. MER-C (talk) 13:05, 12 October 2019 (UTC)
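One way to picture the graph idea is as a bipartite graph of accounts and IPs, where connected components are the clusters worth a closer look. The sketch below assumes the input is a list of hypothetical `(account, ip)` observations; the function name `account_clusters` is made up for illustration and nothing here is an existing CheckUser feature.

```python
from collections import defaultdict, deque


def account_clusters(observations):
    """Group accounts that are linked, directly or indirectly, through shared IPs.

    `observations` is an iterable of (account, ip) pairs. Returns a list of
    sorted account-name lists, one per connected component.
    """
    graph = defaultdict(set)
    for account, ip in observations:
        a_node, ip_node = ("account", account), ("ip", ip)
        graph[a_node].add(ip_node)
        graph[ip_node].add(a_node)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search over the account/IP graph from this node.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        accounts = sorted(name for kind, name in component if kind == "account")
        if accounts:
            clusters.append(accounts)
    return clusters
```

A shared IP is only a technical overlap, so a cluster found this way is a starting point for review and not evidence of sockpuppetry on its own.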
  • Finally took the time to look, in general this looks ok. What I am wondering, I see in the example the 'extra users' sessions for accounts found on the IPs. Will that be automatic or would that be a next step? There are cases where you check two accounts by just checking their IP, and if they don't match, also not range match, you don't need nor want to look up the IPs. What I would really like is for the 'get ip' case to already show the user agents, something you now need to check the IPs for. Furthermore, a handy way from the 'check ip' button to check the range instead is a must I think. When dealing with IPv6 you should always check the /64 (although there are use-cases for checking the IP itself). What might be nice would be a dynamic table that expands, for the complicated cases. I have had cases, where on the IPs you find new users, checking the users gives new IPs again, and on those IPs you find again other users. So a button 'check this ip and add the users to the table' would be nice. Again, it should not be automatic as there might be users you don't want to check on dynamic ranges based on user agent. Akoopal (talk) 15:58, 13 October 2019 (UTC)
  • @Akoopal: Good point about making that check for other linked editors optional. We are still thinking about the UI for accepting an IP range in addition to an IP. Would you say the range should be automatically generated from the IP address? I will be sharing some design mockups early next week and would be really interested to hear your thoughts on that. -- NKohli (WMF) (talk) 05:44, 17 October 2019 (UTC)
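As a rough sketch of what "generating the range from the IP address" could mean, the snippet below derives the enclosing /64 for an IPv6 address and leaves an IPv4 address as a single-address range. The function name `default_check_range` and the choice of /64 as the default are assumptions taken from the comment above, not a decided design. It uses only Python's standard `ipaddress` module.

```python
import ipaddress


def default_check_range(address, v6_prefix=64):
    """Suggest a range to check for `address`.

    IPv6 addresses are widened to the enclosing /`v6_prefix` network;
    IPv4 addresses are returned as a single-address (/32) network.
    """
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # strict=False masks off the host bits instead of raising an error.
        return ipaddress.ip_network(f"{ip}/{v6_prefix}", strict=False)
    return ipaddress.ip_network(str(ip))
```

For example, `default_check_range("2001:db8:1:2:3:4:5:6")` gives `2001:db8:1:2::/64`. Keeping the suggestion as a default that the checkuser can override would cover the cases where checking the single address is what is wanted.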

General feedback

  • ...

Pagination for busy range results

With the current checkuser tool, if a check is tried on a busy range it may exceed a maximum number for results and then only gives a list of IP addresses with the number of edits per address which is not very usable (failed check). It does not give proper results. A desired feature would be to paginate the results (Page 1, Page 2, etc. as necessary) for busy ranges.

One current way around the max number exceeded is to select a lower time frame from a pull down box but this shortens the 90 day period to either one month, two weeks or one week. That loss of data is sometimes undesirable depending on the case that you are working on. That is already built in as a feature of the current tool.

Another workaround to retain the full 90 day information involves splitting the network and running separate checks. For example (using Class B reserved range), if a /16 range (172.16.0.0/16) fails because it is too busy then you could split the network in half and run a check on 172.16.0.0/17 and then run a check on 172.16.128.0/17. If those fail then you can run four separate checks on 172.16.0.0/18, 172.16.64.0/18, 172.16.128.0/18 and 172.16.192.0/18. If that fails then you can run eight separate checks and so on. Some checkusers acquired their CU bit by being elected to Arbcom and may not have a strong networking background, so they aren't likely to try this second workaround. Pagination on busy range results would make things easier for all checkusers and remove the need for workarounds.
⋙–Berean–Hunter—► ((⊕)) 21:41, 8 October 2019 (UTC)
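The network-splitting workaround described above can be worked through with Python's standard `ipaddress` module. The sketch below is only an illustration: `too_busy` is a hypothetical callback standing in for "the check on this range exceeded the result limit", and `max_prefix` is an assumed stopping point. Unlike the manual description, it splits only the halves that are still too busy.

```python
import ipaddress


def split_until_ok(network, too_busy, max_prefix=24):
    """Halve `network` repeatedly until each piece passes `too_busy`.

    `too_busy` is a caller-supplied function taking a network object and
    returning True when a check on that range would exceed the limit.
    Splitting stops at `max_prefix` even if a piece is still too busy.
    """
    net = ipaddress.ip_network(str(network))
    if net.prefixlen >= max_prefix or not too_busy(net):
        return [net]
    pieces = []
    for half in net.subnets(prefixlen_diff=1):
        pieces.extend(split_until_ok(half, too_busy, max_prefix))
    return pieces


# The two manual steps from the comment above:
halves = list(ipaddress.ip_network("172.16.0.0/16").subnets(prefixlen_diff=1))
quarters = list(ipaddress.ip_network("172.16.0.0/16").subnets(prefixlen_diff=2))
```

Here `halves` holds 172.16.0.0/17 and 172.16.128.0/17, and `quarters` holds the four /18 ranges named in the comment.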

SockFilter?

Would it be possible to put alerts on certain IP-ranges / UA data so certain editors get flagged? (I know, likely these are flags that only checkusers will see, maybe only to admins?)

The current situation on-wiki is now that we see an editor with a pattern (possibly through AbuseFilter) that is recognised and that editor gets reported to CheckUsers to see the data behind the editor. That is often a Always-Too-Late action, prolific sockers are already on another account, and you keep hunting. It must be possible to flag certain combinations so that if a sock performs an action on wiki they get matched against the pattern. Setting the filter should be at CheckUser discretion and only used on serial-violators (and not to pre-emptively 'catch' editors). --Dirk Beetstra T C (en: U, T) 13:11, 9 October 2019 (UTC)
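To make the idea concrete, here is a minimal sketch of matching an action against a watch list of IP range and user-agent combinations. Everything in it is hypothetical: the `Watch` structure, the `matching_watches` function, and the use of a regular expression for the user agent are assumptions for illustration, not an existing AbuseFilter or CheckUser feature.

```python
import ipaddress
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Watch:
    """Hypothetical watch entry set by a checkuser for a known serial violator."""
    label: str        # case name shown to whoever may see the flag
    network: str      # CIDR range, e.g. "192.0.2.0/24"
    ua_pattern: str   # regular expression matched against the user agent


def matching_watches(ip, user_agent, watches):
    """Return the labels of every watch whose range and UA pattern both match."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for watch in watches:
        net = ipaddress.ip_network(watch.network)
        if addr.version != net.version:
            continue
        if addr in net and re.search(watch.ua_pattern, user_agent):
            hits.append(watch.label)
    return hits
```

Requiring both the range and the user-agent pattern to match keeps the flag narrow, in line with the point above that such filters should target serial violators and not catch editors pre-emptively.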

Get IP addresses vs Get Users vs Get Edits

@MusikAnimal, Zzuuzz, and NinjaRobotPirate: and others - I tried to list the various features and use cases for the three Get options in CheckUser. Does the below seem accurate? What other use cases do you have for these views? I'd appreciate your help in teasing those out. Thank you. -- NKohli (WMF) (talk) 05:56, 12 October 2019 (UTC)

This looks pretty good to me. It could probably be doubled in size after thinking for a week about every use case possible, but it's a good overview. NinjaRobotPirate (talk) 18:54, 13 October 2019 (UTC)
Yes, this sums it up nicely. A few minor corrections: For "Get IP addresses", a date range should be shown for activity (from available data), not just the latest action. I also think all views show the block status, current or previous. Next, the number shown next to usernames/IPs I believe is a count of logged actions as well as edits (or at least it should be, if it isn't). Finally, "Get edits" can be used on accounts too. I think you've got all the primary use cases covered. Best, MusikAnimal talk 21:39, 14 October 2019 (UTC)

Get IP addresses

Features

  • Shows the IP addresses associated with a user account.
  • Shows the timestamp for the latest activity on each IP address
  • Shows the number of edits made by the IP along with a number for total edits made by accounts operating on that IP address, indicated by something like: [2] (~5 by all users)
  • Provides links for tools to run checks on an IP

Use cases

  • Used to get a quick overview of a user account's activity
  • To immediately see if this might be a big sock farm if the number of edits from the same IP is high
  • Run various checks for the listed IPs with the help of the tools linked under the IP address to find out the location information, if it is behind a Tor node or VPN etc.
  • ...

Get Users

Features

  • Shows the IP editors and user accounts editing from a given IP or IP range.
  • For each record -
    • Shows activity time period (start - end) and number of edits (denoted by something like [20])
    • Offers an option to run a WHOIS check on an IP
    • Offers an option to look at talk/contribs for an account
    • Shows IPs and user agents associated with the editor
  • Allows one to select IP editors and user accounts from the list and to block them (along with some block options)

Use cases

  • Used to find sleeper accounts/other socks that are created from the same IP or IP range
  • ...

Get Edits

Features

  • For a given IP or IP range, it displays a timeline of edits and log actions by the user accounts or IP editors operating from that IP or IP range.
  • Dates are in descending order (latest first).
  • Records are grouped by date. Each record displays:
    • If it is a log record.
    • Links to diff and history if not a log action/record.
    • Timestamp of activity
    • Page (if edit action)
    • Editor
    • Info about editor privileges
    • Info about whether editor was previously blocked
    • Edit summary of the edit
    • IP address
    • User-agent string

Use cases

  • Used to identify if two users are the same based on their activity pages, IPs, UAs
  • Sometimes used to figure out if a single user is behaving suspiciously based on their activity
  • Identify other accounts/editors from an IP range that might be behaving in a similar fashion as a known sock
  • ...