IP Editing: Privacy Enhancement and Abuse Mitigation/Research and tools

Data on the restriction of unregistered editing on Portuguese Wikipedia

Portuguese Wikipedia’s metrics following restriction

30 August 2021 Update

Hello. This is a brief update about Portuguese Wikipedia’s metrics since they started requiring registration to edit. We have a comprehensive report on the Impact report page. This report includes metrics captured through data as well as a survey that was conducted among active Portuguese Wikipedia contributors.

All in all, the report presents the change in a positive light. We have not seen any significant disruption during the period in which these metrics were captured. Encouraged by this, we now want to run an experiment on two more projects to see whether we observe a similar impact. Every project is unique, and what holds true for Portuguese Wikipedia might not hold true for another project. We want to run a limited-time experiment on two projects in which registration will be required in order to edit. We estimate that it will take approximately 8 months to collect enough data to see significant changes. After that period, we will go back to not requiring registration to edit while we analyse the data. Once the data is published, each community will be able to decide for itself whether or not it wants to continue disallowing unregistered editing on its project.

We are calling this the Login Required Experiment. You can find more details, as well as a timeline, on the experiment page. Please use that page and its talk page to continue the discussion.

Portuguese Wikipedia IP editing restriction

Last year, Portuguese Wikipedia disallowed edits by unregistered users. Over the past few months, our team has collected data on how this step has affected the overall health of the project. We have also spoken with a number of community members about their experience since the change. We are working on the final pieces needed to present all of the data in a way that gives an accurate picture of the project. We hope to have an update on this and other topics in the near future.

Tools

Tool development

As you might already know, we are building some new tools, partly to soften the effect of introducing temporary accounts, but also simply to give everyone better anti-vandalism tools. It is no secret that the moderation tools on our projects fall short of what our communities deserve, and there is a lot of room for improvement. We want to build tools that help anti-vandalism fighters work more effectively, and we want to lower the barrier to entry into these roles for non-technical contributors.

We have talked about these tools before, and I will give a brief update on them below. Please note that progress on these tools has been slow in recent months because our teams were busy upgrading SecurePoll to meet the needs of the Wikimedia Foundation Board of Trustees elections.

IP Info tool

We are building a tool that will surface important information about an IP address, information that is often needed during an investigation. Administrators, patrollers, and CheckUsers usually rely on external websites to provide this information. We hope to make their work easier by integrating information from reliable IP providers into our own sites. We recently built a prototype and ran a round of user testing to validate our approach. Most of the editors we interviewed found the tool helpful and said they would be interested in using it in the future. We would like to draw your attention to the update on the project page. These are the key questions on which we would like your feedback on the talk page:

When you investigate an IP, what kind of information are you looking for? Which page do you usually use to find this information?
Which types of IP information are most useful to you?
Which types of IP information could put anonymous editors at risk if shared?

Editor matching tool

In previous conversations, this tool has also been referred to as "nearby editors" and "sockpuppet detection". We are trying to find a suitable name that will be understandable even to people who are not familiar with the term "sockpuppet".

We are in the early stages of this project. The Wikimedia Foundation has a project that could help detect when two editors exhibit similar behaviour. This would help connect unregistered users when they edit under different automatically generated usernames. The project received a lot of support when we started talking about it a few years ago. We also heard about the risks involved in developing such a tool. We plan to build a prototype in the near future and share it with the community. The project has a page that has not received enough attention, and we hope to update it soon. We would be glad to hear your thoughts on this project on the project talk page.

As mentioned, our main goal is to provide our communities with better anti-vandalism tools that help vandal fighters and, at the same time, reduce the need for access to IP addresses. Another important reason for this work is that IP addresses are hard to understand and, in practice, are really useful only to users with strong technical skills. This creates a barrier for new users without a technical background who want to take on administrative roles, because they face a steeper learning curve in working with IP addresses. We hope to reach a point where we have moderation tools that are useful to everyone, without requiring prior knowledge.

The first thing we decided to focus on was making the CheckUser tool more flexible, powerful, and easy to use. It is an important tool that serves the need to detect and block bad actors (especially long-term abusers) on many of our projects. Because the CheckUser tool had not been well maintained for many years, it was outdated and lacked essential features.

We also anticipated an uptick in the number of users who opt in to becoming CheckUsers on our projects once temporary accounts are introduced. This reinforced the need for a better, easier CheckUser experience. With that in mind, the Anti-Harassment Tools team spent the past year improving the CheckUser tool, making it much more efficient and user-friendly. This work also took into account many outstanding feature requests from the community. We continually consulted CheckUsers and stewards over the course of this project and tried our best to meet their expectations. The new feature is set to go live on all projects in October 2020.

The next feature we are working on is IP Info. We chose this project after a round of consultations on six wikis, which helped us narrow down the use cases for IP addresses on our projects. It became clear early on that IP addresses provide some critical pieces of information that need to be available to patrollers so they can do their job effectively. The goal of IP Info, then, is to surface meaningful information about an IP address quickly and easily. IP addresses provide important information such as location, organization, whether the address may be a Tor/VPN node, rDNS, and registered range, to name a few. By showing this quickly and easily, without the need for external tools that not everyone can use, we hope to make patrollers' work easier. The information provided is high-level enough that we can show it without endangering the anonymous user, yet it is enough for patrollers to make a sound judgement about an IP address.

After IP Info, we will focus on a feature for finding similar editors. We will use a machine learning model, built in collaboration with CheckUsers and trained on historical CheckUser data, to compare user behavior and flag when two or more users appear to be behaving very similarly. The model will take into account which pages users are active on, their writing styles, their editing times, etc. to predict how similar two users are. We are doing our due diligence to make sure the model is as accurate as possible.
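How the model weighs these signals is not described here. As a rough illustration of what "behaving similarly" can mean for two of the signals named above (pages edited and editing times), the sketch below scores page overlap with Jaccard similarity and timing overlap with cosine similarity over hour-of-day histograms. It is a hand-written heuristic, not the machine learning model itself; the function names, the equal 0.5 weighting, and the input format are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical heuristic for illustration only; not the actual similar-editors model.

def jaccard(a: set[str], b: set[str]) -> float:
    """Share of pages that two users have both edited (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def hour_profile(edit_hours: list[int]) -> list[float]:
    """24-bin histogram of edit counts by hour of day (timezone assumed consistent)."""
    counts = Counter(h % 24 for h in edit_hours)
    return [float(counts.get(h, 0)) for h in range(24)]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two histograms; 0.0 if either is empty."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def similarity(pages_a, hours_a, pages_b, hours_b, page_weight=0.5):
    """Weighted mix of page overlap and editing-time overlap (hypothetical weighting)."""
    page_score = jaccard(set(pages_a), set(pages_b))
    time_score = cosine(hour_profile(hours_a), hour_profile(hours_b))
    return page_weight * page_score + (1 - page_weight) * time_score

score = similarity(["Lisbon", "Porto", "Braga"], [21, 22, 22, 23],
                   ["Lisbon", "Porto", "Faro"], [22, 22, 23])
print(round(score, 3))  # 0.706
```

A real model would also need signals such as writing style, which simple set and histogram comparisons cannot capture.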

Once it's ready, there is a lot that such a model could do. As a first step, we will launch it to help CheckUsers detect sockpuppets more easily, without a lot of manual labor. In the future, we may consider exposing this tool to more people and applying it to detect malicious sockpuppet rings and disinformation campaigns.

You can read more and leave comments on our project page for tools.

Research

IP masking impact report

IP addresses are valuable as a semi-reliable partial identifier that is not easily manipulated by the associated user. Depending on the provider and device configuration, IP address information is not always accurate or precise, and deep technical knowledge and fluency are needed to make the best use of it, though administrators are not currently required to demonstrate such fluency to have access. Where possible, this technical information is used to support additional information (referred to as “behavioural knowledge”), and the information taken from IP addresses significantly impacts the course of the administrative action taken.

On the social side, whether to allow unregistered users to edit has been the subject of extensive debate. So far, the outcome has erred on the side of allowing unregistered users to edit. The debate is generally framed as a desire to halt vandalism versus preserving the ability to edit pseudo-anonymously and keeping the barrier to editing low. There is a perception of bias against unregistered users because of their association with vandalism, which also appears as algorithmic bias in tools such as ORES. Additionally, there are major communication issues when trying to talk to unregistered users, largely due to a lack of notifications and because there is no guarantee that the same person will read the messages left on an IP's talk page.

IP masking will significantly affect administrator workflows and may increase the burden on CheckUsers in the short term. If or when IP addresses are masked, we should expect our administrators' ability to manage vandalism to be greatly hindered. This can be mitigated by providing tools with equivalent or greater functionality, but we should still expect a transitional period of reduced administrator efficacy. To provide proper tool support for our administrators' work, we must be careful to preserve, or provide alternatives to, the following functions currently fulfilled by IP information:

Block efficacy and collateral estimation
Some way of surfacing similarities or patterns among unregistered users, such as geographic similarity or a shared institution (e.g. edits coming from the same high school or university)
The ability to target specific groups of unregistered users, such as vandals jumping IPs within a specific range
Location or institution-specific actions (not necessarily blocks); for example, the ability to determine if edits are made from an open proxy, or public location like a school or public library.
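To make the first and third functions concrete: given the IPs a range-hopping vandal has used, an administrator typically looks for the narrowest CIDR range that covers all of them, and the size of that range gives a crude upper bound on the collateral damage of a range block. The sketch below does this with Python's standard `ipaddress` module; the function name is hypothetical, and a real collateral estimate would also need to know how many other editors actually use the range.

```python
import ipaddress

def covering_range(ips):
    """Return the smallest CIDR network that contains every address in `ips`.

    Hypothetical helper for illustration; it measures range size only,
    not how many legitimate editors share the range.
    """
    addrs = [ipaddress.ip_address(ip) for ip in ips]
    if len({a.version for a in addrs}) != 1:
        raise ValueError("cannot mix IPv4 and IPv6 addresses")
    first = addrs[0]
    # Start from a single-host network (/32 for IPv4, /128 for IPv6).
    net = ipaddress.ip_network(f"{first}/{first.max_prefixlen}")
    while not all(a in net for a in addrs):
        net = net.supernet()  # widen the range by one prefix bit
    return net

net = covering_range(["203.0.113.7", "203.0.113.90", "203.0.113.200"])
print(net, net.num_addresses)  # 203.0.113.0/24 256
```

A single outlying address can widen the covering range dramatically, which is one reason administrators weigh range blocks against collateral damage case by case.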

Depending on how we handle temporary accounts or identifiers for unregistered users, we may be able to improve communication with unregistered users. Underlying discussions and concerns around unregistered editing, anonymous vandalism, and bias against unregistered users are unlikely to change significantly if we mask IPs, provided we maintain the ability to edit projects while logged out.

CheckUser workflow

We interviewed CheckUsers on multiple projects throughout our process for designing the new Special:Investigate tool. Based on interviews and walkthroughs of real-life cases, we broke down the general CheckUser workflow into five sections:

Triaging: assessing cases for feasibility and complexity.
Profiling: creating a pattern of behaviour which will identify the user behind multiple accounts.
Checking: examining IPs and user agents using the CheckUser tool.
Judgement: matching this technical information against the behavioural information established in the Profiling step, in order to make a final decision about what kind of administrative action to take.
Closing: reporting the outcome of the investigation on public and private platforms where necessary, and appropriately archiving information for future use.

We also worked with staff from Trust and Safety to get a sense of how the CheckUser tool factors into Wikimedia Foundation investigations and into cases that are escalated to T&S.

The most common and obvious pain points all revolved around the CheckUser tool's unintuitive information presentation, and the need to open up every single link in a new tab. This caused massive confusion as tab proliferation quickly got out of hand. To make matters worse, the information that CheckUser surfaces is highly technical and not easy to understand at first glance, making the tabs difficult to track. All of our interviewees said that they resorted to separate software or physical pen and paper in order to keep track of information.

We also ran some basic analyses of English Wikipedia's Sockpuppet Investigations page to get some baseline metrics on how many cases they process, how many are rejected, and how many sockpuppets a given report contains.
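The baseline metrics described above reduce to simple counts over case records. The sketch below assumes each Sockpuppet Investigations case has been exported as a record with a status and a list of suspected accounts; the record type, field names, status labels, and sample data are invented for illustration and are not real SPI figures.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class SpiCase:
    """Hypothetical record for one Sockpuppet Investigations case."""
    sockmaster: str
    status: str  # assumed labels, e.g. "closed" or "declined"
    suspected_accounts: list[str] = field(default_factory=list)

def baseline_metrics(cases):
    """Count cases, count rejected cases, and average suspected accounts per case."""
    total = len(cases)
    rejected = sum(1 for c in cases if c.status == "declined")
    socks_per_case = mean(len(c.suspected_accounts) for c in cases) if cases else 0.0
    return {"cases": total, "rejected": rejected, "mean_socks_per_case": socks_per_case}

# Invented sample data for illustration only.
sample = [
    SpiCase("ExampleUserA", "closed", ["SockA1", "SockA2", "SockA3"]),
    SpiCase("ExampleUserB", "declined", ["SockB1"]),
]
print(baseline_metrics(sample))
```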

Patroller use of IP addresses

Previous research on patrolling on our projects has generally focused on the workload or workflow of patrollers. Most recently, the Patrolling on Wikipedia study focused on patrollers' workflows and on identifying potential threats to current anti-vandalism practices. Older studies, such as the New Page Patrol survey and the Patroller work load study, focused on English Wikipedia. They also looked solely at the workload of patrollers, and more specifically at how bot patrolling tools have affected it.

Our study tried to recruit participants from five target wikis:

Japanese Wikipedia
Dutch Wikipedia
German Wikipedia
Chinese Wikipedia
English Wikiquote

These wikis were selected based on known attitudes towards IP edits, the percentage of monthly edits made by IPs, and any unique or unusual circumstances faced by IP editors (namely, use of the Pending Changes feature and widespread use of proxies). Participants were recruited via open calls on Village Pumps or the local equivalent. Where possible, we also posted on Wiki Embassy pages. Unfortunately, while we had interpretation support for the interviews themselves, we did not extend translation support to the recruitment messages, which may account for the low response rates. All interviews were conducted via Zoom, with a note-taker in attendance.

Consistent with previous studies, we did not find a systematic or unified use of IP information. Additionally, this information was only sought out once suspicion passed a certain threshold. Most further investigation of suspicious user activity begins with publicly available on-wiki information, such as previous local edits, Global Contributions, or previous bans.

Precision and accuracy were less important qualities for IP information. Upon seeing that one chosen IP information site returned three different geographical locations for the same IP address, one of our interviewees mentioned that precision in location mattered less than consistency. That is to say, as long as an IP address was consistently reported as being from one country, it mattered less whether that was correct or precise. This fits with our understanding of how IP address information is used: as a semi-unique piece of information associated with a single device or person, which is relatively hard for the average person to spoof. The accuracy or precision of the information attached to the user matters less than the fact that it is attached and difficult to change.
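The interviewee's point, that consistency matters more than precision, can be expressed as a simple agreement check across lookups. The sketch below assumes the results of several geolocation lookups for the same IP have already been collected as country codes; the function name and the two-thirds agreement threshold are hypothetical.

```python
from collections import Counter

def country_consensus(results, threshold=2 / 3):
    """Return the most common country if enough lookups agree on it, else None.

    Hypothetical helper: `results` is a list of country codes from separate
    lookups of the same IP; empty or failed lookups (None, "") are ignored.
    """
    known = [r for r in results if r]
    if not known:
        return None
    country, votes = Counter(known).most_common(1)[0]
    return country if votes / len(known) >= threshold else None

print(country_consensus(["PT", "PT", "ES"]))  # PT
print(country_consensus(["PT", "ES", "BR"]))  # None
```

Returning `None` when sources disagree, rather than picking one, matches the finding that an inconsistent location is less useful than an imprecise but stable one.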

Our findings highlight a few key design aspects for the IP info tool:

Provide at-a-glance conclusions over raw data
Cover key aspects of IP information:
- Geolocation (to a city or district level where possible)
- Registered organization
- Connection type (high-traffic, such as data center or mobile network versus low-traffic, such as residential broadband)
- Proxy status as binary yes or no

As an ethical point, it will be important to be able to explain how any conclusions are reached, as well as the inaccuracy and imprecision inherent in IP information lookups. While this was not a major concern for the patrollers we talked to, if we are to create a tool that will be used to justify administrative action, we should make the limitations of our tools clear.

––
Best regards,
Trust and Safety Product

Please use the project talk page for discussions on the matter. For any issues concerning this release, please don't hesitate to leave a message on the project talk page or contact Szymon Grabarczuk.