IP Editing: Privacy Enhancement and Abuse Mitigation
In recent years, internet users have become more aware of the importance of understanding the collection and use of their personal data. Governments in various countries have created new laws in an effort to better protect user privacy. The Wikimedia Foundation’s legal and public policy teams continually monitor legal developments around the world and consider how we can best protect user privacy, respect user expectations, and uphold the values of the Wikimedia movement. With that backdrop, they asked us to investigate and embark upon a technical improvement to the projects. We need to do this together with you.
MediaWiki stores and publishes the IP addresses of unregistered contributors (as part of their signature, in the page history and in logs), and makes them visible to anyone visiting our sites. Publication of these IP addresses compromises the safety and anonymity of these users. In some cases, it may even invite the danger of putting people at risk of government persecution. While we do tell users their IP address will be visible, few understand the ramifications of this information. We are working on increased privacy protection for unregistered contributors by obscuring their IP addresses when they contribute to the projects, akin to how the average reader can’t see the IP of a registered user. This will involve creating a "masked IP" username, which will be automatically generated, but human-readable. We have different ideas about how to best implement this. You can comment to let us know what you need.
Wikimedia projects have a very good reason for storing and publishing IP addresses: they play a critical role in keeping vandalism and harassment off our wikis. It's very important that patrollers, admins and functionaries have tools that can identify and block vandals, sockpuppets, editors with conflicts of interest and other bad actors. Working with you, we want to figure out ways to protect our users’ privacy while keeping our anti-vandalism tools working on par with how they work now. The most important part of this is the development of new tools to help anti-vandalism work. Once this is done, we hope to work on shielding IP addresses from our wikis – including restricting the number of people who can see other users' IP addresses, and reducing the amount of time IP addresses are stored in our databases and logs. It is important to note that a critical part of this work will be to ensure that our wikis still have access to the same – or better – level of anti-vandalism tooling and are not at risk of facing abuse.
The Wikimedia Foundation's goal is to create a set of moderation tools that will remove the need for everyone to have direct access to IP addresses. With this evolution of our moderation tools, we will be able to mask the IPs of unregistered accounts. We’re very aware that this change will impact current moderation workflows, and we want to ensure that the new tools will enable effective moderation, protect the projects from vandalism, and support community oversight.
We can only get to that decision point by working in partnership with CheckUsers, stewards, admins and other vandal fighters.
This is a very challenging problem, with risks for our ability to protect our wikis should we fail, which is why it's been put off over the years. But in light of evolving data-privacy standards on the internet, new laws, and changing user expectations, the Wikimedia Foundation thinks that now is the time to tackle this problem.
As mentioned previously, our foremost goal is to provide better anti-vandalism tools for our communities, which will give our vandal fighters a better moderation experience while also making the IP address string less valuable to them. Another important reason to do this is that IP addresses are hard to understand and are really useful only to tech-savvy users. This creates a barrier for new users without a technical background who want to enter functionary roles, as they face a steeper learning curve when working with IP addresses. We hope to get to a place where we can have moderation tools that anyone can use without much prior knowledge.
The first thing we decided to focus on was to make the CheckUser tool more flexible, powerful and easy to use. It is an important tool that services the need to detect and block bad actors (especially long-term abusers) on a lot of our projects. The CheckUser tool was not very well maintained for many years and as a result it appeared quite dated and lacked necessary features.
We also anticipated an uptick in the number of users who opt in to the role of CheckUser on our projects once IP Masking goes into effect. This reinforced the need for a better, easier CheckUser experience for our users. With that in mind, the Anti-Harassment Tools team spent the past year working on improving the CheckUser tool, making it much more efficient and user-friendly. This work has also taken into account a lot of outstanding feature requests by the community. We have continually consulted with CheckUsers and stewards over the course of this project and have tried our best to deliver on their expectations. The new feature is set to go live on all projects in October 2020.
The next feature that we are working on is IP info. We decided on this project after a round of consultation on six wikis which helped us narrow down the use cases for IP addresses on our projects. It became apparent early on that there are some critical pieces of information that IP addresses provide which need to be made available for patrollers to be able to do their roles effectively. The goal for IP Info, thus, is to quickly and easily surface significant information about an IP address. IP addresses provide important information such as location, organization, possibility of being a Tor/VPN node, rDNS, listed range, to mention a few examples. By being able to show this, quickly and easily without the need for external tools everyone can’t use, we hope to be able to make it easier for patrollers to do their job. The information provided is high-level enough that we can show it without endangering the anonymous user. At the same time, it is enough information for patrollers to be able to make quality judgements about an IP address.
After IP Info we will be focusing on a "finding similar editors" feature. We’ll be using a machine learning model, built in collaboration with CheckUsers and trained on historical CheckUser data, to compare user behavior and flag when two or more users appear to be behaving very similarly. The model will take into account which pages users are active on, their writing styles, their editing times, and so on, to make predictions about how similar two users are. We are doing our due diligence in making sure the model is as accurate as possible.
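To make the idea of comparing user behavior more concrete, here is a minimal sketch of a similarity score. It is not the machine learning model described above: it simply blends the Jaccard overlap of the pages two accounts have edited with the overlap of the hours in which they edit. The `EditorActivity` class, the `similarity` function, and the `page_weight` parameter are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EditorActivity:
    """Hypothetical summary of one account's public editing activity."""
    name: str
    pages: set[str] = field(default_factory=set)         # titles the account edited
    active_hours: set[int] = field(default_factory=set)  # UTC hours (0-23) with edits


def jaccard(a: set, b: set) -> float:
    """Intersection size divided by union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x: EditorActivity, y: EditorActivity, page_weight: float = 0.7) -> float:
    """Weighted blend of page overlap and editing-hour overlap, in [0, 1].

    The 0.7 default weight is an arbitrary assumption, not a tuned value.
    """
    page_score = jaccard(x.pages, y.pages)
    hour_score = jaccard(x.active_hours, y.active_hours)
    return page_weight * page_score + (1 - page_weight) * hour_score
```

A real model would also need writing-style features and a threshold validated against known cases; a hand-weighted score like this one would only be a starting point for discussion.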
Once it’s ready, there is a lot of scope for what such a model can do. As a first step we will be launching it to help CheckUsers detect socks easily without having to perform a lot of manual labor. In the future, we can think about how we can expose this tool to more people and apply it to detect malicious sockpuppeting rings and disinformation campaigns.
You can read more and leave comments on our project page for tools.
We who are working on this are doing this because the legal and public policy teams advised us that we should evolve the projects’ handling of IP addresses in order to keep up with current privacy standards, laws, and user expectations. That’s really the main reason.
We also think there are other compelling reasons to work on this. If someone wants to help out and doesn’t understand the ramifications of their IP address being publicly stored, their desire to make the world and the wiki a better place results in inadvertently sharing their personal data with the public. This is not a new discussion: we’ve had it for about as long as the Wikimedia wikis have been around. An IP address can be used to find out a user’s geographical location and institution and other personally identifiable information, depending on how the IP address was assigned and by whom. This can sometimes mean that an IP address can be used to pinpoint exactly who made an edit and from where, especially when the editor pool is small in a geographic area. Concerns around exposing IP addresses on our projects have been raised repeatedly by our communities, and the Wikimedia movement as a whole has been talking about how to solve this problem for at least fifteen years. Here’s a (non-exhaustive) list of some of the previous discussions that have happened around this topic.
We acknowledge that this is a thorny issue, with the potential for causing disruptions in workflows we greatly respect and really don’t want to disrupt. We would only undertake this work, and spend so much time and energy on it, for very good reason. These are important issues independently, and together they have inspired this project: there’s both our own need and desire to protect those who want to contribute to the wikis, and developments in the world we live in, and the online environment in which the projects exist.
See also: Research:IP masking impact report.
IP masking impact
IP addresses are valuable as a semi-reliable partial identifier that the associated user cannot easily change. Depending on the provider and device configuration, IP address information is not always accurate or precise, and making the best use of it requires deep technical knowledge and fluency, although administrators are not currently required to demonstrate such knowledge. Where possible, this technical information is used to support additional information (referred to as "behavioural knowledge"), and the information taken from IP addresses significantly affects the administrative action taken.
On the social side, the question of whether to allow unregistered users to edit has been a subject of extensive debate. So far, opinion has leaned towards allowing unregistered users to edit. The debate has generally centred on the tension between the desire to prevent vandalism and the desire to preserve pseudo-anonymous editing and a low barrier to editing. There is a certain bias against unregistered users because of their association with vandalism, which is also built into the algorithms of tools such as ORES. In addition, there are major problems with communicating with unregistered users, because they cannot receive notifications and there is no guarantee that a message left on an IP address's talk page will be read by the same person.
If IP masking is introduced, it will significantly affect administrator workflows and, in the short term, significantly increase the burden on CheckUsers. Once IP addresses are masked, we should expect a considerable drop in our administrators' effectiveness in fighting vandalism. This can be mitigated by providing equivalent or greater functionality, but we should expect reduced administrator efficacy during the transitional period. To give our administrators proper tool support, we must take care to preserve, or provide alternatives to, the following functions currently served by IP information:
- Block efficacy and collateral estimation
- Some way of surfacing similarities or patterns among unregistered users, such as geographic similarity or particular institutions (e.g. if edits are coming from a school or university)
- The ability to target specific groups of unregistered users, such as vandals who operate within a specific IP range
- Location- or institution-specific actions (not necessarily blocks); for example, the ability to determine whether edits are made from an open proxy or a public location such as a school or library.
Depending on how we handle temporary accounts or identifiers for unregistered users, we may be able to improve communication with unregistered users. The underlying discussions and concerns about unregistered editing, anonymous vandalism, and bias against unregistered users are unlikely to change significantly if we mask IPs, because the ability to edit the projects while logged out will remain.
We interviewed CheckUsers on multiple projects throughout our process for designing the new Special:Investigate tool. Based on interviews and walkthroughs of real-life cases, we broke down the general CheckUser workflow into five sections:
- Triaging: assessing cases for feasibility and complexity.
- Profiling: creating a pattern of behaviour which will identify the user behind multiple accounts.
- Checking: examining IPs and useragents using the CheckUser tool.
- Judgement: matching this technical information against the behavioural information established in the Profiling step, in order to make a final decision about what kind of administrative action to take.
- Closing: reporting the outcome of the investigation on public and private platforms where necessary, and appropriately archiving information for future use.
We also worked with staff from Trust and Safety to get a sense for how the CheckUser tool factors into Wikimedia Foundation investigations and cases that are escalated to T&S.
The most common and obvious pain points all revolved around the CheckUser tool's unintuitive information presentation, and the need to open up every single link in a new tab. This caused massive confusion as tab proliferation quickly got out of hand. To make matters worse, the information that CheckUser surfaces is highly technical and not easy to understand at first glance, making the tabs difficult to track. All of our interviewees said that they resorted to separate software or physical pen and paper in order to keep track of information.
We also ran some basic analyses of English Wikipedia's Sockpuppet Investigations page to get some baseline metrics on how many cases they process, how many are rejected, and how many sockpuppets a given report contains.
Patroller use of IP addresses
Previous research on patrolling on our projects has generally focused on the workload or workflow of patrollers. Most recently, the Patrolling on Wikipedia study focused on the workflows of patrollers and on identifying potential threats to current anti-vandal practices. Older studies, such as the New Page Patrol survey and the Patroller work load study, focused on English Wikipedia. They also looked solely at the workload of patrollers, and more specifically at how bot patrolling tools have affected patroller workloads.
Our study tried to recruit from five target wikis, which were:
- Japanese Wikipedia
- Dutch Wikipedia
- German Wikipedia
- Chinese Wikipedia
- English Wikiquote
They were selected for known attitudes towards IP edits, percentage of monthly edits made by IPs, and any other unique or unusual circumstances faced by IP editors (namely, use of the Pending Changes feature and widespread use of proxies). Participants were recruited via open calls on Village Pumps or the local equivalent. Where possible, we also posted on Wiki Embassy pages. Unfortunately, while we had interpretation support for the interviews themselves, we did not extend translation support to the messages, which may have accounted for low response rates. All interviews were conducted via Zoom, with a note-taker in attendance.
Supporting the findings from previous studies, we did not find a systematic or unified use of IP information. Additionally, this information was only sought out after a certain threshold of suspicion. Most further investigation of suspicious user activity begins with publicly available on-wiki information, such as checking previous local edits, Global Contributions, or looking for previous bans.
Precision and accuracy were less important qualities for IP information: upon seeing that one chosen IP information site returned three different results for the geographical location of the same IP address, one of our interviewees mentioned that precision in location was not as important as consistency. That is to say, so long as an IP address was consistently exposed as being from one country, it mattered less if it was correct or precise. This fits with our understanding of how IP address information is used: as a semi-unique piece of information associated with a single device or person, that is relatively hard to spoof for the average person. The accuracy or precision of the information attached to the user is less important than the fact that it is attached and difficult to change.
Our findings highlight a few key design aspects for the IP info tool:
- Provide at-a-glance conclusions over raw data
- Cover key aspects of IP information:
  - Geolocation (to a city or district level where possible)
  - Registered organization
  - Connection type (high-traffic, such as data center or mobile network, versus low-traffic, such as residential broadband)
  - Proxy status as binary yes or no
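As a rough illustration of these design aspects, the sketch below models an at-a-glance summary with the four fields listed above. It is only a sketch under stated assumptions: the `IPSummary` class, the `EXAMPLE_RECORDS` table, and the `summarize` and `at_a_glance` functions are hypothetical, and the data is invented (the address ranges are reserved for documentation). A real tool would pull this information from a geolocation or registry provider.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IPSummary:
    location: str         # city or district level where possible
    organization: str     # registered organization
    connection_type: str  # e.g. "data center", "mobile", "residential"
    is_proxy: bool        # binary yes or no


# Hypothetical lookup table keyed by network. The ranges are documentation-only
# blocks and the values are made up for this example.
EXAMPLE_RECORDS = {
    ipaddress.ip_network("192.0.2.0/24"): IPSummary(
        "Exampleville", "Example University", "residential", False),
    ipaddress.ip_network("198.51.100.0/24"): IPSummary(
        "Sampletown", "Example Hosting", "data center", True),
}


def summarize(ip: str, records=EXAMPLE_RECORDS) -> Optional[IPSummary]:
    """Return the summary for the first network containing the address, if any."""
    addr = ipaddress.ip_address(ip)
    for network, summary in records.items():
        if addr in network:
            return summary
    return None


def at_a_glance(ip: str) -> str:
    """Render a one-line conclusion instead of raw data."""
    s = summarize(ip)
    if s is None:
        return "No information available"
    proxy = "yes" if s.is_proxy else "no"
    return f"{s.location} | {s.organization} | {s.connection_type} | proxy: {proxy}"
```

For example, `at_a_glance("192.0.2.15")` would produce a single line with location, organization, connection type, and proxy status for that address.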
As an ethical point, it will be important to be able to explain how any conclusions are reached, and the inaccuracies or imprecision inherent in pulling IP information. While this was not a major concern for the patrollers we talked to, if we are to create a tool that will be used to provide justifications for administrative action, we should be careful to make it clear what the limitations of our tools are.
30 October 2020
We have updated the FAQ with more questions that have been asked on the talk page. The Wikimedia Foundation Legal department added a statement on request to the talk page discussion, and we have added it here on the main page too. On the talk page, we have tried to explain roughly how we think about giving the vandal fighters access to the data they need without them having to be CheckUsers or admins.
15 October 2020
This page had become largely out of date and we decided to rewrite parts of it to reflect where we are in the process. This is what it used to look like. We’ve updated it with the latest info on the tools we’re working on, research, fleshed out motivations and added a couple of things to the FAQ. Especially relevant are probably our work on the IP info feature, the new CheckUser tool which is now live on four wikis and our research into the best way to handle IP identification: let us know what you need, the potential problems you see and if a combination of IP and a cookie could be useful for your workflows.
Statement from the Wikimedia Foundation Legal department
This statement from the Wikimedia Foundation Legal department was written on request for the talk page and comes from that context. For visibility, we wanted you to be able to read it here too.
Hello All. This is a note from the Legal Affairs team. First, we’d like to thank everyone for their thoughtful comments. Please understand that sometimes, as lawyers, we can’t publicly share all of the details of our thinking; but we read your comments and perspectives, and they’re very helpful for us in advising the Foundation.
On some occasions, we need to keep specifics of our work or our advice to the organization confidential, due to the rules of legal ethics and legal privilege that control how lawyers must handle information about the work they do. We realize that our inability to spell out precisely what we’re thinking and why we might or might not do something can be frustrating in some instances, including this one. Although we can’t always disclose the details, we can confirm that our overall goals are to do the best we can to protect the projects and the communities at the same time as we ensure that the Foundation follows applicable law.
Within the Legal Affairs team, the privacy group focuses on ensuring that the Foundation-hosted sites and our data collection and handling practices are in line with relevant law, with our own privacy-related policies, and with our privacy values. We believe that individual privacy for contributors and readers is necessary to enable the creation, sharing, and consumption of free knowledge worldwide. As part of that work, we look first at applicable law, further informed by a mosaic of user questions, concerns, and requests, public policy concerns, organizational policies, and industry best practices to help steer privacy-related work at the Foundation. We take these inputs, and we design a legal strategy for the Foundation that guides our approach to privacy and related issues. In this particular case, careful consideration of these factors has led us to this effort to mask IPs of non-logged-in editors from exposure to all visitors to the Wikimedia projects. We can’t spell out the precise details of our deliberations, or the internal discussions and analyses that lay behind this decision, for the reasons discussed above regarding legal ethics and privilege.
We want to emphasize that the specifics of how we do this are flexible; we are looking for the best way to achieve this goal in line with supporting community needs. There are several potential options on the table, and we want to make sure that we find the implementation in partnership with you. We realize that you may have more questions, and we want to be clear upfront that in this dialogue we may not be able to answer the ones that have legal aspects. Thank you to everyone who has taken the time to consider this work and provide your opinions, concerns, and ideas.
Q: Will users with advanced permissions (CheckUsers, admins, stewards) still have access to IP addresses after this project is implemented?
- A: We don't yet have a definitive answer to this question. Ideally, IP addresses should be shown to as few people as possible (including WMF staff). We hope to restrict the display of IP addresses to only those users who need it.
Q: How will anti-vandalism tools work without IP addresses?
- A: We have some potential ideas for achieving this goal. First, instead of the IP we may be able to show functionaries other pertinent information about the user that provides the same amount of information. In addition, for sockpuppet investigations, it may be possible to automatically check whether two different accounts are tied to the same IP without exposing the IP. It is also possible that anti-vandalism tools will continue to use IP addresses, but with restricted access. To work out the optimal solution, we need to collaborate closely with the community.
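One way to picture the idea of checking whether two accounts share an IP without exposing it is to compare keyed hashes of the addresses rather than the addresses themselves. The sketch below is an assumption for illustration only, not a description of how the Foundation will implement this; `ip_fingerprint`, `same_ip`, and the `secret_key` parameter are hypothetical names.

```python
import hashlib
import hmac


def ip_fingerprint(ip: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of an IP address.

    The key must stay on the server: the IPv4 space is small enough that an
    unkeyed hash could be reversed by trying every address.
    """
    return hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def same_ip(fingerprint_a: str, fingerprint_b: str) -> bool:
    """Compare two fingerprints in constant time; the IPs are never shown."""
    return hmac.compare_digest(fingerprint_a, fingerprint_b)
```

A tool built this way could answer "do these two edits come from the same address?" with a yes or no, while the address itself stays hidden from the person asking.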
Q: If we don't see IP addresses, what will we see instead when edits are made by unregistered users?
- A: Instead of IP addresses, users will see a unique, automatically generated, human-readable username. This could look something like "Anonymous 12345", for example.
Q: Will a new username be generated for every unregistered edit?
- A: No. We plan to implement a method of generating usernames that are at least partially persistent, for example by tying them to a cookie, to the user's IP address, or to both.
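A partially persistent masked name could be sketched as follows. This is a minimal illustration under assumptions, not the planned implementation: the `MaskedNameRegistry` class is hypothetical, it keeps its mapping in memory, and it arbitrarily prefers the cookie over the IP address when both are available.

```python
import itertools
from typing import Optional


class MaskedNameRegistry:
    """Hypothetical in-memory mapping from a cookie or IP to a masked name."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._names: dict[str, str] = {}

    def name_for(self, ip: str, cookie_token: Optional[str] = None) -> str:
        # Assumption: a cookie, when present, identifies the editor better
        # than the IP, so the same browser keeps its name across IP changes.
        key = f"cookie:{cookie_token}" if cookie_token else f"ip:{ip}"
        if key not in self._names:
            self._names[key] = f"Anonymous {next(self._counter)}"
        return self._names[key]
```

With this choice, an editor whose IP changes but whose cookie survives keeps the same name, while an editor without a cookie is named by IP alone. Other combinations of the two signals are equally possible.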
Q: Will existing IP addresses be removed as part of this project?
- A: No IP addresses that already exist in history, logs or signatures will be hidden in this project. The changes will only affect future edits made after the project has launched.
Q: Is this project the result of particular changes in the law?
- A: No. Data privacy standards are evolving in many countries and regions around the world, along with user expectations. We have always worked hard to protect user privacy, and we will continue to study and apply best practices based on new standards and expectations. This project is the next step in our own evolution.
Q: What is the timeline for this project?
- A: As mentioned above, we will not make any firm predictions about the project until we have gathered input from the community. We would like to identify sensible first steps for a development team to work on so that we can begin this long project, but we are not rushing to meet a particular schedule.
Q: How can I get involved?
- A: We would love to hear your ideas or feedback about the project! We especially want to learn about any workflows that this project might affect. You can leave your thoughts on the talk page or fill out this form, and we will get in touch with you. Some of us will be at Wikimania, where we would also love to meet you.
Q: Why is this proposal so unclear?
- A: It’s not really a proposal, and we shouldn’t have described it as such. We don’t have a solution, but are trying to work out the best solutions with the communities. It might be helpful to understand this as a technical investigation trying to figure out how IP masking could work.
Q: Why don’t you just turn off the ability to edit without registering an account?
- A: Unregistered editing works differently across different Wikimedia wikis. For example, Swedish Wikipedia has discussed unregistered editing in the light of this investigation and decided they still want unregistered editing. Japanese Wikipedia has a far higher percentage of IP editing than English Wikipedia, but the revert rate of those edits is only about a third as high (9.5% compared to 27.4%), indicating that they are also far more useful. We think that deciding for all wikis that they can’t have IP editing is a destructive solution. The research done on IP editing also indicates IP editing is important for editor recruitment.
Q: Who will have access to IPs of unregistered users now?
- A: We are not going to leave this burden to e.g. the CheckUsers and the stewards alone. We will have a new user right or the ability to opt in to see the IP if you fulfill certain requirements. Others could potentially see partial IP addresses. We are still talking to the communities about how this could best work.
Q: Has this been decided?
- A: Yes. The Wikimedia Foundation’s Legal department has stated that this is necessary. As the entity legally responsible for protecting the privacy of Wikimedia users, the Wikimedia Foundation has accepted this advice and is now working to find the best way to implement this while supporting and listening to the user communities. Some Wikimedians will be unhappy with this, but legal decisions like these have not been a matter of community consensus. What the communities can be part of deciding is how we do this. That very much needs to be defined with the Wikimedia communities.
Q: Will masks be global, for all Wikimedia wikis, or local, for one wiki?
- A: Global. A masked IP will look the same across all Wikimedia wikis.
Q: Will all unregistered users be unblocked when this happens? If not you could track the information in the logs.
- A: No. This would wreak havoc on the wikis. This solution will have to be a compromise. We have to balance the privacy of our unregistered editors with our ability to protect the wikis.