Universal Code of Conduct/2021 consultations/Research
To support the work of the Universal Code of Conduct Phase 2 Drafting Committee, the Wikimedia Foundation carried out a research project focused on harassment on Wikimedia projects. The research was conducted through a survey and several in-depth interviews.
The survey was distributed to Wikimedia affiliates and focused on respondents' perceptions, knowledge of, and engagement with existing reporting and enforcement systems. Interviews were conducted with community members who had previously contacted the Trust & Safety team about harassment, and focused on their experiences as targets of serious, sustained harassment.
The researchers found several key barriers to community engagement with these systems, such as a confusing reporting system and fear of public backlash. Nevertheless, overall sentiment remained positive, and community members still wanted to engage with community and WMF enforcement systems despite these barriers.
One major research hurdle is the fact that the unstructured reporting system within the Wikimedia movement makes it hard to collect metrics on reporting. Additionally, we wanted to capture more data on sentiments related to our enforcement systems rather than information on frequency and type of reports.
For this research, we decided on a split approach. We conducted a survey, primarily distributed to affiliates, as well as targeted semi-structured interviews with community members who had previously approached Trust & Safety about harassment.
The survey, titled “Wikimedia Community Reporting System Survey”, was 37 questions long and available in English, Spanish and Hindi. Before deployment, the survey was vetted by volunteers from the Wiki LGBT+ affiliate group. It was hosted on Qualtrics and administered through an anonymous link to collect as little identifying information as possible. It was open from April 15 to May 7, 2021.
The survey was sent out primarily via email to a number of affiliates, focusing on groups for LGBT+ editors and a number of women’s groups across the movement. Invitational emails were also sent to the arbitration committees of English, French, Russian and German Wikipedias. Notices for this survey were also posted on the village pumps (or equivalent) on Italian, Spanish, French, German, Polish and Arabic Wikipedias.
Rather than focusing on harassment in and of itself, this survey was designed to focus on our community’s perceptions of existing enforcement systems. Specifically, it targeted a few key themes:
- Understanding of enforcement systems: do people know ways to report incidents on our projects? Do they know how to use these systems? How did they learn to use these systems initially?
- Engagement with enforcement systems: do community members routinely use these systems? How common is it to use these systems at all?
- Perception of enforcement systems: are these systems generally well regarded? Do people think it is worthwhile to engage with them?
- Privacy and transparency: what portions of the system do people generally believe should be accessible to the public? What information about the whole enforcement apparatus should be available to the public?
In total, the survey received 85 responses. Of these, 68 (80%) were fully completed and 17 (20%) were partial completions. 53 (62%) of survey respondents took it in English, while 31 (36%) took it in Spanish. Only one respondent took the survey in Hindi.
Most of our respondents were located in Europe. A majority identified as women, and the most common age range was between 30 and 44 years old. About a quarter also identified themselves as being part of the LGBT+ community.
Nearly all of our respondents have spent over a year on Wikimedia projects, and about a third reported over a decade of experience with our projects. 39% of respondents currently or previously held administrator rights, while slightly over half (55%) had previously served as an organizer for Wikimedia events or groups. Most respondents (73%) reported being active on one to three projects.
Although the survey provided an option for under-18s to report their age, we did not collect information from self-reported minors: selecting this option skipped them to the end of the survey.
Compared to the 2018 Community Engagement Insights report, this study's respondents included a much higher proportion of women, with a similar median age and geographic distribution. Administrators, however, are heavily over-represented. Based on the Wiki comparison dataset, for our top 100 wikis by size rank, median monthly active administrators make up about 7% of median monthly active editors (9 monthly active admins and 129 monthly active editors), whereas 39% of our respondents currently or previously held administrator rights. This skew is expected given our method of primarily recruiting participants from affiliates, ArbComs, and community members experienced enough to find Village Pumps or similar pages.
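As a rough illustration of how over-represented administrators are among respondents, the following Python sketch compares the median figures from the Wiki comparison dataset with the share of respondents who currently or previously held administrator rights. It is a back-of-the-envelope comparison under stated assumptions, not part of the study's method. It treats the ratio of the two medians as a "typical" admin share, which is only an approximation, because a ratio of medians is not the median of per-wiki ratios. The survey figure also counts former admins, while the dataset counts monthly active admins, so the two populations are not strictly comparable. The constant and function names are illustrative only.

```python
# Back-of-the-envelope comparison of administrator representation (illustrative sketch).
# Assumption: the ratio of the two medians stands in for a "typical" admin share;
# a ratio of medians is not the same as the median of per-wiki ratios.

# Median monthly figures for the top 100 wikis by size rank (Wiki comparison dataset)
MEDIAN_ACTIVE_ADMINS = 9
MEDIAN_ACTIVE_EDITORS = 129

# Share of survey respondents who currently or previously held administrator rights.
# Caveat: this counts current AND former admins, unlike the monthly active admin count.
SURVEY_ADMIN_SHARE = 0.39


def baseline_admin_share(admins: int, editors: int) -> float:
    """Return admins as a fraction of active editors."""
    return admins / editors


def overrepresentation(sample_share: float, baseline_share: float) -> float:
    """Return how many times larger the sample share is than the baseline share."""
    return sample_share / baseline_share


baseline = baseline_admin_share(MEDIAN_ACTIVE_ADMINS, MEDIAN_ACTIVE_EDITORS)
factor = overrepresentation(SURVEY_ADMIN_SHARE, baseline)

print(f"Baseline admin share: {baseline:.1%}")            # Baseline admin share: 7.0%
print(f"Survey admin share:   {SURVEY_ADMIN_SHARE:.0%}")  # Survey admin share:   39%
print(f"Over-representation:  {factor:.1f}x")             # Over-representation:  5.6x
```

Under these assumptions, administrators are roughly five to six times more common among respondents than among the active editors of a typical large wiki. This is why the findings are treated as coming from an unusually experienced group.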
To complement the survey data, we conducted four semi-structured interviews. All participants had previously contacted Trust & Safety. Of our six initial invitations, four resulted in scheduled interviews. One interviewee was a Wikimedia Foundation staff member, although the interview focused on their experiences as a volunteer before they started working for the Foundation.
Because of our small sample size and non-representative respondent pool, this research should be understood as a pilot study whose findings are not broadly generalizable. We know that our survey respondents are generally more experienced and have spent more time on Wikimedia projects than the general community, and that administrators are far more heavily represented among them than in the general community. However, the conclusions raised here matter for several key reasons:
- These are questions that have not been answered by previous research, or have not previously been asked at all.
- This research provides a useful initial challenge to assumptions about how enforcement systems are perceived.
- These results highlight which areas may be best to prioritize for future research efforts.
- Since we know our respondents are, on average, more experienced, we may infer that the general population is less familiar with reporting and enforcement systems and adjust our assumptions to match.
Overwhelmingly, participants report that our existing enforcement systems are overly complicated and difficult to understand.
Write-in survey responses noted the existence of loopholes, unclear redirections, the expectation that one may be asked to make reports to the very people one wished to report, and an utter lack of clear instructions on how to report. One of our interviewees noted that it took three years and a chance in-person meeting with a trustee before they even knew about any formal reporting channels. Other frustrations expressed by participants included the fact that only a sliver of problematic behaviour can be reported under their community’s rules; for example, insults directed at a minority group rather than a specific individual are hard to report.
“A disaster. We must endure and be silent.”
The current reporting system opens reporters up for reprisal, backlash, or undue public scrutiny.
Many of our write-in responses specifically named fear of reprisal as a major negative in our current system. Some of them used a specific jargon term, “boomerang”, for this phenomenon, suggesting that it is common enough to have earned its own name.
Administrators and Arbitration Committees are also aware of this flaw, and suffer from it. One interviewee pointed out that “half-transparent” Arbitration Committee cases (those where public evidence of harassment was supplemented with private evidence) were especially draining to handle. They described how onlookers would speculate about the private details and then scrutinize, or even harass, the presumed reporters as well as ArbCom members on the basis of this speculation.
“There is no assurance that the community will handle the problem in a respectful manner towards the person who has suffered the aggression, making the reporting process intimidating.”
A slight majority of our survey respondents have never made a report.
54% of survey respondents have never made a report. This includes 40% of respondents who hold or have held administrator rights.
Six in ten respondents have purposefully chosen not to report incidents.
Reasons given include fear of backlash or reprisal, the belief that the outcome would be ineffective, and the reporting process being too confusing or difficult. Write-in answers also indicated that the people in charge of receiving reports are occasionally the very people who are the subject of complaints.
Two-thirds of non-administrator survey respondents are unsure or do not know how to report problematic behaviour.
By contrast, current or former administrator respondents were far more confident in their knowledge of how to report - 83% reported that they did know how to report such behaviour.
“You have to be an expert to know how to use [the reporting system], and if you have that much knowledge you are already an admin.”
This difference exists even among survey respondents who had made reports in the past. 80% of current or former admins who had made reports before report understanding the process. Only 31% of non-admins who had made reports in the past report that they understand the process.
Among research participants, there is a general desire for a private on-wiki reporting channel.
When survey respondents were asked what venues should be available for reports, the most common option chosen was “other private route”; the third most common choice was “on a separate private channel, on-wiki”.
It takes too long to resolve cases of harassment.
This was expressed by users making reports as well as the administrators expected to handle them. Reporters note how time-consuming the process of reporting is, as this generally includes having to learn about available reporting options, learning the appropriate report structure, and assembling the necessary supporting evidence for the report. Administrators point to the lack of training to handle interpersonal conflict, the complexity of cases that are severe enough to spur reports, and a general lack of capacity due to dwindling active administrator numbers.
All of these factors combine to make report resolution a very lengthy process, which itself becomes yet another factor that discourages community members from reporting.
Interconnected communities with disconnected enforcement allow community members with a history of harassment to continue such actions and evade consequences.
Without prompting, we routinely heard from respondents about certain communities with a bad reputation for being especially combative or hostile. What they had in common was a lack of guidelines around behavior or reporting and a general “blind eye” attitude towards their community members’ histories of rule-breaking behavior, especially if paired with a long history of contribution.
In at least one case, Wikimedia Commons, multiple interviewees pointed out that users with long histories of abuse, to the point of being banned from other Wikimedia projects, were allowed to engage in similar behaviors on Commons.
Participants were divided as to what the precise role of the Wikimedia Foundation should be in enforcement systems.
While there is broad consensus within the Wikimedia community that the Foundation should be responsible for certain cases involving minors or credible threats of violence, this consensus breaks down when it comes to most other matters.
Survey respondents both decried the Foundation's involvement and viewed it as a needed route around local reporting systems that are handled by the very people they wish to report. Others wanted the Foundation to act as a “backup” option if no global administrators, oversighters or stewards were available. Still others were upset that the burden of handling harassment reports, especially while organizing Wikimedia events, was shifted to volunteers rather than the Foundation.
Survey respondents wanted access to aggregate statistics and case summaries, not necessarily full case details.
Our current systems provide full public visibility of all cases made on-wiki. However, when asked what information the general public should see with regard to reporting on Wikimedia projects, more respondents chose aggregate statistics and summaries than full case details. This was true of both administrators and non-administrators.
Despite all of this, respondents generally still view it as worthwhile to make a report.
While users were much more likely to view the entire enforcement process as ambiguously useful at best, survey respondents were still generally positive about local admins, the WMF, and event organizers’ likelihood of addressing reports. Slightly over half of survey respondents said that it was “definitely” or “probably” worthwhile to make a report. Two of our interviewees also noted that, even though they knew (or believed) that the people they reported to were powerless to act on their reports, they still wanted to make them. This suggests that the act of reporting is itself an action that people wish to perform, regardless of outcome.
Based on these key findings, and supported by the conclusions of previous research on harassment and reporting on Wikimedia projects, we suggest these recommendations for the Universal Code of Conduct phase 2 drafting committee.
Provide an anonymous or private on-wiki reporting system.
The outsized fear of reprisal for reporting makes it all the more astonishing that slightly over half of the survey respondents still believe the system is worthwhile. In order not to corrode this trust, on which the entire system relies, we need to provide a way for reporters to privately report incidents.
Whether this is done anonymously or privately (that is, limiting who can see the identity of the reporter), it should be our absolute priority to provide either a technical solution or a policy one to accommodate this clear and pressing need.
Clarify and streamline the reporting process, for both reporters and administrators.
Our mix of different reporting systems for local admin, global admin, and Foundation-handled events is deeply confusing.
We should provide a means for community members, especially newcomers, to clearly find the appropriate channel and reporting body for the incidents they wish to report. This is especially important for cases of harassment since being the target of sustained harassment already makes it difficult to seek out help in a timely manner.
Clarity, in this sense, means clarifying several factors:
- Recipient of reports: who will receive and address it.
- Pathways for reporting: which is best suited to a specific situation, and where to go to do so.
- Necessary information in a report: what information a high-quality report needs to contain so that administrators can act on it.
- Visibility of the report: who will be able to see said report.
- Process of enforcement: how judgements are reached and how they are enforced.[note 1]
Make it easier to surface incidents of harassment to administrators.
Participants in this research indicate that it is extremely difficult to find where to report, figure out who should receive the report, and finally learn how to structure the report appropriately.
These difficulties make reporting much harder, which may limit opportunities to de-escalate disagreements before they become much harder to address. Making incidents easier to surface could also make reporting less stressful overall and improve rates of engagement, which is necessary in a system that relies on community goodwill and trust in local administrators to function.
Provide more flexible and varied outcomes for reporting.
Currently, the outcomes of reports tend to be limited to no administrator action, or some level of escalating restriction on editing. While the Foundation has tried to provide more granular options for administrators to restrict editing, we should look into opportunities to broaden the outcomes of reports.
This could include giving reporters input into the outcome of reports, providing ways for the subjects of reports to apologize or make amends, and other outcomes that do not depend on blocks. Alternatively, this could involve greater inter-project coordination to place sanctions on an editor's behavior across a wider number of projects, or a pathway to escalate reports to other authorities beyond local administrators.
Make the reporting process transparent and not just visible.
Our existing local reporting systems are by and large fully publicly visible. Nevertheless, this does not mean that they are transparent.
This visibility is a barrier for would-be reporters: it not only shows them how complex their reports are expected to be, but also shows them evidence of past backlash and reprisal against other reporters.
For administrators, our completely unstructured reporting systems make it difficult to find archives of reports with the same subject, especially if this happens across projects. It also makes it difficult to address reports as they vary wildly in quality.
Lastly, observers have no access to useful statistics since the current system’s unstructured nature makes it impossible to gather accurate or reliable metrics on reporting, and these reports’ heavy use of jargon makes them hard for observers or laypeople to understand.
Provide better guidelines or specific training for administrators to resolve disagreement while avoiding escalation into full-blown harassment.
Our interviews, and prior research on the topic, point to a link between disagreements over content and escalation into harassment or abuse. However, as prior research has indicated, we have few mechanisms to turn past administrative actions into actual guidelines or precedent for future incidents.
One interviewee also noted that their years-long experience of harassment began with a minor disagreement over categorization, and that the harassment intensified in part when other users were brought in, supposedly to provide “consensus” on the categorization disagreement. Another interviewee's experience of harassment began when an editor unilaterally raised a procedural concern based on spurious evidence and drew in other editors to give their opinions.
In a healthy community, we could expect such procedural or technical concerns to remain separate from the possibility of harassment. To this end, we should make an effort to enable administrators or other trusted community members to defuse tensions and resolve disagreements, while avoiding aggressive behavior in the process.
This study raises a few avenues of research that may be worth further exploration.
Do these findings extend towards the general population of the Wikimedia projects? Are there any significant differences based on size?
While our targeted survey and interviews provided important perspectives for the work of the committee, it would be useful to know whether these findings hold true for the general population. Since many of the concerns raised in this research seem linked to the number and capacity of local administrators, it would be worthwhile to see whether these issues exist across wikis regardless of size or change with it.
What is the state of our cross-wiki or global reporting and enforcement systems?
This research focused largely on the experiences of community members on a handful of wikis, with issues of harassment generally also limited to those same spaces. We did hear about incidents of off-wiki or cross-wiki harassment, but they were not the focus of this research. Therefore, this seems like a logical expansion for this line of research.
How do our existing conflict resolution systems address disagreement? To what extent do they facilitate or mitigate escalation into harassment?
This line of research would look at potential structural issues in our consensus-driven policy process, and at the ways in which low-level disagreements are usually treated. If the underlying issues driving harassment are structural rather than incidental (that is, if the ways in which users are encouraged to interact make it easier or more likely for people to engage aggressively), tackling misconduct will require a very different approach than if the issue lies with a population of people choosing to act aggressively.
Do private reporting systems impact rates of reporting? Does this privacy impact rates of enforcement?
While the author of this report would argue for a moral imperative to provide private reporting pathways, it would also be prudent to investigate how implementing such a system might impact reporters, subjects of reports and administrators on Wikimedia projects.
Is the current state of the reporting system truly transparent? Who currently makes use of publicly available reporting information?
Publicly visible reports are the current standard of our reporting system, and this is usually justified in the name of transparency. However, we have conducted minimal research into whether or not such visibility actually means the system itself is transparent, nor what the community means by “transparency”. We should investigate how visibility and transparency relate in terms of Wikimedia reporting systems.
A second key question is who this current state of visibility best serves. As this Targets of Harassment project indicates, would-be reporters are badly served by this publicly visible system. Therefore, we ought to find out who, if anyone, benefits from this system, and how they make use of this publicly visible information.
How are appeals currently handled, and does this system place realistic demands on people seeking an appeal?
While we have conducted research on many aspects of enforcement and reporting on Wikimedia projects, we have yet to pay close attention to our unban and appeals process. Investigating these parts of our enforcement process would also be a logical extension of research into enforcement overall.
- n.b. There are strong reasons to want to obscure the details of how administrator actions function. In this case, clarity should not override operational security concerns. Clarity may be achieved by providing an outline of actions rather than full details.
- In particular, refer to Reaching the Zone of Online Collaboration.
A full bibliography of prior WMF-directed research on harassment and reporting on Wikimedia projects is given below. Unlinked documents marked “internal report” are currently limited to WMF staff and contractors, and may be available on individual request. All linked documents are in English.
Lee, Han A., and Crupi, Joseph. Reaching the Zone of Online Collaboration: Recommendations on the Development of Anti-Harassment Tools and Behavioral Dispute Resolution Systems for Wikimedia. Harvard Negotiation and Mediation Clinical Program, 12 Dec. 2017, p. 50.
Lo, Claudia. Reporting System Rubrics: A Comparison of Peer-Dependent Reporting Systems. Wikimedia Foundation, Feb. 2019, p. 62.
Lo, Claudia. “Take It to AN/I”: Summary of Existing Research about Reporting Systems. Internal report, Wikimedia Foundation, Nov. 2018, p. 12.
Poore, Sydney. AN/I Survey Summary. Internal report, Wikimedia Foundation, 22 Mar. 2018.
“Wikipedia:Community Health Initiative on English Wikipedia/Administrator Confidence Survey/Results.” Wikipedia, 28 Nov. 2017.
Raish, Michael. Admin Confidence Survey 2019 Preliminary Results. Internal report, Wikimedia Foundation, 20 Aug. 2019, p. 50.
Raish, Michael. Identifying and Classifying Harassment in Arabic Wikipedia: A “Netnography.” Internal report, Wikimedia Foundation, 21 Dec. 2018, p. 23.
Support & Safety Team. Harassment Survey 2015 Results Report.
“User Reporting System/Wikimania 2018 Notes - Meta.” Meta-Wiki, 20 July 2018.