Research:Wikimedia Research Best Practices Around Privacy Whitepaper

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator:
Task T337883
Created
16:35, 17 January 2024 (UTC)
Collaborators
Michael Zimmer
Duration:  2023-October – 2025-October
This page documents a completed research project.


The goal and anticipated output of this project is a white paper that creates a shared understanding of how researchers should conduct research on or about the Wikimedia projects in light of how Wikimedia communities value privacy.

The latest version of the white paper is publicly available as an OSF Preprint at https://doi.org/10.31219/osf.io/uyxnf_v2.

Overview

Requested by English Wikipedia's Arbitration Committee, this white paper is intended to convey "[...] to researchers the principles of our movement and give specific recommendation for researchers on how to study and write about Wikipedians and their personal information in a way that respects our principles."

Executive summary

Readers access Wikipedia articles more than 15 billion times every month, but the usage of Wikipedia does not stop there. Researchers frequently use Wikipedia-related data to develop models and insights and as part of research and development workflows. On average, every year researchers use or refer to Wikipedia in more than 130,000 articles and publish at least about 500 articles about Wikipedia itself.[1] The amount and diversity of the usage of Wikipedia in research projects has resulted in significant insights and improvements in Wikipedia itself as well as in other aspects of our lives (e.g., through machine translation). However, conducting research using Wikipedia has its own challenges for researchers, Wikipedia community members, and the Wikimedia Foundation alike. In this paper we focus on one of the most frequent topics we observe the Wikipedia community and researchers having to grapple with: privacy. Our aim with this work is to help researchers and Wikipedia contributors see the challenges each group faces in their work. We further offer recommendations about what to pay attention to and how to navigate some of the questions we expect Wikipedia contributors and researchers to face when interfacing with or conducting research projects.

Read the latest version of the privacy white paper

The latest version of the white paper is publicly available as an OSF Preprint at https://doi.org/10.31219/osf.io/uyxnf_v2.

How to provide feedback on the latest version of the privacy white paper (until 31 August 2025)

We invite researchers and Wikipedians to review the paper and provide feedback and comments until 31 August 2025.

  • We encourage you to provide your feedback on the corresponding talk page. If you prefer to share it privately, you can do so by sending an email to research-feedback@wikimedia.org with “privacy white paper” in the subject line. You can also sign up for office hours with Eli Asikin-Garmager to share feedback.
  • We’re interested to know in what ways the paper has informed your work and/or involvement on Wikipedia as an editor. We’d also like to understand what gaps may exist in terms of topics or questions you have that the paper may not adequately address.
  • Please add new topics or comment on existing topics on the talk page to help us keep feedback organized.
  • When providing feedback, please reference section numbers in the paper to help us understand which section or content your feedback concerns.
  • We will be monitoring the talk page until 31 August 2025, but won't be able to respond directly to comments. However, all comments will be reviewed and considered when we revisit updates to the white paper after 31 August.
  • If you are more comfortable leaving comments in a language other than English, please feel welcome to do so. Please note that we may utilize machine translation in reviewing non-English content.

Updates

November 2025 In mid-October, the white paper was presented as part of a panel at the 2025 Association of Internet Researchers Conference, and on 2 November we shared it via a Diff post.

October 2025 We shared the updated white paper as a preprint on OSF.

July 2025 We'd like to thank everyone who has provided feedback so far on the white paper. We'll be delivering a lightning talk at Wikimania 2025 (accepted proposal) in early August to help publicize the white paper's open feedback period and solicit additional feedback from more community members. To accommodate additional feedback we hope to receive from Wikimania audience members, we've extended the feedback period from 1 July to 31 August. The feedback process will be the same; please refer to "How to provide feedback".

March 2025 We have opted to make the privacy white paper available via the Center for Open Science Preprints service. The paper is publicly available at https://doi.org/10.31219/osf.io/uyxnf_v1. Please see the directions above for how to provide feedback.

January 2025 We have completed a number of iterations and improvements. We have also worked in coordination with the English Wikipedia Arbitration Committee to review the most recent draft, and have made some additional revisions based on that feedback. At this point we assess that the paper is safe to try; namely, that it is in a good place to be shared publicly, so that we can start encouraging Wikipedia researchers and community members to use it and tell us concretely what we may need to improve to make it more actionable for them. As such, we are currently in the process of publishing this paper on arxiv.org so that it can receive a DOI and be referenced in research publications. The link will be provided as soon as we receive it. (Please see the March 2025 update.)

[OUTDATED] Initial draft notes and process

How to provide feedback and comments

We're gathering feedback on the Research Ethics Privacy White Paper until 30 April 2024. We encourage you to provide your feedback on the corresponding talk page. If you prefer to share it privately, you can do so by sending an email to research-feedback@wikimedia.org with "privacy white paper" in the subject line.

  • We encourage you to use the talk page/discussion feature to provide your input. (Please don't directly edit the draft.)
  • As we are still drafting and revising (hence some notes you see throughout the draft), the most helpful feedback is content-oriented. Because things are still in progress, copy-editing and similar feedback is less helpful for the moment.
  • Please add new topics or comment on existing topics to help us keep feedback organized.
  • The talk page also includes a few prompts for specific groups that we're hoping to receive feedback on.
  • We will be monitoring the talk page until 30 April 2024, but won't be able to respond directly to comments. However, all comments will be reviewed and considered in the ongoing drafting and revising process.
  • If you are more comfortable leaving comments in a language other than English, please feel welcome to do so. Please note that we may utilize machine translation in reviewing non-English content.
  • Join us for a Conversation Hour on 23 April 2024 at 15:00 UTC. This conversation will be guided by some questions to encourage actionable feedback. Join via Google Meet.

Initial working outline

Having gone through a feedback process for the outline with the original requesters, English Wikipedia's Arbitration Committee, we have been drafting the white paper based on the following outline, considered stable as of February 2024.

  • Introduction: What is the problem, why is it important, what has been tried before, and what is the goal of the white paper?
  • Related work: A review of related work, including privacy risks and adaptation on Wikipedia, ethical judgments for researchers, naming/referencing research participants, and existing related Wikipedia policies and guidelines.
  • Exploring key questions: Understanding key values of Wikipedians, policies around doxxing, understanding parameters of variation for different language versions of Wikipedia, understanding researchers, among other topics.
  • Recommendations: Recommendations for researchers and Wikipedians.

Notes

  1. As measured by the number of search results in https://scholar.google.com/ when searching for articles with the word “Wikipedia” in their title.