Community Tech/Data Portability

From Meta, a Wikimedia project coordination wiki

This page documents a project the Wikimedia Foundation's Community Tech team has worked on or declined in the past. Technical work on this project is complete.
We invite you to join the discussion on the talk page.

The Wikimedia Foundation 2018-19 Annual Plan includes the goal Management of Personal Data, which lists the sub-goal "Data Portability." Part of that sub-goal, "Wiki account data download," requires the provision of "an accessible and easy-to-use" way for users to "download a copy of their wiki account [data in] a structured, machine-readable format." At the end of 2018, that project was assigned to the Community Tech team. This page tracked our progress toward that objective.

With the augmentation and documentation of the Userinfo API, this project is now complete. You can learn about the API and get examples showing how to use it on API:Userinfo.


Work on the 'Data Portability' project is complete (April 15, 2019)

The team has augmented the Userinfo API, and it can now be used to get a machine-readable download of the user information listed below. The Userinfo API help page has been updated with new examples that show how to use the API to download data.

  • User ID #
  • Username
  • Email address (if we have it)
  • Email verification date (if we have it)
  • Account registration date
  • Timestamp of latest edit (on current wiki)
  • User rights conferred and/or user group memberships (on current wiki)
  • Preferences (on current wiki, including hidden preferences)
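As a rough illustration of how the fields above might be requested, here is a minimal sketch that builds a Userinfo query URL. The `uiprop` values and their mapping to the listed fields are assumptions that should be checked against API:Userinfo; the function name `build_userinfo_url` and the endpoint in the usage line are illustrative only. User ID and username are expected to come back without any `uiprop`, and the request would need to be sent from a logged-in session to describe your own account.

```python
from urllib.parse import urlencode

# Assumed mapping from the fields listed above to uiprop values;
# verify the names against the API:Userinfo help page before relying on them.
UIPROPS = [
    "email",             # email address and verification date
    "registrationdate",  # account registration date
    "latestcontrib",     # timestamp of latest edit on the current wiki
    "groups",            # user group memberships
    "groupmemberships",  # group memberships with expiry details
    "rights",            # user rights
    "options",           # preferences
]


def build_userinfo_url(api_endpoint: str) -> str:
    """Return a query URL asking the Userinfo module for the fields above."""
    params = {
        "action": "query",
        "meta": "userinfo",
        "uiprop": "|".join(UIPROPS),
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Example endpoint; each wiki exposes its own api.php.
    print(build_userinfo_url("https://meta.wikimedia.org/w/api.php"))
```

The sketch only constructs the URL, so it can be tried without credentials; sending it anonymously would return information about the anonymous session rather than an account.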

With this note, active development of the Data Portability project is complete. Please let us know if you see problems with the API. If you need access to user information not listed above for a user on a Wikimedia project, please email privacy(_AT_) for further assistance.

Work has started—on an API (not a download): March 26, 2019

Work has started on the main ticket for this project, Allow users to access their user data via an API. If you’ve been following our progress, you may notice that that title, which is new, signals a change in tactics (the old title was “Give users a download of their user data”).

The team has decided that access to an API feed, rather than a download of a stand-alone JSON file, will potentially be more useful to people interested in this data. So that’s the new plan. As always, please let us know if you have thoughts about this.
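One consequence of the API approach is that anyone who still wants a stand-alone file can produce one from the API response. The sketch below assumes the response nests the account data under `query` and `userinfo`; the helper name `save_userinfo` and the sample values are invented for illustration and are not real account data.

```python
import json
import tempfile
from pathlib import Path


def save_userinfo(api_response: dict, destination: Path) -> dict:
    """Write the userinfo part of an API response to a JSON file."""
    # Assumes the response shape {"query": {"userinfo": {...}}}.
    userinfo = api_response["query"]["userinfo"]
    destination.write_text(
        json.dumps(userinfo, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return userinfo


if __name__ == "__main__":
    # Made-up sample response, standing in for a real API call.
    sample = {"query": {"userinfo": {"id": 12345, "name": "ExampleUser"}}}
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "userinfo.json"
        save_userinfo(sample, target)
        print(target.read_text(encoding="utf-8"))
```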

We’re dropping plans for a contributions download: Nov. 28, 2018

I’d previously said that as part of this project we would be providing users with a download of their wiki contributions. But the Foundation has decided that such a download is not required, since we already provide users with reasonable access to that data via the Contributions page. Moreover, providing such a download would be a bigger job than we’d thought initially, especially for users with a long contribution history. (Because their files could be quite large, we’d have to defer generation of the file, processing the request in the background, and then create a user interface both for notifying the user that the file was ready and then for downloading the file....)

For this reason, the Annual Plan no longer requires a contributions download, and I’ve closed the ticket that described the contributions part of the project.

Content of the ‘Contribution Data’ and ‘User Data’ reports: Nov. 7, 2018

Below is a list of the contribution and user data we’re currently looking to provide as downloads. Please have a look and let us know if you have thoughts or questions about these proposals.

As we work through the possibilities of what information our databases can provide speedily and at scale, the lists below will probably change. For example, in the “User Data” report, doubts have been expressed about our ability to round up logged actions performed on the user—including blocks and user group changes.

An email address will be provided by which users will be able to request data not included in these reports, on a case-by-case basis. The reports will be made available from both the Contributions page and the "User profile" tab of Preferences. All data will be provided in a machine-readable format.

Contribution Data

  • All edits and logged actions the user performed. This will include page edits, page creations, page moves, page deletions, thanks, patrol actions, page protections, etc.

This report will not include deleted/suppressed edits or deleted summaries.

User Data

  • User ID #
  • Username
  • Email address (if we have it)
  • Email verification date (if we have it)
  • Account registration date
  • Date of first edit
  • Date of latest edit
  • User groups joined
  • Global user groups joined (e.g. global interface editor)
  • Wikis that the user has an account on [Not absolutely required, but the idea is to tell them this as an alternative to actually going and fetching all the global data]
  • Preferences (this wiki, including hidden preferences)
  • User group changes (including comments unless suppressed)
  • Number of times the user was blocked; info about the blocks (including comments unless suppressed)