Research:On the collaboration with Wikimedia Communities in the context of building Machine Learning Systems

08:27, 5 March 2021 (UTC)
Duration:  2020-November – 2021-November

This page documents a proposed research project.
Information may be incomplete and may change before the project starts.


As the title suggests, this project focuses on the human aspect of machine learning (ML) at the Wikimedia Foundation. Over the past decade, ML has grown in importance and its applications have become ubiquitous. The human aspect of ML, however, has rarely received attention, as algorithms and techniques have taken the limelight. The work that the Foundation has been doing in building community-driven ML systems is a unique example of how collaboration can be used effectively in the ML context. This study aims to carefully analyze the complete process of requesting, building, deploying, using and maintaining ML systems at the Foundation, so that the wider research community can benefit from this approach.

Sketching out the process at a high level:

  1. The need for a model is identified
    1. The need can originate outside the community (extrinsic) or inside it (intrinsic).
    2. Either the Scoring Platform team or the wiki community in question approaches the other with a proposal to build a new ML model for the wiki.
  2. Agreeing on the requirements (between the team and the community)
    1. The team and the community discuss what exactly is required of the model, usually asynchronously, to maintain a public record and allow wider participation.
    2. This discussion is usually non-technical and subjective, to be inclusive and to meet the community where they stand.
  3. Moving forward to collect data
    1. Data collection was given top priority in the traditional ML process that was followed.
    2. There were many reasons for this, the primary one being that we wanted to capture what the community wanted to capture. This meant building high-quality datasets that represented the information and context of the community's existing work.
  4. Building a model with the collected data
    1. This is the technical part of the process, and may be of limited interest in the context of this study.
    2. Use formal evaluation to check whether the model *might* be useful.
  5. Iterating on the model with the community
    1. The model is hosted in production or on WMFlabs so that stakeholders can experiment with it and give the team feedback on various aspects.
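
As a rough illustration of the formal evaluation mentioned in step 4, the sketch below computes precision and recall for a binary model on a held-out set of community-labeled examples. The function name, the label encoding and the sample data are assumptions made for illustration only; they do not describe the Scoring Platform team's actual tooling.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels and predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y and p)
    false_pos = sum(1 for y, p in pairs if not y and p)
    false_neg = sum(1 for y, p in pairs if y and not p)
    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical held-out data: True marks an example the community labeled
# as belonging to the class of interest.
held_out_labels = [True, False, True, True, False, False]
model_predictions = [True, False, False, True, True, False]

precision, recall = precision_recall(held_out_labels, model_predictions)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Numbers like these only indicate that a model might be useful; as step 5 describes, feedback from the community determines whether it actually is.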

You are more than welcome to add to the process described above, in case we have missed anything, by contacting the investigators via email or IRC.


Sign your username below to be pinged about new information, questions, and requests related to this project.



Describe in this section the methods you'll be using to conduct your research. If the project involves recruiting Wikimedia/Wikipedia editors for a survey or interview, please describe the suggested recruitment method and the size of the sample. Please include links to consent forms, survey/interview questions and user-interface mock-ups.



Please provide in this section a short timeline with the main milestones and deliverables (if any) for this project.

Policy, Ethics, and Human Subjects Research


It's very important that researchers do not disrupt Wikipedians' work. Please add to this section any consideration relevant to ethical implications of your project or references to Wikimedia policies, if applicable. If your study has been approved by an ethical committee or an institutional review board (IRB), please quote the corresponding reference and date of approval.



Once your study completes, describe the results and their implications here. Don't forget to make status=complete above when you are done.