Grants:Project/Data Science Institute/Machine learning to predict wiki misconduct



status: test
Machine learning to predict wiki misconduct
summary: please add a 1-2 sentence summary
target: English Wikipedia
amount: 5000
nonprofit: yes
grantee: Bluerasberry
contact: rasberry@virginia.edu; Claudia Scholz, cws3v(_AT_)virginia.edu
organization: Data Science Institute at the University of Virginia
this project needs: volunteer, grantee, advisor
created on: 18:53, 16 November 2018 (UTC)


Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

Wikimedia community members who participate in a civil manner and Wikimedia's audience of readers sometimes experience disruption from user misconduct. User misconduct is a problem with hundreds of causes and expressions, and requires many different types of interventions to address.

Misconduct in Wikimedia projects is increasing over time, both because more people are engaging with Wikimedia projects in general and because misconduct increasingly relies on sophisticated automation. While the Wikimedia community will eventually have to address misconduct systematically, right now there is space for anyone with innovative ideas to try next-generation technological interventions, report results, and help classify both the field and the effectiveness of early experiments.

What is your solution to this problem?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.


Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

This project seeks to apply machine learning (artificial intelligence) tools to classify the characteristics of blocks in English Wikipedia, and then to use the insights from those classifications to rank how likely it is that other accounts, not currently blocked, actually merit a block.
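As a rough illustration of the "learn from blocks, then rank unblocked accounts" idea, the following is a minimal sketch only, not the project's method. It assumes a plain logistic regression written with the standard library, two hypothetical per-account features (fraction of edits reverted, fraction of edits adding external links), and invented toy data; the account names, feature choices, and hyperparameters are all placeholders.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def block_score(x, weights, bias):
    """Estimated probability that an account resembles previously blocked ones."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Invented toy data for illustration only.
# Hypothetical features: [fraction of edits reverted, fraction adding external links]
# Label 1 = account was blocked, 0 = account was not blocked.
labelled = [
    ([0.9, 0.8], 1),
    ([0.8, 0.1], 1),
    ([0.7, 0.9], 1),
    ([0.1, 0.0], 0),
    ([0.2, 0.1], 0),
    ([0.0, 0.2], 0),
]
rows = [x for x, _ in labelled]
labels = [y for _, y in labelled]
weights, bias = train_logistic(rows, labels)

# Hypothetical accounts that are not currently blocked.
unblocked = {
    "AccountA": [0.85, 0.7],
    "AccountB": [0.05, 0.1],
    "AccountC": [0.5, 0.4],
}

# Rank from most to least similar to previously blocked accounts.
ranking = sorted(
    unblocked,
    key=lambda name: block_score(unblocked[name], weights, bias),
    reverse=True,
)
print(ranking)
```

In a sketch like this the output is a ranked list for human review rather than an automatic decision; any real system would need real block-log data, far richer features, and evaluation against held-out cases.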

As a rough guess, there may be 10 distinct categories of misconduct (vandalism, testing, spam, rudeness, etc.), 10 machine learning strategies which one could apply to create a system of prediction, and about 1000 Wikimedia projects in various languages (English Wikipedia, Spanish Wikisource, etc.) which each need solutions to address misconduct. Potentially this is 10 × 10 × 1000 = 100,000 research questions, and we cannot examine all of them. It will not be the case that someone first develops a universal bot which predicts misconduct in all cases; rather, many teams of researchers will need to do short machine learning projects to examine the issues from various perspectives. Over time some strategies will prove to work well generally, but for now research teams can use whatever large datasets are most easily available and try to develop any system which seems likely to make the most high-quality predictions to benefit the most users.


Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (e.g. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.


Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.

This project does not directly relate to the three metrics. Indirectly, the hope is that developing more automated tools to promote civility will make Wikipedia a friendlier space for users to do the activities which those metrics measure.

Project plan

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?


Budget

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!


Community engagement

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation help make projects successful.


Get involved

Participants

Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?

Endorsements

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page.)