Grants talk:Project/University of Virginia/Machine learning to predict Wikimedia user blocks

Comments

Anyone may comment here. Blue Rasberry (talk) 15:32, 1 December 2018 (UTC)

Eligibility confirmed, round 2 2018

This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 2 2018 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through January 2, 2019.

The Project Grant committee's formal review for round 2 2018 will occur January 3-January 28, 2019. Grantees will be announced March 1, 2019. See the schedule for more details.

Questions? Contact us.

--I JethroBT (WMF) (talk) 16:16, 6 December 2018 (UTC)

Comments/Questions

Thank you for this interesting project. However I have a few questions/comments:

It is not directly specified, but as I understand it the project only applies to English Wikipedia? You are not going to include other English-speaking projects?
The project lacks a timeline and a detailed budget. I think that you should add them.
It is not clear from the project if you are going to create some kind of a tool that can predict the probability of a user to be blocked based on the past blocks? Such a tool may be an interesting result of the project.
Have you considered extending your project (or writing another project) to global locks? Currently checking user contributions across multiple projects is time-consuming, and having a tool that can automatically score global contributions of an account can be very useful.

Ruslik (talk) 20:20, 23 December 2018 (UTC)

@Ruslik0:

The 3 researchers all speak Hindi. It is at the front of my mind to find some way to do some test or bring some outcome into Hindi-language Wikimedia projects. Some aspects of machine learning insights are language agnostic, but also to some extent branching into another language sacrifices time spent in getting deeper insights. The best that I can promise here is a published description of how the team chose whether to try another language, whenever that decision comes. English is the first choice because English Wikipedia has the most consistent style of applying blocks. The basis of this research is assuming a large number of consistent blocks since 2004, then using the circumstances around those blocks to predict need in other situations. English Wikipedia has the best dataset for this.
I cannot add much of a budget. This kind of research at this level of universities in the United States costs US$15-30,000 (we bill this to commercial clients or subsidize it with other funding) and I am asking for $5000. I have trouble detailing everything that happens because the budget for this project is intermingled with about 15 other projects which all share an entire department. The accounting department here does not prioritize $5000 nonprofit partnerships like this one I am trying with the Wikimedia grants team, so I am exploring options here on my own. It is not customary for any school to have nonprofit partners in programs like this, and for example, I do not see other machine learning research projects in the Wikimedia space or at the Internet Archive or even easy to find in library sciences. These projects happen, but the extra expenses here are trying to do more reporting into Wikimedia projects and also everyone going out of their way to learn and comply more with Wikimedia community culture. It is a challenge for me to put items on a line of how to budget to address these things, as any hourly predictions I made to turn labor into an expense would be a guess within the research budget for an entire institute's graduate degree program. Here are some odd things which do not exist:
1. Examples of research commitments - wmf:Open access policy does not really fit university research nor is Research:New project commonly used
2. Community discussion around how to match Research:Index and Grants:Start. The WMF grants process is sort of designed to fund projects which would not happen otherwise, and not wiki-fy projects which will happen but could be adapted to Wikimedia community benefit
3. Toolserver still requires that researchers personally message a WMF developer to get access to data. Research:Quarry is for small-data queries but any data scientist is going to have outpaced needs on day 1.
4. These researchers are focused on data science research and they happen to be looking at Wikimedia data. This is the usual case; there are a few hundred research papers like this with perhaps none ever bringing results back to Wikimedia projects. Every time I pull attention back to Wikimedia projects it is a research pull away from the primary focus of research, and right now, Wikimedia projects neither have the community nor administrative base to accept research feedback from teams which are not fluent in Wikimedia community culture and conversation.
Yes, this project will create the tool which you are describing. Yes, anyone could run this tool. Yes, the project will produce documentation on how to operate the tool and use it. Some complications are that early results show that misconduct trends on English Wikipedia change every few months, so what has worked in any year may not work for future years. Probably we can predict who should have been blocked 1-10 years ago, but maybe those accounts are stale now or anything else could have happened. Another problem is that even if we can predict who needs a block now it is not socially acceptable to give this data to anyone. Algorithms are not neutral and the outcomes here have a bias to punish some behavior while letting other behavior continue. Having extreme justice in some cases while allowing other problems to continue could cause chaos far outside the scope of this project to manage. A more likely outcome is that this research will need to happen 100 times at 100 universities in the next 10 years, and somehow the research gets aggregated into a tool which the Wikimedia community gets piece by piece over the years in controlled pilots.
Yes we are already talking about extending this project. Perhaps we can do more research next year at our university, and perhaps globally in other languages other universities can do similar research. Data science research requires a large dataset and the Wikimedia dataset might become a canonical research dataset because it is large, free and open, in every language, nonprofit, and broadly interesting to students everywhere. This project started with English Wikipedia because the data is most clean there. Data across Wikimedia projects requires some adaptation, as activity in two different Wikimedia projects (or languages) is likely to vary more than activity within a Wikimedia project.
I get your point: you want a tool to quickly check if a Wikimedia account which is disruptive in one project is also disruptive across projects and languages. Yes, eventually, the kind of analysis which this project is doing will be able to make that kind of check. What is harder for me to predict is when and how this technological development will happen. For this project there is me the Wikimedian, 3 graduate students, 1 faculty adviser, and some technicians to help with hardware needs, coding, and math. In ~9 months we will have a report on English Wikipedia and I am not sure what else we can promise because this problem is mostly unexplored. Online Wikimedia tool development is outside the skill set here, but as en:Wikipedia:Wikimedia Cloud Services develops, maybe something will be possible in the future. Not now, and I cannot predict when!

Thanks for the questions. Ask if you have others. Blue Rasberry (talk) 16:57, 28 December 2018 (UTC)

Comment enWP is a particular beast, and an atypical, non-representative beast, so I don't see that the proposal name accurately represents the proposal itself. If it is enWP focused, then please name it so, and present it so.
As an example of how a focus on enWP overlooks some of the tools in use within the broader WMF wikis: a number of wikis have abuse filters that enable an automated block (timed or indefinite), so an area where we already have some machine application in play is being missed. Similarly, I see no evidence that the abuse filters and their hits are going to play a part in the analysis; often the AF play a role in determining an admin's action.
So can we have some clarity in the proposal on whether it is enWP, or is indeed Wikimedia? Is the project aimed at the editing of real users, or is it aimed at spambots? Is it an assessment of user blocks, or IP blocks, or IP range blocks? Which misconduct are you looking to identify? In short, I want to see a better scoping of the question; it seems vague to me at this time, and it impacts what rights they may need. [Not that I am against the proposal, all knowledge is useful.] — billinghurst sDrewth 00:42, 3 January 2019 (UTC)

MediaWiki database schema

@Billinghurst: I am neither able to commit to only English Wikipedia nor promise to stay away from other Wikimedia projects.

You mentioned below the blocking codes. To some extent these are consistent but as you guess, often they are not. To some extent codes like this are language and Wikimedia project agnostic, so examination of some block rationales in one language and project applies elsewhere. Perhaps "copyright violation" is a fairly consistent concept, for example.

The only sure characteristic of blocked accounts is that they have a flag marking them as blocked. Many blocked accounts also have a block rationale. Among those, sometimes we can categorize the rationale. Among the ones we can categorize sometimes they apply across projects. To the extent that we can look across languages and projects we would like to do so. Definitely this research project is already considering how to branch out, and also how to determine when branching out is too expensive to attempt.
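As a rough illustration of the narrowing described above (blocked flag, then rationale, then category), here is a minimal Python sketch. It is not the project's code: the category names, the keywords, and the `rationale` field are all hypothetical, chosen only to show the shape of the funnel.

```python
# Hypothetical keyword map; these categories are assumptions for illustration,
# not the scheme used by the research team or by any wiki.
RATIONALE_KEYWORDS = {
    "copyright": ["copyright", "copyvio"],
    "vandalism": ["vandal"],
    "spam": ["spam", "advertis"],
    "sockpuppetry": ["sock"],
}


def categorize_rationale(rationale):
    """Return a category for a free-text block rationale, or None if unknown."""
    if not rationale:
        return None
    text = rationale.lower()
    for category, keywords in RATIONALE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def funnel(blocks):
    """Count blocks at each stage: all blocks, those with a rationale,
    and those whose rationale could be categorized."""
    with_rationale = [b for b in blocks if b.get("rationale")]
    categorized = [
        b for b in with_rationale if categorize_rationale(b["rationale"])
    ]
    return {
        "blocked": len(blocks),
        "with_rationale": len(with_rationale),
        "categorized": len(categorized),
    }
```

Each stage is a subset of the one before it, which is why the usable dataset shrinks as the questions get more specific.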

We are still at the beginning of data analysis here and basic information to orient a researcher - like how to interpret the database schema shown here - is still lacking. We have some documentation at mw:Manual:Database layout and in about 10 other places, but ideally, our documentation would be organized so that any researcher could quickly begin an analysis without finding our organization to be too peculiar. The reality is that the database schema is not entirely intuitive, nor does the documentation explain the odd parts, nor have we yet established a mature culture of online discussion of these things. This research project and probably any other similar projects for the next few years are still early explorers in this space. The existing documentation is super helpful and took a lot of work to develop. It is a great resource, but in the same way that new users cannot quickly understand everything about Wikimedia projects, even less so is interpreting the database easy. Besides the database structure itself the content in each language and project will vary, so there is a significant cost in staging the data to compare across those barriers.

If you really insist, this could be called an English Wikipedia project, but then later if we have further insights about other projects we would include those. This is common enough, as many researchers already refer to the entirety of Wikimedia projects as "Wikipedia".

I cannot say whether this research will consider timed blocks, abuse filters, or any other single variable. The best way that I can describe this project's focus is to say that it will seek to collect whatever seems like a large and relevant dataset, then try to correlate that with other collected datasets. The way that Wikimedia misconduct research gets "completed" is to eventually characterize as many relevant datasets as can be analyzed with the available computational resources.

There seems to be an arms race happening where accounts that to me look individually human seem collectively like bot accounts for following patterns of good conduct followed by bad or weird conduct. This is all still early research so I do not know how to interpret what is happening. The answer about whether this research is for human or spam accounts does not really apply, as this research is only identifying patterns of misconduct.

You ask which misconduct this research is identifying. The method here is machine learning, so it seeks to cluster groups of accounts. It learns by looking at the characteristics of accounts which have already been blocked, then tries to find similar accounts with similar characteristics but which are not blocked. Yes, this looks at registered users and IP addresses. We are not currently clustering IP ranges.
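To make the idea concrete, here is a minimal Python sketch of "find unblocked accounts that resemble blocked ones". It is only an illustration under stated assumptions, not the method the team uses: the two features per account are hypothetical, and a simple distance to the average blocked account stands in for whatever clustering the research actually applies.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def rank_by_similarity(blocked, unblocked):
    """Return unblocked account names, most similar to blocked accounts first.

    Both arguments map an account name to a feature vector. The features
    themselves (for example a revert ratio) are assumptions for illustration.
    """
    center = centroid(list(blocked.values()))
    return sorted(unblocked, key=lambda name: distance(unblocked[name], center))
```

For example, with blocked accounts at `[0.9, 0.8]` and `[0.7, 0.9]`, an unblocked account at `[0.8, 0.8]` would be ranked ahead of one at `[0.1, 0.1]`. The ranking says only that the accounts look alike in the chosen features; it carries no judgment about why.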

I could say more, but what are your thoughts to this point? I could either explain in a more narrow way or more broadly discuss the research direction. What would be useful? Blue Rasberry (talk) 15:19, 3 January 2019 (UTC)

In what form will the algorithm present the results? Will it be just a binary choice - an account should be blocked or not blocked? Will it specify the reasons for these particular results and identify problematic edits? Ruslik (talk) 08:49, 20 January 2019 (UTC)

@Ruslik0: It is customary in this field for the near-surface backend to distill the results into a percent rank, like "80% likely vandalism". It is further customary to send the results to the end user in an even more simplified form, like "green-good yellow-maybe red-check".
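A minimal Python sketch of those two presentation layers might look like the following. The thresholds (0.5 and 0.8) and the function names are assumptions for illustration only; nothing in this project has fixed them.

```python
def percent_label(score, what="vandalism"):
    """Backend-style summary, e.g. a score of 0.8 becomes '80% likely vandalism'."""
    return f"{round(score * 100)}% likely {what}"


def traffic_light(score, yellow_at=0.5, red_at=0.8):
    """Simplified end-user view of a score between 0 and 1.

    The cut-off values are hypothetical defaults; a real tool would need to
    choose them with the community and revisit them over time.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= red_at:
        return "red-check"
    if score >= yellow_at:
        return "yellow-maybe"
    return "green-good"
```

Keeping the thresholds as parameters matters here, because where to draw the lines is a social decision rather than a technical one.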

There probably will not be reasons because artificial intelligence does not think that way. From the computer's perspective, the reason is "I judge this situation to be similar to 50 other situations" but the computer does not actually know any social context or what the situations are in human terms. We might try to apply some reasons. "Use of a proxy" is probably a clear case to identify; rudeness is probably unclear.

Something else weird that we are already finding is what seem to be potentially malicious accounts that make small good edits for months or years and then engage in misconduct. This is still speculation, but it seems like there might be bad actors grooming bot accounts to pretend to be human and make small occasional edits; then, when they want a human to take over to do vandalism, they make misconduct edits on this previously good account. Machine learning might detect this; I am not aware of any humans collecting examples of these cases.

I could say more and might even offer more. We are late in this review game. If there is a concession I can make to get your support and signature in the endorsement section then I am open to what you might ask. Blue Rasberry (talk) 14:54, 21 January 2019 (UTC)

Support this proposal

Given the very low dollar value being requested, and the potential for learning in the proposed plan, I support this proposal. My only concern would be that some of the predictive quality of blocks that are issued on English Wikipedia will depend on non-public information (in particular, IP addresses and user agents) that are otherwise only available to checkusers/stewards/selected WMF staff; this is often an important element in identifying socks. It is possible that there may be a way around this - I do seem to recall some sort of script on Enwiki that some admins use(d) for making range blocks, but I could be mistaken since I never used it myself. I will be interested in learning the result. Risker (talk) 22:27, 2 January 2019 (UTC)

@Risker: Guessing that those blocks made by users w/ CU rights can be identified for the time that the blocking admin was a CU. One hopes that the codes that enWP uses have been correctly identified for most of the time. I would think that the analysis of range blocks themselves would be an interesting subpoint we ask to be evaluated. — billinghurst sDrewth 23:23, 2 January 2019 (UTC)

@Billinghurst and Risker: Thanks for the support and comments.

About access to private information

I have no access to non-public information and this project will not seek access to non-public information. This project will only consider information which is available to anyone. The available information is a large rich dataset and the cost of managing private information is excessive for the planned research here.

Yes, it is correct, the research would be more complete with access to private information. However, analysis of misconduct on Wikimedia projects is not something that will ever be finished; instead it is a perpetual research direction with 1000s of interesting small questions which all sorts of researchers - including many school research projects globally - can seek to address. In the future at any time anyone can replicate the research done here with the private information if necessary, because once the research question is scoped and modeled then asking more specific questions with additional data is much easier.

Wikimedia public information is an extraordinarily rich dataset. In the Google/Jigsaw paper describing their analysis of Wikimedia misconduct, they considered a few variables when obviously they could have analyzed 100 good ones or 1000 other relevant ones. It is the same for this project and anyone else considering misconduct. When researchers do request data, we should first make sure they are oriented to what is already publicly available, which already is more than Google can handle for the research they are doing right now.

While I want Wikimedia private information to be available to researchers, I also think that we should be cautious in how we provide access to this. It is neither likely nor improbable that within the next few years much private Wikimedia information will leak out to become accessible. I think that our security is fine now, but it needs to improve continually, and we should not lightly pass around sensitive data. Blue Rasberry (talk) 14:49, 3 January 2019 (UTC)

Aggregated feedback from the committee for University of Virginia/Machine learning to predict Wikimedia user blocks

Scoring rubric	Score
(A) Impact potential Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both? Does it have the potential for online impact? Can it be sustained, scaled, or adapted elsewhere after the grant ends?	6.8
(B) Community engagement Does it have a specific target community and plan to engage it often? Does it have community support?	6.4
(C) Ability to execute Can the scope be accomplished in the proposed timeframe? Is the budget realistic/efficient? Do the participants have the necessary skills/experience?	5.8
(D) Measures of success Are there both quantitative and qualitative measures of success? Are they realistic? Can they be measured?	4.6
Additional comments from the Committee: The project fits with Wikimedia's strategic priorities and has a significant potential for online impact. The results can be scaled or adapted elsewhere. Sustainability is not an issue in this project. I love the spirit of this project, and while the investment is rather low, I am concerned about what a sustainable outcome from this project might be. I understand the solution is to better understand blocks and where blocks might be appropriate. Who would be administering and proposing the solution to the community based on the research findings? One thing we need to do better at as a whole is using research findings and applying them in our practices. I'd love to better understand how you hope to accomplish this. It builds on substantial work done by Jigsaw et al. It is less clear how much the people involved here have done in this area. The project is rather innovative in its use of AI to identify problematic accounts. There are risks but there are significant possible benefits for a relatively small sum of money - $5,000. Measurement is not what this project aims to impact. The impact is for community health. I am not concerned about measurable outcomes, but what I do want to know more about is how the proposers will help the students develop a community engagement plan with the findings to ensure there is impact on community health/practices surrounding community health. The project doesn't really talk about community engagement, nor does it look like it really needs it since it is just going to process the existing corpus of interactions. The scope can be accomplished in the requested 9 months. There is no formal budget. The participants are students with little Wikimedia experience but they have an experienced adviser (Wikimedian in Residence). I like the gusto in this proposal, but I worry the timeline is too short. It's a big undertaking and I would hate to see something of this importance cut short.
The community engagement is low but it is not necessary for this project. It may support diversity. For this proposal, I am not concerned about community engagement during the research phase. Once the findings are established, the proposers have a plan on disseminating the information. I do hope to see more about engagement on the back end though, so as to help the community apply the learnings from the research findings. When I think of having $100,000 and this group is just asking for $5000 to assist in something that I want more information on, I say go for it. I am willing to support this small research effort but I think they still have to produce some kind of a budget. I think this is a strong investment, but I do want to stress to the proposer to think about the timeline. It's a bit ambitious and a highly important project. Perhaps there are constraints due to students' time or graduation, etc. I just hope their timeline would not impact the extension of the project if needed. I also want to see more about how the team will engage with the community about applying the research findings to practices.

This proposal has been recommended for due diligence review.

The Project Grants Committee has conducted a preliminary assessment of your proposal and recommended it for due diligence review. This means that a majority of the committee reviewers favorably assessed this proposal and have requested further investigation by Wikimedia Foundation staff.

Next steps:

Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback and post any responses, clarifications or questions on this talk page.
Following due diligence review, a final funding decision will be announced on March 1st, 2019.

Questions? Contact us.

I JethroBT (WMF) (talk) 19:42, 6 February 2019 (UTC)

Round 2 2018 decision

Congratulations! Your proposal has been selected for a Project Grant.

The committee has recommended this proposal and WMF has approved funding for the full amount of your request, 5,000 USD.

Comments regarding this decision:
The committee is pleased to support this research examining the causes of blocks and exploring how machine learning techniques can help predict what behavior likely necessitates a block. The committee has asked that funding be contingent on the following conditions:

That the project scope be focused on behavior and blocking rationales that machine learning is likely to be able to capture well. For example, many behaviors around harassment may be difficult to capture using these techniques.
That once the analysis is completed, that specific recommendations be made on English Wikipedia regarding improvements to blocking policy or best practices in blocks based on your findings.

Next steps:

You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
Review the information for grantees.
Use the new buttons on your original proposal to create your project pages.
Start work on your project!

Upcoming changes to Wikimedia Foundation Grants

Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates--you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.

I JethroBT (WMF) (talk) 15:01, 1 March 2019 (UTC)