Grants talk:IdeaLab/Controversy Monitoring Engine

From Meta, a Wikimedia project coordination wiki

More, please.

Hi Radfordj, I love this idea, and I'd love to know more! How will you determine "words"? Also, you may want to take a look at https://meta.wikimedia.org/wiki/Grants:IdeaLab/Gender-gap_admin_training and consider how you might collaborate/support. --Mssemantics (talk) 21:01, 18 March 2015 (UTC)

@Mssemantics: I think the words would probably come from an initial hand-curated list of the typical four-letter variety: the b-word, c-word, n-word, f-words, etc. (I don't say them here so this post never gets flagged :). But once we're able to identify posts that contain intimidating language, an inductive list of words, weighted by the number of times they appear in intimidating posts, will probably be used. See these word clouds (language warning!) for nice examples of the words that occur around intimidating language.
As for admin training (which I love by the way), I think this project can fold into it in two ways. The first is by wrapping in some kind of first-response training. I don't know what admins will get into with this monitoring. The monitor might pick up flame wars that spread out over weeks or it might pick up wars that emerge and die out in a matter of minutes. I don't know. Maybe using this engine would help the training in getting admins to think about how gap-conscious interventions can happen in a live-controversy context. The other connection could be in training admins on how the controversy monitor works. There's an underlying theory of controversy that the algorithm will be built on and which admins can contribute to or maybe learn more about controversies based on how the algorithm works. I'm sure there are other synergies too. Did you have some in mind? --Radfordj (talk) 12:32, 19 March 2015 (UTC)
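The word-weighting idea in the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposal's actual method: it assumes posts have already been labeled as intimidating or not, and the names `tokenize`, `weight_words`, and `score_post` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def weight_words(labeled_posts):
    """Weight each word by how many times it appears in intimidating posts.

    labeled_posts is an iterable of (text, is_intimidating) pairs; the labels
    are assumed to come from some earlier hand-coding step.
    """
    weights = Counter()
    for text, is_intimidating in labeled_posts:
        if is_intimidating:
            weights.update(tokenize(text))
    return weights


def score_post(text, weights):
    """Sum the weights of the words in a new post (unseen words count as 0)."""
    return sum(weights[token] for token in tokenize(text))


if __name__ == "__main__":
    posts = [
        ("you idiot go away", True),
        ("thanks for the edit", False),
        ("go away idiot", True),
    ]
    weights = weight_words(posts)
    print(weights.most_common(3))
    print(score_post("go away", weights))
```

Raw counts like these will also give weight to everyday words that happen to occur in intimidating posts, so a real version would probably need to compare against how often each word appears in ordinary posts.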

Eligibility confirmed, Inspire Campaign

This Inspire Grant proposal is under review!

We've confirmed your proposal is eligible for the Inspire Campaign review. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period.

The committee's formal review begins on 6 April 2015, and grants will be announced at the end of April. See the schedule for more details.

Questions? Contact us at grants(at)wikimedia.org.

pointers and comments

hi there, I really like the idea, I would just like to learn more about certain parts of it (and have some pointers for you).

some comments:

  • regarding visualizations you could take a look at:

two distinct approaches how to show conflict. Maybe it helps or is even reusable.

  • regarding your estimates, I would think that 40 h is too little time to develop and test a robust method to identify harmful controversy (as there is of course also good controversy apart from simple vandalism fighting). In particular, how you actually want to test this does not become clear in the proposal, and I think it would be good to expand on that. I think that is an important point if you want the tool to be widely used later on, as a) what counts as intimidating behavior can be quite subjective and b) low precision or recall might render the tool not effective enough for actual use by Wikipedians (esp. low precision, i.e. raising false alarms).
  • another point relating to the last one: just showing how controversial an article is might not directly relate to how intimidating some editors are towards, e.g., women and newcomers. For a controversy to "heat up" to a degree where it produces a high conflict score, it requires at least two parties going back and forth at each other. Newcomers especially (and maybe women also) might be less inclined to even go into an escalation like this, so the big "wars" would actually not be the ones you would like to detect, but rather instances where a user tried a couple of times to change something but was unilaterally reverted (without fighting back very much themselves). Or maybe this is already included in your concept (I was not sure)?

just some thoughts.

--Fabian Flöck (talk) 19:43, 11 April 2015 (UTC)
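To make the precision/recall concern in the comment above concrete, here is a minimal sketch of how the two numbers could be computed for a set of flagged pages. It assumes a hand-labeled set of genuinely harmful pages exists for comparison; the function name and the page identifiers are made up for illustration.

```python
def precision_recall(flagged, truly_harmful):
    """Return (precision, recall) for a set of flagged items.

    precision: share of flagged items that are really harmful
               (low precision means many false alarms).
    recall:    share of really harmful items that were flagged.
    """
    flagged = set(flagged)
    truly_harmful = set(truly_harmful)
    true_positives = len(flagged & truly_harmful)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_harmful) if truly_harmful else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical page identifiers, for illustration only.
    flagged_pages = {"A", "B", "C", "D"}
    harmful_pages = {"A", "B", "E"}
    p, r = precision_recall(flagged_pages, harmful_pages)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case half of the alarms are false (precision 0.5) and one harmful page is missed (recall 2/3), which is the kind of trade-off a validation step would need to report.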

Thanks for so many references and good questions. I've been looking at Contropedia and whoVIS along with Wiki War Monitor. These tools have been great examples for me to think through the technical and measurement decisions going into designing and implementing this engine.
What's been the most productive question for me to ask myself is where controversy monitoring fits in with Wikipedians' everyday editing. I think both WhoVIS and Contropedia do a great job of laying out the histories of edit conflict in a way that's intuitively navigable. The issue I have, even for my own idea for a page modeled on stats.wikipedia, is in envisioning how Wikipedians would find these external pages and refer to them on some regular basis. Either Wikipedians would have to know they're there and be able to find the kind of pages they're interested in or a new community of Wikipedians (like the New Pages Patrol) would spring up around it. I think either vision could work but would take a lot of time to create.
Because of this, my thinking has evolved more towards implementing bot-constructed boxes like the "Current Status of Articles" box on Project Pages (see the [Feminism Project Page] for example). In this framework, the pages in a project page would be scraped and evaluated for controversy and then scored. The box might then link to the article itself or to a more in-depth analytic engine like WhoVIS for further analysis.
Finally, to your questions about use, positive and negative controversy, and intimidation versus controversy. My hope with this is that the engine brings more eyes and ears to a controversy ("sunlight is the best disinfectant"). The premise is not that controversy is positive or negative, but that the actions people take when they disagree are destructive or constructive. Controversy can lead to harassment, intimidation, and a range of other negative behaviors, and I think the more people are watching, the better behaved people will be. In this same vein, this is not an incivility detection engine meant to automate the classification of individuals or specific actions as intimidation or harassment, or to characterize particular pages or groups as hostile or uncivil. I'm working on this civility question in my research, but implementing and publishing incivility scores or labels on individual editors, communities, or pages seems to me to be policing by code. But, then again, maybe there's a way to define intimidation clearly enough that automatically labeling or scoring someone as such is taken as legitimate. --Radfordj (talk) 12:08, 29 April 2015 (UTC)
I forgot your question about validating the scale. The controversy calculation would begin with the Wiki War formula, which I take to be useful at face value (the aggregate list of controversial pages in their paper looks like what you'd think is controversial). The first round of testing is meant to properly scale the score. The Wiki War controversy scores were long-tailed, meaning most articles will get crammed into the low-controversy end of the scale. In this step, I'd run the controversy scoring on different subsamples of articles to examine the distribution of controversiality for modest sample sizes. After that, I'd add several other features like reversion and cross-talk on the discussion and user pages and maybe some other features. But, for the most part, I imagine the Wiki War score and the Number of Disagreement Actions from WhoVIS/Contropedia should go the furthest in adding new information to the scale.
Having said that, I'll be continuing to work on controversy scoring (and other things) moving forward. This grant project is really meant to build the infrastructure for the initial code and for presenting controversiality to Wikipedians. Changes will continue to happen after the grant is over. The grant is just covering a minimally viable implementation. --Radfordj (talk) 12:29, 29 April 2015 (UTC)
Thanks a lot for your answers and sorry that it didn't work out with the grant this time. However, the idea is certainly worth pursuing. --faflo (talk) 17:19, 14 May 2015 (UTC)
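The scaling problem described above (long-tailed scores crowding most articles into the low end) could be approached in a couple of standard ways. The sketch below shows two generic options, a log transform and a percentile rank; neither is taken from the proposal or from the Wiki War paper, and the sample scores are invented purely for illustration.

```python
import math
from bisect import bisect_right


def log_scale(scores):
    """Compress a long tail with log(1 + score); a score of 0 stays 0."""
    return [math.log1p(s) for s in scores]


def percentile_ranks(scores):
    """Map each score to the percentage of articles scoring at or below it."""
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


if __name__ == "__main__":
    # Invented raw controversy scores with one extreme outlier.
    raw = [0, 1, 2, 5, 1000]
    print(log_scale(raw))
    print(percentile_ranks(raw))
```

A percentile rank spreads articles evenly across the scale regardless of the tail, while the log transform keeps some sense of how far apart the raw scores were; running either on different subsamples would show how stable the resulting scale is.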

Aggregated feedback from the committee for Controversy Monitoring Engine

Scoring rubric and scores
(A) Impact potential (score: 6.9)
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement (score: 5.0)
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute (score: 7.2)
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success (score: 6.3)
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • I don't see a large impact on Wikipedia, though the tool will be a new input for an English WikiProject (unclear how this tool could be adapted to other languages)
  • Not 100% confident that the hypothesis regarding women avoiding conflict is correct, though it could be a useful tool generally
  • Project seeks to increase gender diversity both in content and contributors by preventing editing wars and intimidation. Most potential for online impact would be not just as a standalone tool but used in conjunction with admin training and efforts to raise awareness about the gender gap (& other forms of bias and discrimination leading to aggressive discussions).
  • There does not seem to be a strong gender component to the proposal and that is the largest weakness. Would need community support to be useful.
  • This project seems like it could have both high potential positive impact and high potential negative impact, depending on details of implementation.
  • Has a few endorsements, but not much community support. Would like to see more community engagement.
  • Not clear who would be the target to work with the data that is accumulated beyond interested administrators who voluntarily look at the data and follow up. Right now there is a pretty big backlog on the noticeboards on Wikipedia English with many problems not being adequately addressed. This tool could add to the burden of monitoring for problematic situations without providing a clear method to resolve the conflicts. Not yet clear if the support for use of the tool would be adequate to improve the situation with controversies on articles.
  • Would like to see a project like this integrated well with WMF’s Research team.
  • The skills to develop the tool could be found, the question is whether there is good support to execute the use of the tool.
  • I find the budget to be quite reasonable ($15/hr). I believe Radfordj is more than capable, especially after reviewing his work with the Lazer Lab.
  • It seems like the organizer has the skills/experience to execute this proposal, but it might not hurt to have at least one other volunteer involved, particularly someone whose role is that of a non-coder.
  • Could have better quantitative measures (number of users etc.) but overall on the right track. The metrics proposed in the first, second and third stages are clear but not measurable (the possible answers are yes or no). In the last stage the report could have more data or numbers, but the project as a whole needs more numbers, like "relation of number of c-words detected in two days"
  • Would like to see measures of success for use of the tool and a specific idea of the impact it could have on the gender gap.
  • An intriguing and potentially useful idea. Especially like the focus on getting editors to use the monitoring information for intervening in and de-escalating conflicts. How will that be done?
  • Unlike the Edit Filter, which administrators and other members of the community use because it gives a solution, this tool has the potential to document the issue, but I'm not clear that it addresses it completely enough to have an impact on controversies in the community, and it is even less likely to address the gender gap.
  • There are some potential pitfalls in implementation, in particular the words the tool is used to track as a measure of "controversy." It seems like it could easily become a tool with which to bludgeon editors who aren't part of the majority demographics of Wikipedia. Revising the proposal to account for those potential problems at this stage would make it stronger and more likely to be of substantial benefit.

Inspire funding decision

This project has not been selected for an Inspire Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding, but we hope you'll continue to engage in the program. Please drop by the IdeaLab to share and refine future ideas!

Comments regarding this decision:
Thanks for engaging in the Inspire campaign! We’d love to see you return in a future round of Individual Engagement Grants if you’ve got other ideas, or with this idea if you are able to incorporate feedback to address some of the suggestions and concerns from the committee’s review.

Next steps:

  1. Review the feedback provided on your proposal and ask for any clarifications you need using this talk page.
  2. Visit the IdeaLab to continue developing this idea and share any new ideas you may have.
  3. To reapply with this project in the future, please make updates based on the feedback provided in this round before resubmitting it for review in a new round.
  4. Check the Individual Engagement Grant schedule for the next open call to submit proposals or the Project and Event Grant pages if your idea is to support expenses for offline events - we look forward to helping you apply for a grant in the future.
Questions? Contact us at grants(_AT_)wikimedia.org