Research:Characterize and Model Wikihounding

From Meta, a Wikimedia project coordination wiki
23:06, 16 September 2017 (UTC)
Duration:  2017-September — ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.

Wikihounding is a type of harassment that can be described as stalking behavior spanning different topics and namespaces. It has a component of bullying and is not about the topic but about people: someone follows another contributor irrespective of topic.

This project aims to characterize and model this kind of behavior, giving the community tools to better understand and deal with the problem.


We are going to study user interactions, analyzing massive datasets, using techniques such as:

More specifically, we propose to follow two lines of research: one focused on text analysis using NLP techniques, and one content-agnostic approach that analyzes user interactions in a hypergraph model.

Content Agnostic Approach

Edit wars in Wikipedia have been widely studied. An edit war is usually considered to be the consequence of differing opinions about a specific topic between two or more users. Naturally, users with different political views might disagree on many politics-related articles, and these differences can escalate into a multi-article edit war. Such actions can be considered toxic, but they are not necessarily stalking behavior. However, if edit wars between the same users occur across multiple unrelated topics (assuming such cases exist), this can indicate a person-centered attack rather than a topic-centered one, and might be categorized as wikihounding.

Taking advantage of the fact that edit wars can be detected with a content-agnostic approach (without analyzing the text), we propose to study the topical span of those wars, characterizing usual and unusual (potentially toxic) behaviors.
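
As an illustration of what content-agnostic detection could look like, the sketch below finds reverts by comparing revision checksums only, never the text itself. This is an assumption made for illustration, not the method chosen by this project: the function names, the `(user, checksum)` input format, and the rule "two users who reverted each other on the same page form a conflict" are all hypothetical.

```python
from collections import Counter


def find_reverts(revisions):
    """Find identity reverts in one page history.

    revisions: list of (user, checksum) tuples in chronological order
    (hypothetical input format). A revision whose checksum matches an
    earlier one is treated as a revert of everything in between.
    Returns a list of (reverter, reverted_user) tuples.
    """
    last_seen = {}  # checksum -> index of the latest revision with that content
    reverts = []
    for i, (user, checksum) in enumerate(revisions):
        if checksum in last_seen:
            j = last_seen[checksum]
            # Every revision strictly between j and i was undone by `user`.
            for reverted_user, _ in revisions[j + 1:i]:
                if reverted_user != user:  # ignore self-reverts
                    reverts.append((user, reverted_user))
        last_seen[checksum] = i
    return reverts


def mutual_revert_pairs(revisions):
    """Return the user pairs who reverted each other on this page."""
    counts = Counter(find_reverts(revisions))
    return {frozenset((a, b)) for (a, b) in counts if (b, a) in counts}


# Toy history: Alice and Bob keep restoring their own versions.
history = [("Alice", "h0"), ("Bob", "h1"), ("Alice", "h0"),
           ("Bob", "h1"), ("Carol", "h2")]
conflict_pairs = mutual_revert_pairs(history)
```

Running this per page would yield one set of conflicting pairs per article, which is the kind of input the topical-span analysis needs. A real study would likely need a stricter definition of an edit war than a single mutual revert.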

The main tasks in developing such a model are:

  • Generate a representative dataset of edit wars in Wikipedia.
  • Detect pairs or groups of users involved in more than X controversies (defining X is part of the study).
  • Define and implement a robust topic model for articles, suitable for this study.
    • Define a distance metric for topics (e.g., Geography is N steps from Politics and M steps from Sports; is N > M or not?)
  • Apply an outlier detection mechanism to find potential cases of wikihounding.
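
The tasks above can be sketched end to end on toy data. Everything here is a hypothetical placeholder rather than a project decision: the topic tree in `PARENT`, the use of tree path length as the distance metric, mean pairwise distance as the "topical span", and a median/MAD modified z-score as the outlier test. The parameter `min_conflicts` plays the role of X.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

# Hypothetical toy topic tree (child -> parent), standing in for a real topic model.
PARENT = {
    "Politics": "Society", "Geography": "Society", "Sports": "Culture",
    "Society": "Root", "Culture": "Root",
}


def path_to_root(topic):
    """Return the list of topics from `topic` up to the root of the tree."""
    path = [topic]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def topic_distance(a, b):
    """Number of tree edges ("steps") between two topics."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_in_b = {t: i for i, t in enumerate(pb)}
    for i, t in enumerate(pa):
        if t in depth_in_b:  # first common ancestor
            return i + depth_in_b[t]
    raise ValueError("topics do not share a root")


def topical_span(topics):
    """Mean pairwise distance between the topics of a pair's conflicts."""
    pairs = list(combinations(topics, 2))
    if not pairs:
        return 0.0
    return sum(topic_distance(a, b) for a, b in pairs) / len(pairs)


def candidate_pairs(conflicts, min_conflicts):
    """conflicts: iterable of (user_a, user_b, topic), one per controversy.

    Returns {user_pair: topical_span} for pairs with more than
    `min_conflicts` controversies.
    """
    by_pair = defaultdict(list)
    for a, b, topic in conflicts:
        by_pair[frozenset((a, b))].append(topic)
    return {p: topical_span(t) for p, t in by_pair.items()
            if len(t) > min_conflicts}


def flag_outliers(spans, threshold=3.5):
    """Flag pairs whose span is unusually large (modified z-score)."""
    values = list(spans.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half the spans equal the median.
        # Fallback chosen for this sketch: flag anything above the median.
        return {p for p, v in spans.items() if v > med}
    return {p for p, v in spans.items()
            if 0.6745 * (v - med) / mad > threshold}
```

In this toy tree, Geography is 2 steps from Politics and 4 steps from Sports. Flagged pairs would only be candidates for human review; a wide topical span alone does not establish wikihounding.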

NLP Approach



Q1, Q2