Research:Characterize and Model Wikihounding
Wikihounding is a type of harassment that can be described as stalking behavior spanning different topics and namespaces. It has a component of bullying and is about people rather than topics: one contributor follows another irrespective of topic.
This project aims to characterize and model this kind of behavior, giving the community tools to better understand and deal with the problem.
We are going to study user interactions by analyzing massive datasets, using techniques such as:
Content Agnostic Approach
Edit wars in Wikipedia have been studied extensively. An edit war is usually considered to be the consequence of differing opinions between two or more users about a specific topic. Naturally, users with different political views might disagree on many politics-related articles, and these differences can escalate into a multi-article edit war. Such actions can be considered toxic, but they are not necessarily stalking behavior. However, if edit wars between the same users start happening across multiple topics (assuming such cases exist), this can indicate a person-centered attack rather than a topic-centered one, which might be categorized as wikihounding.
Taking advantage of the fact that edit wars can be detected with a content-agnostic approach (without analyzing the text), we propose to study the topical span of those wars, characterizing usual and unusual (potentially toxic) behaviors.
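As an illustration of what "content agnostic" can mean in practice, the following minimal sketch flags identity reverts by comparing revision checksums instead of reading the text. This is an assumption about one possible detection method, not the method chosen by this project; the function names, the `(user, checksum)` input format, and the toy history are all hypothetical.

```python
from collections import Counter

def find_reverts(revisions):
    """Return (reverter, reverted_user) pairs for one page.

    `revisions` is a chronological list of (user, checksum) tuples.
    A revision whose checksum matches an earlier one is treated as a
    revert of every revision made in between (assumed heuristic).
    """
    last_seen = {}  # checksum -> index of the latest revision with it
    pairs = []
    for i, (user, checksum) in enumerate(revisions):
        j = last_seen.get(checksum)
        if j is not None and i - j > 1:
            # Revisions j+1 .. i-1 were undone by revision i.
            for reverted_user, _ in revisions[j + 1:i]:
                if reverted_user != user:
                    pairs.append((user, reverted_user))
        last_seen[checksum] = i
    return pairs

def mutual_revert_pairs(pairs):
    """Keep only user pairs that reverted each other at least once."""
    counts = Counter(pairs)
    return {frozenset((a, b)) for (a, b) in counts if (b, a) in counts}

# Toy history with invented users and checksums.
history = [
    ("Alice", "h0"),
    ("Bob", "h1"),
    ("Alice", "h0"),  # restores h0, undoing Bob
    ("Bob", "h1"),    # restores h1, undoing Alice
    ("Carol", "h2"),
]
reverts = find_reverts(history)
candidates = mutual_revert_pairs(reverts)
```

Mutual reverts are only a rough proxy for an edit war; a real dataset would likely need thresholds on revert counts and time windows, which are left open here.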
The main tasks to develop such a model are:
- Generate a representative dataset of edit wars in Wikipedia.
- Detect pairs or groups of users involved in more than X controversies (defining X is part of the study).
- Define and implement a robust topic model for articles, suitable for this study.
- Define a distance metric for topics (e.g., Geography is N steps from Politics and M steps from Sports; is N > M or not?).
- Apply an outlier detection mechanism to find potential cases of wikihounding.
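The tasks above can be sketched end to end on toy data. The sketch below assumes, purely for illustration, that topics form an undirected graph where distance is the number of steps between two topics, that the topical span of a user pair is the mean pairwise distance between the topics of their conflicts, and that outliers are pairs whose span exceeds the median by `k` median absolute deviations. The topic graph, the conflict records, and all function names are hypothetical; the actual topic model, metric, threshold X, and outlier method remain to be defined by the study.

```python
from collections import defaultdict, deque
from itertools import combinations
from statistics import median

def topic_distance(graph, src, dst):
    """Breadth-first step count between two topics; None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def pair_topic_span(conflicts, graph, min_conflicts):
    """Mean pairwise topic distance per user pair.

    `conflicts` holds (user_a, user_b, topic) records; only pairs with
    more than `min_conflicts` conflicts (the X of the task list) are kept.
    """
    topics_by_pair = defaultdict(list)
    for a, b, topic in conflicts:
        topics_by_pair[frozenset((a, b))].append(topic)
    spans = {}
    for pair, topics in topics_by_pair.items():
        if len(topics) <= min_conflicts:
            continue
        dists = [topic_distance(graph, s, t) for s, t in combinations(topics, 2)]
        dists = [d for d in dists if d is not None]
        if dists:
            spans[pair] = sum(dists) / len(dists)
    return spans

def flag_outliers(spans, k=2.0):
    """Return pairs whose span exceeds median + k * MAD."""
    values = list(spans.values())
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [pair for pair, v in spans.items() if v > med + k * mad]

# Invented topic graph: a simple chain of five topics.
topic_graph = {
    "Politics": ["History"],
    "History": ["Politics", "Geography"],
    "Geography": ["History", "Sports"],
    "Sports": ["Geography", "Music"],
    "Music": ["Sports"],
}
# Invented conflict records (user_a, user_b, topic of the disputed article).
conflicts = [
    ("A", "B", "Politics"), ("A", "B", "Politics"), ("A", "B", "History"),
    ("C", "D", "Sports"), ("C", "D", "Sports"),
    ("E", "F", "Politics"), ("E", "F", "History"),
    ("G", "H", "Politics"), ("G", "H", "Music"), ("G", "H", "Geography"),
    ("I", "J", "Music"), ("I", "J", "Music"),
]
spans = pair_topic_span(conflicts, topic_graph, min_conflicts=1)
suspects = flag_outliers(spans, k=2.0)
```

In this toy example only the pair whose conflicts are spread across distant topics is flagged; pairs that clash repeatedly within one topic or adjacent topics are not. A flagged pair is a candidate for human review, not a confirmed case of wikihounding.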