Research:Wikipedia Edit Types
This project seeks to reboot past work on the automated classification of edit diffs, namely that of Halfaker and Taraborelli, by identifying a basic taxonomy of edit types and a set of language-agnostic detectors for each edit type, so that they can be used to analyze edits on Wikipedia.
Edit Diffs and Detectors
The initial phase of the project focused on the technical implementation of processing Wikipedia diffs and mapping changes to basic edit types. The resulting Python library (mwedittypes) can identify insertions, removals, changes, and moves to the following types of nodes: tables, references, lists, formatting, categories, media, wikilinks, external links, templates, headings, comments, whitespace, punctuation, words (or characters), sentences, paragraphs, and sections.
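To make the idea of a detector concrete, the following is a minimal sketch using only the Python standard library. It is not the mwedittypes implementation: the regular expressions, the function name toy_edit_types, and the output shape are all assumptions made for illustration, and real wikitext needs a proper parser. The sketch compares the nodes found in the previous and current wikitext and reports how many of each type were inserted or removed.

```python
import re
from collections import Counter

# Hypothetical, heavily simplified patterns for three node types.
# These are illustrative assumptions, not the patterns used by mwedittypes.
NODE_PATTERNS = {
    "Wikilink": re.compile(r"\[\[(?!Category:)[^\[\]]+\]\]"),
    "Category": re.compile(r"\[\[Category:[^\[\]]+\]\]"),
    "Heading": re.compile(r"^=+[^=\n]+=+\s*$", re.MULTILINE),
}


def toy_edit_types(prev_wikitext, curr_wikitext):
    """Count inserted and removed nodes per node type between two revisions."""
    summary = {}
    for node_type, pattern in NODE_PATTERNS.items():
        prev_nodes = Counter(pattern.findall(prev_wikitext))
        curr_nodes = Counter(pattern.findall(curr_wikitext))
        # Counter subtraction keeps only positive counts, so each difference
        # holds the nodes present on one side but not the other.
        inserted = sum((curr_nodes - prev_nodes).values())
        removed = sum((prev_nodes - curr_nodes).values())
        counts = {}
        if inserted:
            counts["insert"] = inserted
        if removed:
            counts["remove"] = removed
        if counts:
            summary[node_type] = counts
    return summary


prev = "== Life ==\nAn Austrian painter.\n[[Category:Painters]]"
curr = "== Life ==\nAn Austrian [[landscape painter]].\n[[Category:Painters]]"
result = toy_edit_types(prev, curr)
print(result)
```

This toy version has no notion of changes or moves: a modified link shows up as one removal plus one insertion, and a relocated heading is not reported at all. Telling those cases apart requires aligning nodes between the two revisions rather than just counting them.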
The second phase of the project focuses on mapping the core edit types to higher-order categories of edits. For instance, this might involve identifying combinations of edit types that distinguish edits that generate content from those that maintain or annotate existing content.
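One way such a mapping could look is sketched below. Everything in it is an assumption for illustration: the input shape (a dictionary of node type to action counts), the choice of which node types count as content, the category labels, and the function name classify_edit are hypothetical and are not taken from the project's actual taxonomy.

```python
# Assumed set of node types whose insertion signals new content.
# This grouping is a hypothetical rule, not the project's definition.
CONTENT_NODE_TYPES = frozenset({"Word", "Sentence", "Paragraph", "Section"})


def classify_edit(edit_types):
    """Map per-node edit counts to a coarse, hypothetical edit category.

    edit_types is assumed to look like {"Sentence": {"insert": 2}, ...}.
    """
    if not edit_types:
        return "no change detected"
    for node_type, actions in edit_types.items():
        # Any inserted text-bearing node is treated as content generation.
        if node_type in CONTENT_NODE_TYPES and actions.get("insert", 0) > 0:
            return "content generation"
    return "maintenance or annotation"


print(classify_edit({"Sentence": {"insert": 2}, "Wikilink": {"insert": 1}}))
print(classify_edit({"Category": {"insert": 1}, "Whitespace": {"change": 3}}))
```

A single-rule classifier like this is deliberately crude. A realistic mapping would likely weigh combinations of edit types and their sizes, for example treating a one-word insertion differently from a new section.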
The mwedittypes library can support a wide variety of use cases, some of which are listed below: