Research:Automated classification of edit types/Taxonomy

From Meta, a Wikimedia project coordination wiki

This page documents a complete and inclusive taxonomy of edit types. The goal is to capture every type of change that describes editing activity on Wikipedia. A practical subset will be used for the automated classification system; identifying that subset is left to a separate discussion.

Syntactic

These classes describe "what" was done during an edit (as opposed to "why").

Mechanical operations

These types of changes can be detected with simple regular expressions.

  • wiki links
    • insert/delete
    • modify
      • disambiguate
  • inter-wiki links
    • insert/modify/delete
  • external links
    • insert/modify/delete
  • category
    • insert/modify/delete
  • headers
    • insert/modify/delete
  • table
    • insert/modify/delete
  • image
    • insert/modify/delete
  • references
    • insert/modify/delete
  • content move / refactor
  • redirect
  • cleanup
    • punctuation
      • insert/delete
    • whitespace
      • insert/delete
    • formatting -- css/style/bold/italics
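As a rough illustration of how the mechanical classes above might be detected, the following Python sketch compares the wikitext elements found in two revisions. The pattern set and the function name `mechanical_changes` are assumptions made for this example, not part of the classification system; the patterns ignore many wikitext edge cases (nested links in image captions, templates, inter-wiki prefixes, which are counted as ordinary wiki links here).

```python
import re
from collections import Counter

# Hypothetical, simplified patterns; real wikitext has edge cases these miss.
PATTERNS = {
    "category": re.compile(r"\[\[\s*Category\s*:[^\]]+\]\]", re.IGNORECASE),
    "image": re.compile(r"\[\[\s*(?:File|Image)\s*:[^\]]+\]\]", re.IGNORECASE),
    # Any other [[...]] link (inter-wiki links are not separated out here).
    "wiki_link": re.compile(
        r"\[\[(?!\s*(?:Category|File|Image)\s*:)[^\]]+\]\]", re.IGNORECASE
    ),
    "external_link": re.compile(r"(?<!\[)\[https?://[^\s\]]+[^\]]*\]"),
    "header": re.compile(r"^=+[^=\n]+=+[ \t]*$", re.MULTILINE),
    "reference": re.compile(
        r"<ref\b[^>/]*>.*?</ref>|<ref\b[^>]*/>", re.IGNORECASE | re.DOTALL
    ),
    "redirect": re.compile(r"^\s*#REDIRECT\b", re.IGNORECASE),
}


def mechanical_changes(old_text, new_text):
    """Return {element: 'insert' | 'delete' | 'modify'} for two revisions."""
    changes = {}
    for name, pattern in PATTERNS.items():
        before = Counter(pattern.findall(old_text))
        after = Counter(pattern.findall(new_text))
        added = after - before    # elements present only in the new revision
        removed = before - after  # elements present only in the old revision
        if added and removed:
            changes[name] = "modify"
        elif added:
            changes[name] = "insert"
        elif removed:
            changes[name] = "delete"
    return changes


old = "== History ==\nThe [[cat]] is a pet.\n"
new = (
    "== History ==\nThe [[cat]] is a [[pet]].<ref>Smith 2010</ref>\n"
    "[[Category:Animals]]\n"
)
print(mechanical_changes(old, new))
```

Treating a simultaneous addition and removal of the same element type as "modify" is a simplification: it would also label an unrelated insert plus delete in one edit as a modification. A link disambiguation such as `[[Mercury]]` becoming `[[Mercury (planet)|Mercury]]` is reported as a wiki link "modify" under this scheme.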

Abstract/probabilistic operations

These classes can't be detected trivially with regular expressions. They would require some machine prediction.

  • Grammar (word-level)
    • punctuation, whitespace
    • spelling error, typo
    • capitalization
    • tense change
  • Rephrase (word-level)
    • synonym
    • remove redundant words
  • Sentence (sentence-level)
    • insert/modify/delete (substantive)
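The word-level classes above cannot be decided by a pattern alone, but a word-level diff can still produce candidate labels that a trained model might use as features. The sketch below is one possible heuristic using Python's standard `difflib`; the label names, the 0.8 similarity threshold, and the function names are assumptions for illustration, and the output is a guess rather than a reliable classification.

```python
import difflib
import string

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def label_pair(old_word, new_word):
    """Heuristically label a single word replacement."""
    if old_word.lower() == new_word.lower():
        return "capitalization"
    if old_word.translate(_STRIP_PUNCT) == new_word.translate(_STRIP_PUNCT):
        return "punctuation"
    # Assumed threshold: very similar spellings suggest a typo fix.
    if difflib.SequenceMatcher(a=old_word, b=new_word).ratio() >= 0.8:
        return "possible_typo_fix"
    return "word_substitution"


def word_level_features(old_text, new_text):
    """Align two texts word by word and label each changed span."""
    a, b = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    labels = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of words on both sides: compare them pairwise.
            for old_word, new_word in zip(a[i1:i2], b[j1:j2]):
                labels.append((old_word, new_word, label_pair(old_word, new_word)))
        else:
            # Pure insert/delete or uneven replace: keep the raw diff tag.
            labels.append((" ".join(a[i1:i2]), " ".join(b[j1:j2]), tag))
    return labels


for row in word_level_features(
    "the cat recieved a big prize", "The cat received a large prize"
):
    print(row)
```

A label such as "word_substitution" does not distinguish a synonym swap from a change of meaning, and tense changes will often fall under "possible_typo_fix"; separating those cases is where machine prediction would be needed.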

Semantic

These classes describe "why" an edit was made. They usually amount to subjective applications of policy.

  • NPOV
  • Vandalism
  • Notable?
  • External link policy
  • Manual of style
  • New topic (article creation)

Complex operations

These classes describe changes that are part of a multi-edit operation.

  • Merge
  • Archiving

Discussion

These classes describe actions relevant to a discussion.

  • New topic
  • Reply
  • !Vote (Support/oppose)
  • Comment signing
  • Suggestion
  • WP tagging/assessment
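Some of the discussion classes above leave recognizable traces in talk-page wikitext. The following Python sketch is a rough guess at how a few of them could be flagged from the text added by an edit. It assumes English Wikipedia conventions (level-2 headings for new topics, leading colons for replies, bolded Support/Oppose for !votes, and the default UTC signature timestamp); the function name `classify_talk_addition` and the patterns are hypothetical.

```python
import re

# Assumed conventions; other wikis and custom signatures may differ.
NEW_TOPIC = re.compile(r"^==[^=\n].*?==[ \t]*$", re.MULTILINE)
VOTE = re.compile(r"'''\s*(support|oppose)\b[^']*'''", re.IGNORECASE)
SIGNATURE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")


def classify_talk_addition(added_text):
    """Return the set of discussion classes suggested by the added text."""
    labels = set()
    if NEW_TOPIC.search(added_text):
        labels.add("new topic")
    # A first non-empty line indented with ':' is treated as a reply.
    lines = [line for line in added_text.splitlines() if line.strip()]
    if lines and lines[0].startswith(":"):
        labels.add("reply")
    if VOTE.search(added_text):
        labels.add("!vote")
    if SIGNATURE.search(added_text):
        labels.add("comment signing")
    return labels


comment = (
    ":'''Support''' per nom. [[User:Example|Example]] "
    "([[User talk:Example|talk]]) 12:34, 5 January 2014 (UTC)"
)
print(sorted(classify_talk_addition(comment)))
```

Classes such as "Suggestion" have no comparable surface marker and are not covered by this sketch.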