Research:Sentiment analysis tool of new editor interaction

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This sprint aims to develop an automatic classifier for messages found in new editor interactions. Previous studies provide a set of coded examples of these messages, categorized as "praise", "criticism", "warning", etc. Based on the coded examples, I apply supervised machine learning methods to build the classifier. If successfully built, the tool will enable us to analyse new editor interaction from different perspectives (e.g., for different categories of content) with less labor.

Topic

This sprint is primarily about building a fundamental tool to be used to answer further research questions. One such question would be finding the most important features that contribute to deciding whether a message is praise, criticism, an educational comment, etc. This sprint is largely related to RQ2.

Process

I first look at the content of the messages and see which linguistic features can be extracted and which contribute most to deciding the tone of a message. Other clues, such as the sender's edit history, might be included later.

I split the coded examples into a training set and a test set for the classifier. After applying a supervised learning method to the training set, the performance of the classifier is evaluated on the test set. The process of training and testing will be iterated over different feature sets and different hyperparameters of the classifier to find the most effective classification setup, as sketched below.
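A minimal sketch of this train-and-evaluate loop, assuming the coded examples have already been converted to feature vectors X and binary labels y (both names are placeholders); scikit-learn's LinearSVC, a wrapper around liblinear, stands in for the classifier used in the sprint:

    # Sketch of the train/test loop: hold out a quarter of the coded examples,
    # train liblinear-style linear SVMs over a grid of cost values, and report
    # test-set accuracy for each.
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from sklearn.metrics import accuracy_score

    def evaluate(X, y, costs=(0.01, 0.1, 1.0, 10.0)):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=0)
        results = {}
        for c in costs:  # hyperparameter sweep over liblinear's cost parameter C
            clf = LinearSVC(C=c).fit(X_train, y_train)
            results[c] = accuracy_score(y_test, clf.predict(X_test))
        return results  # e.g. {0.01: 0.67, 0.1: 0.71, ...}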

Training procedure

  1. For each coded message, extract raw features and store them as a document in MongoDB with the following structure:
    
    {
      "entry": {
        "rev_id": 2894772,
        "title": "Yosri",
        "text": "Hi ....",
        "timestamp": "...",
        "sender": {},
        "receiver": {}
      },
      "labels": {
        "praise":    false,
        "criticism": false,
        "warning":   true,
        ...
      },
      "features": {
        "ngram":   {"type": "assoc", "values": {...}},
        "SentiWN": {"type": "assoc", "values": {...}},
        ...
      },
      "vector": {
        "1": 1,
        "2": 3.5,
        ...
      },
      ...
    }
  2. Convert the raw features into vectors, and update all entries in MongoDB. (Different feature selections and/or hash kernels may be used here; see the sketch after this list.)
     Features include:
     • SentiWordNet's sentiment polarity scores of the words used in the message
     • N-grams of wikilinks found in the message
  3. Train a classifier with the feature vectors.
  4. Output the resulting model.
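Below is a minimal sketch of the vectorization step, assuming the documents above are stored in a MongoDB collection named wikisentiment.entries (a hypothetical name) and that all feature values are numeric; scikit-learn's FeatureHasher stands in for the hash kernel:

    # Sketch of step 2: flatten each document's "features" sub-documents into
    # one dict, hash it into a fixed-width sparse vector with FeatureHasher
    # (a hash kernel), and write the nonzero entries back to MongoDB.
    from pymongo import MongoClient
    from sklearn.feature_extraction import FeatureHasher

    collection = MongoClient().wikisentiment.entries  # hypothetical names
    hasher = FeatureHasher(n_features=2 ** 16, input_type='dict')

    for doc in collection.find():
        flat = {}
        for name, group in doc['features'].items():
            for key, value in group['values'].items():
                flat['%s:%s' % (name, key)] = value  # prefix keys by feature group
        row = hasher.transform([flat])  # a 1 x 2^16 scipy sparse row
        sparse = {str(i): float(v) for i, v in zip(row.indices, row.data)}
        collection.update_one({'_id': doc['_id']}, {'$set': {'vector': sparse}})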

Results and discussion

Label          Accuracy
Criticism      75.4717% (80/106)
Teaching       66.9811% (70/106)
Warning        76.4151% (81/106)
Praise_Thanks  67.9245% (72/106)

Classifiers are trained on three quarters (3/4) of the coded examples created in a previous sprint, and tested on the rest of them.

The accuracies above may not look too bad, but they are only slightly better than the baselines. For example, 61% of the 'Teaching' examples are negative, so a trivial classifier that always predicts negative is guaranteed 61% accuracy on that label.
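For reference, this majority-class baseline can be computed directly from the coded labels; a minimal sketch (the label list passed in is a placeholder):

    # Majority-class baseline: the accuracy of always predicting the most
    # common label. With 61% negative 'Teaching' examples this returns ~0.61.
    def majority_baseline(labels):
        counts = {}
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
        return max(counts.values()) / float(len(labels))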

The features I currently give to the classifiers are (see the sketch after this list):

  • Real-valued features: sentiment scores of the words in the message, given by SentiWordNet
  • Binary features: regular expressions detecting Wikipedia-specific notations (e.g., [[Wikipedia:Articles for deletion]], [[WP:...]])
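A minimal sketch of these two extractors, using NLTK's SentiWordNet interface (whether the sprint's code uses NLTK is an assumption) and a hypothetical subset of the Wikipedia-specific regular expressions:

    # Sketch of the two feature extractors: summed SentiWordNet polarity
    # scores (real-valued) and Wikipedia-specific regex matches (binary).
    import re
    from nltk.corpus import sentiwordnet as swn  # needs the 'sentiwordnet'
                                                 # and 'wordnet' NLTK corpora

    WIKI_PATTERNS = {  # hypothetical subset of the patterns actually used
        'afd_link': re.compile(r'\[\[Wikipedia:Articles for deletion', re.I),
        'wp_shortcut': re.compile(r'\[\[WP:[^\]|]+'),
    }

    def sentiment_features(text):
        """Sum positive/negative SentiWordNet scores over the message's words."""
        pos = neg = 0.0
        for word in re.findall(r'[a-z]+', text.lower()):
            synsets = list(swn.senti_synsets(word))
            if synsets:  # use only the first (most frequent) sense
                pos += synsets[0].pos_score()
                neg += synsets[0].neg_score()
        return {'senti_pos': pos, 'senti_neg': neg}

    def pattern_features(text):
        """One binary feature per Wikipedia-specific regular expression."""
        return {name: bool(p.search(text)) for name, p in WIKI_PATTERNS.items()}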

Software

The tools developed for this sprint, including a set of preprocessing modules, feature extraction templates, supervised classification modules (built with the help of liblinear), and evaluation scripts, are available at https://github.com/whym/wikisentiment.

Future work

The accuracy needs to improve before this classifier can be put to use. I am aware of several changes that should lead to better accuracy, and am working on them:

  • Better treatment of Wikipedia diffs (currently I just pick out the relevant HTML elements <td class="diff-addedline"> from the API output [1]; see the sketch after this list)
  • Expanding Wikipedia specific patterns
  • Using word N-grams as features
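
A minimal sketch of the current diff handling, assuming the MediaWiki action=compare endpoint and its legacy JSON response shape (whether this matches the API call used in the sprint is an assumption); the use of requests/BeautifulSoup is illustrative:

    # Sketch of extracting the added lines of a diff via the MediaWiki API's
    # action=compare endpoint; revision IDs are placeholders.
    import requests
    from bs4 import BeautifulSoup

    API = 'https://en.wikipedia.org/w/api.php'

    def added_lines(fromrev, torev):
        resp = requests.get(API, params={
            'action': 'compare',
            'fromrev': fromrev,
            'torev': torev,
            'format': 'json',
        }).json()
        html = resp['compare']['*']  # HTML <tr> rows of the diff table
        soup = BeautifulSoup(html, 'html.parser')
        # Added text sits in <td class="diff-addedline"> cells.
        return [td.get_text() for td in soup.find_all('td', class_='diff-addedline')]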

References