Research:Classifying wikilove messages

From Meta, a Wikimedia project coordination wiki
Created
2011/08
Collaborators
Diederik van Liere
Maryana Pinchuk
Steven Walling
Duration:  2011-08 — 2011-09

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.


Overview

This project involves categorizing a large set of English-language WikiLove messages to better understand how the community is using this new tool, and then using that dataset to train an active learning classifier that automatically detects the sentiment of WikiLove messages on an ongoing basis.

Motivation

WikiLove represents the first substantial update to the native messaging capabilities of Wikipedia in years. As such, there are several things that the WikiLove dev team and WSOR researchers are interested in:

  1. Impact on civility: whether providing Wikipedians with a tool that makes it easy to praise and thank people will improve the perceived "civility" problem that is especially prevalent on user talk pages.
  2. Impact on usability: whether providing Wikipedians with a messaging tool that makes it easier to post to talk pages (i.e. one that doesn't require the user to deal with wiki markup) will lead to interesting "off-label" uses of WikiLove, such as use as a general messaging tool or some other unanticipated usage.
  3. Use by different types of editors: whether some kinds of editors (for instance, newbies vs. Wikipedians) are more likely than others to be givers or recipients of WikiLove.
  4. Change in usage patterns over time: whether the use of WikiLove will change over time, as it is tested, accepted, appropriated, or rejected by a growing percentage of the community.

In order to answer question 4, this study will generate a set of keywords associated with different uses of WikiLove, to provide training data (word lists) for an automatic classifier of Wikilove messages that can be run at any time in the future without necessitating additional manual message coding.


Methods

This study will use a sample of 1775 WikiLove messages, collected on July 28, 2011, where giver_id != receiver_id (to exclude self-love) and where the message body was not blank. Mechanical Turk workers ("Turkers") will code only the Subject line (e.g. "A kitten for you!") and the Message (e.g. "Because kittens are cute and so are you."), not the image. Each message will be coded by two expert Turkers. Expert Turkers, in this case, are those who performed well in previous WMF-sponsored studies, like this one.
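The sampling rule above can be sketched as a simple filter. This is a minimal illustration, not the query the study actually ran: `giver_id` and `receiver_id` come from the description above, while the `message` and `subject` field names and the second and third sample rows are made up for the example.

```python
def select_codable(messages):
    """Keep messages sent to someone else that have a non-blank body."""
    return [
        m for m in messages
        if m["giver_id"] != m["receiver_id"]   # exclude self-love
        and m["message"].strip()               # exclude blank message bodies
    ]


# Toy rows; only the first subject/message pair is taken from the text above.
sample = [
    {"giver_id": 1, "receiver_id": 2,
     "subject": "A kitten for you!",
     "message": "Because kittens are cute and so are you."},
    {"giver_id": 3, "receiver_id": 3,          # self-love: dropped
     "subject": "A beer for you!", "message": "testing"},
    {"giver_id": 4, "receiver_id": 5,          # whitespace-only body: dropped
     "subject": "A barnstar for you!", "message": "   "},
]

codable = select_codable(sample)
print(len(codable))
```

Treating whitespace-only bodies as blank is an assumption of this sketch; the study may have used a stricter or looser definition of "blank".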

The messages were made available to the Turkers on Tuesday, August 2nd. As of 8/30, 97% of the batch had been completed, but work by the Turkers had effectively ceased, so the study was ended.

Coding Categories

Message contains the word "test"? (y/n)*

Message Content (Select one):

  • Praise/thanks (e.g. "That article you wrote was great!")
  • Criticism/insult (e.g. Foul language or other insults.)
  • Conversation type other than above (please describe in adjacent text box)

Add at least three keywords or key phrases (separated by commas) from this message that reflect the type of message it is.**

*This question was included for two reasons: 1) to get a sense of how many Wikilove messages (even ones sent to others, which included message body text) were tests, and 2) as a quality check, to make sure that the Turkers were reading every word of the Wikilove messages.
**These keywords were used to create word lists for training the automatic WikiLove classifier. We also believe that this requirement helped the Turkers stay focused and attentive.
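One way the comma-separated keyword answers could be turned into per-category word lists is sketched below. The row layout (`category` and `keywords` fields), the lowercasing, and the `min_count` threshold are assumptions made for illustration; the source does not describe how the lists were actually assembled.

```python
from collections import Counter, defaultdict


def build_word_lists(coded_rows, min_count=1):
    """Group Turker-supplied key phrases by the category chosen for the message."""
    counts = defaultdict(Counter)
    for row in coded_rows:
        for phrase in row["keywords"].split(","):
            phrase = phrase.strip().lower()
            if phrase:                          # skip empty entries such as a trailing comma
                counts[row["category"]][phrase] += 1
    return {
        category: sorted(p for p, n in counter.items() if n >= min_count)
        for category, counter in counts.items()
    }


# Toy coded rows (invented for the example).
rows = [
    {"category": "praise/thanks", "keywords": "great, Thank you, article"},
    {"category": "praise/thanks", "keywords": "thank you, nice work,"},
    {"category": "other", "keywords": "welcome, help"},
]

word_lists = build_word_lists(rows)
print(word_lists["praise/thanks"])
```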

Results

Sampling

We gathered all WikiLove messages in the enwiki.wikilove_log table that were sent between June 30th and July 28th, 2011. As the tables below show, a large proportion of these messages (39%) were "self-love" (i.e. the sender posted the message on their own talk page), and a smaller proportion (16%) were blank messages with no text in the message body. Since we are most interested in the use of WikiLove as an interpersonal messaging tool, we excluded any message that met either of these criteria from the sample we gave to the Turkers to code.


| WikiLove message population | Count | Percentage |
|---|---|---|
| total WikiLove messages gathered | 3115 | |
| have blank message body (excluded from coding) | 498 | 16% |
| self-love messages (excluded from coding) | 1209 | 39% |
| non-blank, non-selflove messages (dataset to code) | 1775 | 57% |
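The percentages in this table can be checked with a few lines of arithmetic. The sketch below also infers how many messages were both blank and self-love, since the two excluded groups add up to more than the number of excluded messages. That overlap figure is derived here by inclusion-exclusion and is not stated in the source.

```python
total = 3115          # all WikiLove messages gathered
blank = 498           # blank message body
self_love = 1209      # sender posted on their own talk page
coded_sample = 1775   # non-blank, non-selflove messages

excluded = total - coded_sample            # messages in at least one excluded group
overlap = blank + self_love - excluded     # inferred: messages in both excluded groups


def pct(n):
    """Share of the full population, rounded to a whole percent."""
    return round(100 * n / total)


print(excluded, overlap)
print(pct(blank), pct(self_love), pct(coded_sample))
```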
WikiLove messages broken down by givers and recipients, June 29th through July 31st, 2011.

| WikiLove givers and recipients | Percentage |
|---|---|
| newbie to newbie | 5% |
| newbie to Wikipedian | 15% |
| Wikipedian to newbie | 6% |
| Wikipedian to Wikipedian | 74% |


Coded Sample - Quality and Scope

In general, the quality of the results was high. In 93% of double-coded cases (1556 of 1674), the two Turkers coding a message agreed on which category it fell into. Spot-checks by the study facilitator led to a small number of HITs (56 of 3503, approximately 1.6%) being rejected, meaning that the facilitator judged that the Turker's coding of the message was incorrect, or that the Turker had failed to follow the instructions when coding the message. In these cases, the Turker received a message explaining why the HIT was rejected, and the HIT was returned to the 'pool' to be coded by a different Turker.


| Mechanical Turk sample | Count | Percentage |
|---|---|---|
| all coded HITs | 3503 | |
| rejected HITs | 56 | |
| accepted HITs | 3447 | |
| all double coded messages | 1674 | 94% of total dataset (as of 8/29) |
| double coded - Turkers agreed on coding | 1556 | 93% |
| double coded - Turkers disagreed on coding | 118 | 7% |
| all single coded messages | 99 | |
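Raw percent agreement between two coders can be computed as sketched below. The helper and the toy label lists are illustrative only. The second half of the block simply re-derives the table's figures from its own counts. It does not correct for chance agreement, which a statistic such as Cohen's kappa would do but which cannot be computed from the counts given here.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of items on which two coders chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100 * agree / len(labels_a)


# Toy example (invented labels): the coders disagree on the third message.
coder_a = ["praise/thanks", "other", "praise/thanks", "criticism/insult"]
coder_b = ["praise/thanks", "other", "other", "criticism/insult"]
toy_agreement = percent_agreement(coder_a, coder_b)

# Counts taken from the table above.
agreed, disagreed, single_coded = 1556, 118, 99
double_coded = agreed + disagreed
table_agreement = round(100 * agreed / double_coded)
accepted_hits = 2 * double_coded + single_coded   # two HITs per double-coded message

print(toy_agreement, double_coded, table_agreement, accepted_hits)
```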

Message Types

The types of messages that Wikilove users sent from June 29-August 1, 2011.

By far the most common type of Wikilove message was "Praise/Thanks", at 77% of total messages where both Turkers agreed on the category. 22% of the messages were "Other", a catch-all category that is meant to include a variety of 'off-label' uses of Wikilove (see table below for examples).

"Criticism/insult" messages were a very small (<1%) percentage of the total--it would seem that WikiLove is generally being used in a loving spirit. :)


| Message category | Count | Percentage |
|---|---|---|
| praise/thanks | 1198 | 77% |
| other | 347 | 22% |
| criticism/insult | 11 | <1% |
| contains word "test"?* | 20 | <2% |
| total coded messages | 1556 | |

* messages containing word 'test' were also classified as praise/criticism/other. Remember that our coded sample does not include self-love messages, or messages with no text in the message body, many more of which, presumably, were true 'test' messages.


Sub-types of "Other" Message

When labeling a message as "Other", Turkers were prompted to include an informal categorization of that message in a free-text field. These ranged from "Welcome" to "Apology" to "General Chat". Based on these informal categories, as well as our own observations of the "Other" messages, we intend to perform a second round of coding on this category to get a clearer picture of the off-label uses to which Wikipedians are putting WikiLove. So far, these are the general categories of "Other" messages that we believe are most common. Note that the quotes below are examples, and the number of quotes appearing in each category does not necessarily reflect the proportion of "Other" messages of that sub-type.


Welcome - welcoming new or returning users
  • "Hey, Haidill - glad to see you on board! Thought some falafel might be a fitting welcome meal... when are you back on this side of the world?" (wikilove id 1079)
  • "A hearty welcome to Wikipedia. Your start is great. Please partake this cup of tea as a token of friendship. Note it is black tea as I am a vegan." (wikilove id 1954)
  • "Welcome to Wikipedia, I do hope you are a real Zoo Keeper and you are from Tasmania because we need a few more of us who are in the industry on here. If you need assistance please dont hesitate to ask . Regards" (wikilove id 2526)

Help Request - requests for help
  • "I need help my cusins we on my profile and messed it up! Now what do I do?" (wikilove id 2984)
  • "hi could u put semi protected on gervinho page plz" (wikilove id 1114)
  • "Hello My name is Chung Min Oh, and i am a volunteer at the Hawaii State Art Museum and currently helping with a copyright situation that was found on your Isami Doi Wikipedia site. There seems to be a problem with the picture you uploaded. You will need t..." (wikilove id 849)
  • "Can u please explain to me why u said u would delete my first articleen.wikipedia.org—Georgia_M._Dunston because i copied it??" (wikilove id 548)
  • "Why do you keep changing my changes in the Attractions' Section of Phoenixville? I am simply adding attractions. Everything is properly cited i believe, so why change it? " (wikilove id 2283)

Help Response/Advice
  • "Hi again Greta! I see you've made it most of the way through the course page wizard. Now just create the tabs, beginning with the discussion tab, and you'll be ready to go!" (wikilove id 26)
  • "Wikipedia can be trying at times, so I'm sending you a kitten with some advice. He says, 'I recommend you use the 'Article creation wizard at WP:ACW to create new articles. An even better idea would be to try and expand some old ones first, or add som..." (wikilove id 2457)

Support/Condolences
  • "No worries. Keep at it and you'll get the hang of it :-)" (wikilove id 176)
  • "I just wanted to say that I fully support you in the 'hounding' thing against Astronaut. So go out. Go to a bar. Sit down, and have a beer. Cheers!" (wikilove id 474)
  • "Here's a glass to relax after a day of crapy debates :p" (wikilove id 2742)

Spontaneous WikiLove - random acts of WikiLove that are not specifically Praise/Thanks
  • "Wiki-kitty says don't be a grumpy grump. Feel the love..." (wikilove id 326)
  • "Enjoy your small chewy balls." (wikilove id 1595)
  • "Like Wikipedia: crusty on the outside, soft and flexible on the inside!" (wikilove id 1639)

Apology/Reconciliation
  • "Just for cautious reasons and respect. I hope you don't treat my different opinions the wrong way. I know you (and I) are user's that try to resolve differences." (wikilove id 487)
  • "Hi there Keith D. I hope I didin't offend you by constantly removing the header banner on the wikipedia page of Simon Baldry. I keep removing it because I think the page looks shabby with it. I have been working on the page for a while and I have not fi..." (wikilove id 1046)
  • "Sorry for touching USS Chesapeake (1799)!" (wikilove id 1574)

Non-Help Question - asking a non-help-related question, also request for comment and request for review
  • "HI! I am working with 2 other Wikipedians to improve the article 'Wolfgang Amadeus Phoenix'. I wanted to get your input since you have already been somewhat involved in the article and seem to have a well-rounded insight into this realm. The other two ha..." (wikilove id 1150)
  • "I am sending you this TeamWork Barnster as an invitation to work together here in Wikipedia. Please accept and Review B'Robby for qualification" (wikilove id 1780)
  • "Why do you keep changing my changes in the Attractions' Section of Phoenixville? I am simply adding attractions. Everything is properly cited i believe, so why change it?" (wikilove id 2283)

Test Message - test messages of various kinds
  • "I just wanted to test the new <3 button at the top of the page :) I considered you a perfect candidate :)" (wikilove id 142)
  • "Man, you don't come by the encyclopedia often, and they institute an automatic kitten-delivery system. I just discovered this... and in case you stop by anytime soon, I can't think of anyone else better to test it out on. I hope that you are doing well," (wikilove id 1158)

Spam/Advertisement/Commercial Solicitation
  • "Dear Clementi, I work as an assistant producer for a television production company called Blink Films based in the UK. We are producing a documentary for the Smithsonian Channel exploring the real life events that inspired the movie Close Encounters of t..." (wikilove id 2049)

Warning/Notification
  • "A tag has been placed on Christopher Columbus's Birthday that it can be speedily deleted from wikipedia. A page you created is nominated for deletion. Wow! A Redirect was Replaced by a tag for deletion. Sometimes, You redirected it to Columbus Day..." (wikilove id 3083)

Conversational - general chit-chat (equivalent to a conversational talk page post that does not fall into any of the above categories OR into praise/thanks, criticism/insult, or spam)
  • "Since we weren't able to bring you to any pubs when you were here, I guess this will have to do. It was nice seeing you last weekend and I hope to see you again next year!" (wikilove id 1758)
  • "Wazzup Bryce? It's me Ice Season from wikia, but under the name Sonicfan0329. We can possibly talk alot here. Bye!" (wikilove id 2404)
  • "We buy a house today in Riverview, Thomas. Isn't that great!" (wikilove id 2414)


WikiLove Classifier

Subjectivity lexicon features
We use SentiWordNet's sentiment polarity scores as a feature for the supervised classifier. We look up each word in the message, take the scores of the SentiWordNet entries that match it exactly after lemmatization, and average them to obtain a single value for the message. We also generate features that depend on each individual word.
Curated key phrases
We assume that the key phrases identified by the Turkers are strong clues to the category decision and that they generalize to unseen messages.
N-gram bag-of-words features
All occurrences of N-grams are used as binary features, without filtering. If certain N-grams appear frequently in the messages of one class but rarely in the others, those N-grams should serve as good triggers for deciding the class label.

We used liblinear to train the supervised binary classifiers.

More details on the implementation will be described in Research:Sentiment analysis tool of new editor interaction.
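The three feature families described above can be sketched as follows. This is a rough illustration under several assumptions, not the study's implementation: the tiny lexicon and key-phrase list are hypothetical stand-ins for SentiWordNet and the Turker-supplied lists, lemmatization is skipped, key phrases are matched by plain substring search, and the N-gram order is capped at 2.

```python
import re

# Hypothetical polarity scores standing in for SentiWordNet entries.
TOY_LEXICON = {"great": 0.75, "thank": 0.5, "cute": 0.5, "shabby": -0.5}
# Hypothetical key phrases standing in for the Turker-supplied lists.
TOY_KEY_PHRASES = ["thank you", "well done"]


def tokenize(text):
    """Lowercase the text and split it into word tokens (no lemmatization)."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(tokens, lexicon):
    """Average polarity of the tokens found in the lexicon (0.0 if none match)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def ngram_features(tokens, max_n=2):
    """Binary presence features for every 1..max_n-gram, without filtering."""
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.add("ngram=" + " ".join(tokens[i:i + n]))
    return feats


def extract_features(text):
    """Combine the three feature families into one sparse feature dict."""
    tokens = tokenize(text)
    features = {name: 1.0 for name in ngram_features(tokens)}
    joined = " ".join(tokens)
    for phrase in TOY_KEY_PHRASES:
        if phrase in joined:                    # naive substring match
            features["keyphrase=" + phrase] = 1.0
    features["lexicon_avg"] = lexicon_score(tokens, TOY_LEXICON)
    return features


features = extract_features("Thank you, great article!")
print(sorted(features))
```

A sparse dictionary like this would then be mapped to feature indices and passed to a linear classifier, with one binary model trained per label.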

Evaluation

We trained and evaluated the classifier with five-fold cross-validation. Out of 2974 coded messages (double coded messages are counted twice), 2380 messages were used to train the classifier in each fold, and 594 were used to evaluate it.
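A five-fold split over 2974 items can be sketched as below. The contiguous, unshuffled fold assignment is an assumption made to keep the example short; the source does not say how messages were assigned to folds.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


fold_sizes = [(len(train), len(test)) for train, test in k_fold_indices(2974, k=5)]
print(fold_sizes)
```

Because 2974 is not divisible by 5, four folds hold out 595 messages and one holds out 594, which matches the 2380/594 split quoted above for that fold. In practice the items would usually be shuffled first, and since double coded messages appear twice, keeping both copies of a message in the same fold would avoid evaluating on text seen during training.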

The table below shows the F-measure of the binary (true/false) decision for each label.

| | Praise/thanks | Criticism/insult | Other |
|---|---|---|---|
| F-measure | 94.90 | 27.59 | 82.72 |
| #errors | 232/2974 | 21/2974 | 236/2974 |
| #positive examples | 2289/2974 | 24/2974 | 663/2974 |
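The gap between error count and F-measure for 'criticism/insult' can be illustrated with the standard F1 formula. The sketch assumes that the "#errors" row counts false positives plus false negatives pooled over all folds; the source does not define it. Under that assumption, 4 true positives, 1 false positive, and 20 false negatives is one breakdown consistent with 24 positive examples, 21 errors, and the reported 27.59. The confusion matrix itself is not given in the source.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positives, false positives, false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(errors, total):
    """Share of decisions that were correct."""
    return 1 - errors / total


# Hypothetical breakdown of the 21 'criticism/insult' errors (see assumptions above).
crit_f = 100 * f_measure(tp=4, fp=1, fn=20)
crit_acc = 100 * accuracy(errors=21, total=2974)

print(round(crit_f, 2), round(crit_acc, 2))
```

Accuracy stays above 99% because almost every message is a true negative for this label, which is why F-measure is the more informative score for a class with only 24 positive examples.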

Discussion

For the 'praise/thanks' and 'other' labels, we believe that the current classifier produces meaningfully correct predictions. Based on this, we are preparing to build a live system that classifies incoming WikiLove messages and shows an up-to-date breakdown.

The very low F-measure for 'criticism/insult' is due to a shortage of training examples: only 24 items in the dataset were marked as 'criticism/insult'.

The key-phrase features introduce a slight risk of overfitting, because we gathered key phrases from the whole dataset without regard to the training/evaluation split used in the cross-validation.

However, the comparison below between including and excluding the key-phrase features suggests that the overfitting is negligible for "Praise/thanks" and "Other".

| | Praise/thanks | Criticism/insult | Other |
|---|---|---|---|
| F-measure (w/o key-phrase) | 94.81 | 8.00 | 82.02 |
| F-measure (w/ key-phrase) | 94.90 | 27.59 | 82.72 |
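The leakage concern discussed above could be removed by collecting key phrases from the training portion of each fold only. The sketch below shows the idea; the row layout and the helper name are hypothetical, and this is not how the reported experiment was run.

```python
def key_phrases_for_fold(coded_rows, train_indices):
    """Collect key phrases from the training rows of one fold only."""
    phrases = set()
    for i in train_indices:
        for phrase in coded_rows[i]["keywords"].split(","):
            phrase = phrase.strip().lower()
            if phrase:
                phrases.add(phrase)
    return phrases


# Toy rows (invented); row 3 is held out for evaluation in this fold.
rows = [
    {"keywords": "thanks, great work"},
    {"keywords": "welcome"},
    {"keywords": "thank you, kitten"},
    {"keywords": "barnstar, thanks"},
]
train_idx, test_idx = [0, 1, 2], [3]

fold_phrases = key_phrases_for_fold(rows, train_idx)
print(sorted(fold_phrases))
```

Here "barnstar" occurs only in the held-out row, so it never becomes a feature for this fold, while "thanks" is still available because a training row also supplied it.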

Conclusion

These findings demonstrate that even during the first month after the WikiLove rollout, the tool is already being used for a variety of purposes other than sending people kittens, beer and barnstars in recognition of a job well done. Members of the WikiLove dev team and WSOR believe that WikiLove will be used more and more frequently as a general messaging tool, because it provides a substantially more user-friendly way to post a message to someone's talk page, and is probably more fun, to boot.


Additional analysis based on these data and additional data gathered by the Wikilove Classifier can be used to validate these suspicions, or to demonstrate heretofore unanticipated consequences of unleashing WikiLove upon the world!

Resources and Previous work