User:EpochFail/Journal/2011-11-13

From Meta, a Wikimedia project coordination wiki

Sunday, Nov. 13th

I sat down today to wrap up my Huggle coding work, but it looks like qbox is down, so I'm switching back to thinking about AFT analysis.

I'd like to map out my thoughts on hand coding AFT feedback and gather some ideas about tracking users to see what effect AFT has on their likelihood of editing.

Feedback quality

The primary question I want to answer here is whether reader feedback can be useful to editors.

Codebook

It will be important for me to look over a wide range of AFT feedback to get a sense of what types of feedback people will give. Since the current version of AFT doesn't support written feedback, I'll have to wait until we release the next iteration.

  • Before hand coding: discover categories
    • Examples:
      1. obscenities
      2. bad-faith nonsense
      3. good-faith nonsense
      4. complaint
      5. suggestion
  • Hand code: Usefulness?
    • Answers: Do readers have something useful to contribute to editors?
    • Relevancy
      1. Does not address the subject at all
      2. Addresses the subject, but not with encyclopedic interest
      3. Addresses the subject with encyclopedic interest
    • Operationalization
      1. Not useful
      2. Possibly useful for some editors
      3. Immediately useful to me
  • Hand code: Good faith?
    • Answers: What kind of audience are we tapping?
    • Answers: Will readers take AFT seriously?
    • Feedbacker intentions
      1. offend
      2. be funny
      3. pure nonsense
      4. state opinion
      5. complain about problems
      6. suggest changes
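To keep hand codings consistent between coders, the codebook above could be pinned down as a small data structure. The sketch below is only illustrative: the Python names (`Relevancy`, `Usefulness`, `Intention`, `Coding`) are hypothetical. Which intentions count as "good faith" is also my assumption and is not settled by the codebook. For example, nonsense can come in both good and bad faith.

```python
from dataclasses import dataclass
from enum import Enum


class Relevancy(Enum):
    """Relevancy scale from the codebook (1 = least relevant)."""
    OFF_SUBJECT = 1        # Does not address the subject at all
    NOT_ENCYCLOPEDIC = 2   # Addresses the subject, but not with encyclopedic interest
    ENCYCLOPEDIC = 3       # Addresses the subject with encyclopedic interest


class Usefulness(Enum):
    """Operationalized usefulness scale."""
    NOT_USEFUL = 1
    POSSIBLY_USEFUL = 2     # Possibly useful for some editors
    IMMEDIATELY_USEFUL = 3  # Immediately useful to the coder


class Intention(Enum):
    """Feedbacker intentions, used to judge good faith."""
    OFFEND = "offend"
    BE_FUNNY = "be funny"
    NONSENSE = "pure nonsense"
    OPINION = "state opinion"
    COMPLAIN = "complain about problems"
    SUGGEST = "suggest changes"


# Assumption: which intentions count as good faith is a judgment call,
# not something the codebook defines.
GOOD_FAITH = {Intention.OPINION, Intention.COMPLAIN, Intention.SUGGEST}


@dataclass(frozen=True)
class Coding:
    """One coder's judgment of one piece of feedback."""
    feedback_id: int
    coder: str
    relevancy: Relevancy
    usefulness: Usefulness
    intention: Intention

    @property
    def good_faith(self) -> bool:
        return self.intention in GOOD_FAITH
```

For example, `Coding(1, "coder_a", Relevancy.ENCYCLOPEDIC, Usefulness.POSSIBLY_USEFUL, Intention.SUGGEST).good_faith` is `True`. Keeping the scales as integer enums makes it easy to compare two coders' answers later.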

Hand coding

  • Code it ourselves
    • Expect a rate of about 100-200 codings per hour
    • High quality/consistency
    • High investment
    • Agility
    • May not represent Wikipedians' perspective
  • Mechanical Turk
    • Essentially free time-wise
    • Medium quality. Substantial time & energy spent verifying quality.
    • Low investment
    • Not Wikipedians. May have no idea of what "useful" looks like.
  • Wikipedians
    • Highest validity
    • Coding system not ready or ad hoc
      • Wikipedian coding system not yet ready
        • Possible line of development for me (Aaron)?
      • Qbox is high overhead
      • Lower investment than doing it ourselves, but higher than Mechanical Turk.
      • High management overhead.
        • Wikipedians' coding effort will be distributed as a power law
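The "code it ourselves" option can be turned into a rough time budget. Below is a minimal sketch that assumes the 100-200 codings per hour estimate holds. The `coders_per_item` parameter is a hypothetical addition for double coding, which would be one way to check consistency between coders. The item count is also up to the caller, since we don't yet know how much feedback there will be.

```python
from typing import Tuple


def coding_hours(n_items: int, coders_per_item: int = 1,
                 low_rate: float = 100.0, high_rate: float = 200.0) -> Tuple[float, float]:
    """Return (best-case, worst-case) hours to hand code n_items.

    Rates are codings per hour. The defaults mirror the 100-200/hour estimate.
    coders_per_item > 1 models having several people code every item.
    """
    if n_items < 0 or coders_per_item < 1 or not (0 < low_rate <= high_rate):
        raise ValueError("need n_items >= 0, coders_per_item >= 1, 0 < low_rate <= high_rate")
    total_codings = n_items * coders_per_item
    # Faster rate gives the best case, slower rate the worst case.
    return total_codings / high_rate, total_codings / low_rate
```

For a hypothetical sample of 1,000 feedback items, `coding_hours(1000)` gives `(5.0, 10.0)`. With two coders per item, the estimate doubles to `(10.0, 20.0)`.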

I'd really like to have Wikipedians code the feedback, since they know best what is and isn't useful, but making the work easy enough for them will require some development.

Hand coding interface

I figured a little brainstorming on an interface for coding the feedback would be useful. See gallery below.

It's time to go, and I didn't actually get to working out what information we need to track AFT users and see whether the tool is helpful for them. I plan to think about that in the meantime and get down to business with it the next time I clock in.

20:07, 13 November 2011 (UTC)


Monday, Nov. 14th

I met with Dario tonight on Skype to talk about progress on the AFT data model. I'm still working to take part of the project and make it my own (gotta earn my keep!). It looks like working with Oliver on the community feedback feedback work is my ticket. We're meeting tomorrow, and qbox is still down, so I hope to use that time to brainstorm and get his take on what approach will be effective.

Tonight I want to map out some of my thoughts on tagging activities in Wikipedia to support our analyses. Forgive the following brain dump.

Privacy privacy privacy

First, it is essential that these activity tracking systems do not violate the privacy of editors or readers. Wikipedia is like a library, and no one has any business knowing what anyone else is reading. Any tracking of activities should support and reflect only what editors and readers already expect to be made public, and it must be consistent with the privacy policy.

Data model

Now, systems like MongoDB offer a really cool datatype called a Database Reference (DBRef). These objects allow for straightforward linking between *any* document in any collection (read: any row in any table). If such a facility existed in MySQL, then any action that saves a row in the database could have a corresponding annotation associated with it. The current schema would not have to change, save for the addition of a new collection (read: table) containing the references. Such a system could be used to track and maintain annotations of the actions of users after they:

  • Encounter an experimental interface
  • Take part in a survey
  • Switch between accounts

For Wikipedians, this could mean extended CheckUser functionality with improved accuracy, but along with such accuracy come strong privacy concerns.
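A DBRef-like facility can be approximated in a relational database with a single generic reference table. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL. All of its pieces are hypothetical: the `annotation` table, its columns, the helper functions, and the stripped-down `revision`/`logging` tables. The point is only that existing tables stay untouched while annotations can point at a row in any of them.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Simplified stand-ins for existing tables; their schemas do not change.
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user TEXT);
        CREATE TABLE logging  (log_id INTEGER PRIMARY KEY, log_user TEXT);

        -- The only addition: references to rows in *any* other table.
        CREATE TABLE annotation (
            ref_table TEXT    NOT NULL,  -- name of the referenced table
            ref_id    INTEGER NOT NULL,  -- primary key of the referenced row
            label     TEXT    NOT NULL,  -- e.g. an experiment or survey name
            PRIMARY KEY (ref_table, ref_id, label)
        );
    """)
    return conn


def annotate(conn: sqlite3.Connection, table: str, row_id: int, label: str) -> None:
    """Attach a label to one row of any table; repeated labels are ignored."""
    conn.execute("INSERT OR IGNORE INTO annotation VALUES (?, ?, ?)",
                 (table, row_id, label))


def rows_with_label(conn: sqlite3.Connection, label: str) -> list:
    """List (table, row id) pairs carrying a label, across all tables."""
    return conn.execute(
        "SELECT ref_table, ref_id FROM annotation WHERE label = ? "
        "ORDER BY ref_table, ref_id", (label,)).fetchall()
```

After inserting revision 101 and log entry 7, and annotating both with a hypothetical label `"saw-experiment"`, `rows_with_label(conn, "saw-experiment")` returns `[('logging', 7), ('revision', 101)]`. One trade-off of this design is that SQL can't declare a foreign key that points to "any table." Keeping references valid is therefore left to the application, and resolving a reference means joining against the table named in `ref_table`.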

Extension of WP facilities

The tagging system in Wikipedia, designed to tag edits with meta information, has so far been used exclusively for abuse tracking. Because of this, there seems to be resistance in the community to using it to track anything productive. On the other hand, it would be most excellent if tool usage could be tracked in Wikipedia using such a tagging system. Personally, I'd like to track edits that are performed using my own NICE, HAPPI and Wikignome. Right now, the best way to track the use of Huggle or Twinkle is by pattern matching a set of characters at the end of the edit summary (e.g. "(HG)"). This is inefficient, to say the least, and troublesome in many circumstances, as it overloads the usage of the summary field and clutters the interface wherever summaries are displayed.
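To show why summary matching is fragile, here is a minimal sketch of the current approach. The "(HG)" marker comes from the example above. The "(TW)" entry for Twinkle is an assumed placeholder, and the marker table, regex, and `detect_tool` function are all hypothetical. Real summaries may also carry the marker in a different raw form, such as wiki markup.

```python
import re
from typing import Optional

# Hypothetical marker table: "HG" follows the "(HG)" example in the text;
# "TW" is an assumed placeholder, not a confirmed convention.
MARKER_TO_TOOL = {"HG": "Huggle", "TW": "Twinkle"}

# A parenthesised, letters-only marker at the very end of the summary.
TRAILING_MARKER = re.compile(r"\(([A-Za-z]+)\)\s*$")


def detect_tool(summary: str) -> Optional[str]:
    """Guess which tool made an edit from its summary; None if no known marker."""
    match = TRAILING_MARKER.search(summary)
    if match is None:
        return None
    return MARKER_TO_TOOL.get(match.group(1))
```

`detect_tool("Reverted edits by Example (HG)")` returns `"Huggle"`, and `detect_tool("copyedit")` returns `None`. The weakness is that `detect_tool("I typed this by hand (HG)")` also returns `"Huggle"`: any summary that happens to end with the marker is attributed to the tool. A dedicated tag on the edit would avoid that ambiguity and keep the summary field free for humans.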

I'll have to go over this general idea with Oliver tomorrow too. Tracking more information is a touchy subject, but if done right, we can get a richer understanding of what is happening while preserving the privacy of readers and editors.