Research:Outreach evaluation

This project page documents a research project currently planned.
Information may be incomplete and change before the project starts.
Research project: Outreach Evaluation
main contact:
co-investigators: Nimish Gautam, Ayush Khanna
WMF contact: Mani Pande
start: 2012-2
status: planned
fields: computer science, statistics, information visualization
WMF support:

Key Personnel

  • Mani Pande
  • Nimish Gautam
  • Ayush Khanna

Project Summary

This project will evaluate the outcomes of various WMF outreach programs with regard to user contributions and participation in various projects.

Methods

We will ask for self-reported information at outreach events, collect it, and then compare the contribution activity of the accounts of users who attended those events to determine the potential effectiveness of each event.

Survival Metric

Given that an article has revisions r_1, r_2, r_3, \ldots, r_c (where r_c is the most current revision of the article available at the time of performing the analysis) and the revision we're interested in is r_t:

  • A byte is considered significant if it is non-whitespace
  • A byte is considered to have survived if it was put in by the user in revision r_t, and persisted to revision r_c
  • The set of survived significant bytes for a revision r_t is then s.b. = \{ b \mid b \in r_t \land b \notin r_{t-1} \land b \in r_c \}, where b ranges over significant bytes

Survival is calculated for this given revision as |s.b.| \cdot [1 + \log_{10}(c - t + 1)], where |s.b.| is the number of bytes in that set.

Note on reordering text: There's a small, static "bonus" added to the number of significant bytes if any reordering of text was detected in that revision whatsoever (for instance, paragraphs being moved around). The reordered bytes aren't counted otherwise.
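
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: it works on characters rather than raw bytes, uses the standard library's difflib to decide which text is new in r_t and which of it is still present in r_c, and does not implement the reordering bonus (moved text is simply treated as newly added). The function names are hypothetical.

```python
import math
from difflib import SequenceMatcher


def matched_positions(old: str, new: str) -> set[int]:
    """Indices into `new` that difflib aligns with some part of `old`."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    positions: set[int] = set()
    for block in matcher.get_matching_blocks():
        positions.update(range(block.b, block.b + block.size))
    return positions


def survived_significant_bytes(prev: str, target: str, current: str) -> int:
    """Count non-whitespace characters added in `target` that persist in `current`."""
    in_prev = matched_positions(prev, target)        # already present in r_{t-1}
    in_current = matched_positions(current, target)  # still present in r_c
    return sum(
        1
        for i, ch in enumerate(target)
        if i not in in_prev and i in in_current and not ch.isspace()
    )


def survival(revisions: list[str], t: int) -> float:
    """Survival of revision r_t; revisions[0] is r_1 and t is 1-based."""
    c = len(revisions)
    prev = revisions[t - 2] if t > 1 else ""  # assumption: r_0 is the empty text
    sb = survived_significant_bytes(prev, revisions[t - 1], revisions[-1])
    return sb * (1 + math.log10(c - t + 1))


history = [
    "The cat sat.",
    "The cat sat. A dog ran.",
    "The cat sat. A dog ran fast.",
]
print(survival(history, t=2))
```

In this toy history, r_2 adds " A dog ran.", all of which is still present in r_3, so 8 significant characters survive and the weight is 1 + \log_{10}(2), giving a Survival of roughly 10.4.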

Rationale

We want to determine the number of bytes a user has added or changed in a given set of revision differences, and whether those changes persisted, as an approximation of the community's judgement that the added information is of high quality. Although persistence is not always an accurate measure of quality, an edit is more likely to be of high quality if it has survived 1000 revisions than if it has survived only 1.
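
As a rough worked example of how the logarithmic weight in the Survival formula treats these two cases, assume "survived n revisions" means c - t = n:

```latex
\begin{align*}
c - t = 1000:&\quad 1 + \log_{10}(1000 + 1) \approx 1 + 3.0004 = 4.0004 \\
c - t = 1:&\quad 1 + \log_{10}(1 + 1) \approx 1 + 0.3010 = 1.3010
\end{align*}
```

Under that assumption, a hypothetical edit with 100 survived significant bytes would score about 400 in the first case and about 130 in the second, so long persistence is rewarded but with strongly diminishing returns.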

Interpretation

  • The ratio of survived significant bytes to edit count can aid in identifying users whose editing patterns consist of high-content, highly survivable edits
  • The ratio of Survival to edit count can aid in identifying users with high-content, highly survivable edits with consistency over time.
  • The ratio of survived significant bytes to bytes added can aid in identifying users who produce highly survivable edits in general.


  • Ranges: still TBD
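
The three ratios listed above could be computed from per-user totals along the following lines. This is a minimal sketch: the `UserTotals` record and its field names are hypothetical, and returning 0.0 for a zero denominator is an arbitrary choice made here, not something the project specifies.

```python
from dataclasses import dataclass


@dataclass
class UserTotals:
    """Hypothetical per-user aggregates over the revisions being analysed."""
    edit_count: int
    bytes_added: int
    survived_significant_bytes: int
    survival: float  # sum of the Survival metric over the user's revisions


def ratio(numerator: float, denominator: float) -> float:
    """Safe division; 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def interpretation_ratios(u: UserTotals) -> dict[str, float]:
    return {
        # high-content, highly survivable edits
        "survived_bytes_per_edit": ratio(u.survived_significant_bytes, u.edit_count),
        # the same, with consistency over time
        "survival_per_edit": ratio(u.survival, u.edit_count),
        # highly survivable edits in general
        "survived_bytes_per_byte_added": ratio(u.survived_significant_bytes, u.bytes_added),
    }


print(interpretation_ratios(UserTotals(10, 2000, 500, 900.0)))
```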


Known shortcomings

  • Edits to articles, or sections of articles, whose content changes over time, such as a sports score. If a user enters a score of 40, and soon afterwards the team scores 15 more points and the article now says 55, the bytes entered by the user will be seen as not having survived. Survival is a poor approximation of quality here, as the original edit was of high quality.
  • Reversions of vandalism. A revert restores text that the reverting user did not write, so an unfairly large number of bytes will be counted as having survived for these edits.
    • Note: there are numerous methods to detect vandalism reversion, and the code implementation leaves room for these heuristics if they are needed.
  • Collaborative editing sessions. This can be remedied by looking at a group of collaborative editors as one unit.
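
One widely used heuristic for the reversion case is the "identity revert": a revision is flagged when its full text is identical to that of an earlier revision. The sketch below illustrates that idea only; it is not necessarily what the project's code does, and the function name and the `window` parameter (how far back to look) are assumptions made for this example.

```python
import hashlib


def find_identity_reverts(revisions: list[str], window: int = 15) -> list[int]:
    """Return 1-based indices of revisions whose text exactly matches an
    earlier, non-adjacent revision within the last `window` revisions."""
    digests = [hashlib.sha1(text.encode("utf-8")).hexdigest() for text in revisions]
    reverts = []
    for i, digest in enumerate(digests):
        start = max(0, i - window)
        # Exclude the immediately preceding revision: an identical neighbour
        # is a null edit, not a revert.
        if digest in digests[start:max(i - 1, 0)]:
            reverts.append(i + 1)
    return reverts


history = ["good text", "VANDALISM", "good text"]
print(find_identity_reverts(history))
```

Revisions flagged this way could then be excluded from a user's survived-byte totals, or handled separately, before the ratios are computed.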

Code

Code that performs this analysis is available under the GPL in the Wikimedia SVN repository.

Dissemination

All findings will be publicly available on a WMF wiki.

Wikimedia Policies, Ethics, and Human Subjects Protection

Benefits for the Wikimedia community

The community and the Foundation will be able to better gauge and adopt effective outreach practices.

Time Line

(in progress)

Funding

References

External links

Contacts
