# Research:Outreach evaluation

Created: 2012/2
Collaborators: Nimish Gautam, Ayush Khanna
Duration: 2012-2 – ??
Contact: Mani Pande

Information may be incomplete and change before the project starts.

## Key Personnel

• Mani Pande
• Nimish Gautam
• Ayush Khanna

## Project Summary

This project will evaluate the outcomes of various WMF outreach programs with regard to user contributions and participation in various projects.

## Methods

We will ask for self-reported information at outreach events, collect it, and then compare the contribution activity of the accounts of users who attended those events to gauge the potential effectiveness of each event.

### Survival Metric

Given that an article has revisions ${\displaystyle r_{1},r_{2},r_{3},\ldots ,r_{c}}$ (where ${\displaystyle r_{c}}$ is the most current revision of the article available at the time of the analysis) and the revision we're interested in is ${\displaystyle r_{t}}$:

• A byte is considered significant if it is non-whitespace
• A byte is considered to have survived if it was put in by the user in revision ${\displaystyle r_{t}}$, and persisted to revision ${\displaystyle r_{c}}$
• The set of survived significant bytes for a revision ${\displaystyle r_{t}}$ is then ${\displaystyle s.b.=\{b\mid b\in r_{t}\land b\notin r_{t-1}\land b\in r_{c}\land b{\text{ is significant}}\}}$

Survival is calculated for this given revision as ${\displaystyle |s.b.|\cdot [1+\log _{10}(c-t+1)]}$, where ${\displaystyle |s.b.|}$ is the number of bytes in that set.


Note on reordering text: There's a small, static "bonus" added to the number of significant bytes if any reordering of text was detected in that revision whatsoever (for instance, paragraphs being moved around). The reordered bytes aren't counted otherwise.
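The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: it assumes revisions are plain strings, counts characters rather than raw bytes, and uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever diff algorithm the real implementation uses. The function names and the `reorder_bonus` parameter are hypothetical; the size of the reordering bonus is not given here, so it defaults to 0 and must be supplied by the caller.

```python
import math
from difflib import SequenceMatcher


def added_positions(prev, cur):
    """Indices in `cur` of characters inserted or replaced relative to `prev`."""
    sm = SequenceMatcher(None, prev, cur, autojunk=False)
    added = set()
    for tag, _i1, _i2, j1, j2 in sm.get_opcodes():
        if tag in ("insert", "replace"):
            added.update(range(j1, j2))
    return added


def surviving_positions(rev, current):
    """Indices in `rev` of characters that are matched in `current`."""
    sm = SequenceMatcher(None, rev, current, autojunk=False)
    kept = set()
    for block in sm.get_matching_blocks():
        kept.update(range(block.a, block.a + block.size))
    return kept


def survived_significant_bytes(revisions, t):
    """Count |s.b.| for revision r_t.

    `revisions` holds the texts r_1..r_c in order; `t` is 1-based.
    For t == 1 the previous revision is taken to be empty (an assumption).
    """
    prev = revisions[t - 2] if t >= 2 else ""
    rev = revisions[t - 1]
    current = revisions[-1]
    candidates = added_positions(prev, rev) & surviving_positions(rev, current)
    # "Significant" means non-whitespace.
    return sum(1 for i in candidates if not rev[i].isspace())


def survival(revisions, t, reorder_bonus=0):
    """Survival = |s.b.| * [1 + log10(c - t + 1)].

    `reorder_bonus` is a hypothetical hook for the static reordering bonus.
    """
    c = len(revisions)
    sb = survived_significant_bytes(revisions, t) + reorder_bonus
    return sb * (1 + math.log10(c - t + 1))
```

For example, with `revisions = ["Hello world", "Hello brave world", "Hello brave new world"]`, calling `survival(revisions, 2)` scores the word "brave" added in the second revision: five significant characters that are still present in the latest revision, weighted by `1 + log10(2)`.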

#### Rationale

We want to determine the number of bytes a user has added or changed in a given set of revision differences, and whether those changes persisted, as an approximation of the community's judgement that the added information is of high quality. Although persistence is not always an accurate measure of quality, an edit is more likely to be of high quality if it has survived 1000 revisions than if it has survived only 1.

#### Interpretation

• The ratio of survived significant bytes to edit count can aid in identifying users whose editing patterns consist of high-content, highly survivable edits
• The ratio of Survival to edit count can aid in identifying users with high-content, highly survivable edits with consistency over time.
• The ratio of survived significant bytes to bytes added can aid in identifying users who produce highly survivable edits in general.

• Ranges : still TBD
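The three ratios listed above could be computed per user roughly as follows. This is only a sketch under stated assumptions: the `EditStats` record and the `user_ratios` function are hypothetical names, and "bytes added" is taken here to mean the significant bytes added in each revision, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class EditStats:
    bytes_added: int      # bytes added in the revision (assumed significant bytes)
    survived_bytes: int   # survived significant bytes for the revision
    survival: float       # Survival score for the revision


def user_ratios(edits):
    """Aggregate one user's edits into the three interpretation ratios."""
    n = len(edits)
    total_added = sum(e.bytes_added for e in edits)
    total_survived = sum(e.survived_bytes for e in edits)
    total_survival = sum(e.survival for e in edits)
    return {
        # survived significant bytes / edit count
        "survived_per_edit": total_survived / n if n else 0.0,
        # Survival / edit count
        "survival_per_edit": total_survival / n if n else 0.0,
        # survived significant bytes / bytes added
        "survived_per_byte_added": (
            total_survived / total_added if total_added else 0.0
        ),
    }
```

Since the ranges are still to be determined, the sketch returns raw ratios only and makes no attempt to classify users.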

#### Known shortcomings

• Edits to time-sensitive articles or sections, such as a sports score. If a user enters a score of 40, and soon afterwards the team scores 15 more points so that the article now says 55, the bytes entered by the user will be counted as not having survived. This is a poor approximation of quality, because the original edit was of high quality.
• Reversions of vandalism. A revert will be credited with an unfairly large number of surviving bytes.
• Note: there are numerous methods to detect vandalism reversion, and the code implementation leaves room for these heuristics if they are needed.
• Collaborative editing sessions. This can be remedied by looking at a group of collaborative editors as one unit.
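As an illustration of the kind of revert-detection heuristic mentioned above, one common approach is to flag "identity reverts": revisions whose text is byte-for-byte identical to an earlier revision. The sketch below is one possible heuristic, not the one used by the project's code; the function name and the `window` parameter are assumptions made for the example.

```python
import hashlib


def identity_reverts(revisions, window=15):
    """Return 1-based indices of revisions that restore an earlier text.

    A revision counts as a revert if its text hash equals that of a
    revision at least two steps back (within `window` revisions) and
    differs from its immediate predecessor.
    """
    digests = [
        hashlib.sha1(text.encode("utf-8")).hexdigest() for text in revisions
    ]
    reverts = []
    for i, digest in enumerate(digests):
        if i < 2 or digests[i - 1] == digest:
            continue  # nothing to revert to, or a null edit
        start = max(0, i - window)
        if digest in digests[start:i - 1]:
            reverts.append(i + 1)
    return reverts
```

Revisions flagged this way could be excluded before survival is computed, so that a revert is not credited with the text it merely restored.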

#### Code

Code that performs this analysis is available under the GPL in the Wikimedia SVN repository.

## Dissemination

All findings will be publicly available on a WMF wiki.

## Benefits for the Wikimedia community

The community and the Foundation will be able to better gauge which outreach practices are effective and put them to use.

(in-progress)