How was the data for this report collected? Every year we host a round of data collection and encourage program leaders all over the world to submit their program data. This page explains how many programs were reported, how many were mined from existing sources, and the state of evaluation capacity in the movement.
The first reports included findings from 119 implementations of 7 different types of programs, as reported by 23 program leaders from over 30 countries. To learn about these implementations, we relied extensively on data submitted to us by community members. Based on community requests received last year, we focused more on surfacing data that was already reported through grant programs, event pages, and program registries on wikis.
For our second round of data collection, we did just that. We searched through grantee reporting high and low, and dug deeper into the reports' linked blogs, event pages, and other supplemental reporting resources to identify and gather data from an increased number of program leaders and implementations worldwide.
In this second round, we reached farther and deeper to gather program reports, capturing at least partial data on programs spanning:
98 different program leaders (157% increase from baseline)
59 different countries (197% increase from baseline)
This data capture includes reports on 733 different program implementations (617% increase from 2014). Of the 98 program leaders identified, 49 provided additional data directly through email outreach (213% increase from 2014), lending primary data collection support for 222 of the identified implementations (30%).
Only 24% of programs included reports of specific program inputs.
Regarding inputs, program leaders were asked to report:
Budget – how much it cost to produce the program, in US dollars
Staff and volunteer hours – how many actual or estimated hours staff and volunteers put into the program from beginning to end
Donated resources – including equipment, prizes, give-aways, meeting space, and other similar things donated by organizations or individuals to support the program
For those program leaders who did report on specific program inputs:
24% included budget data
15% included staff hours
16% included volunteer hours
Input reporting was much higher for PEG grantees, which are required to present project-specific budgets: among PEG grantees, 17% of program implementations reported a budget. Relatedly, non-grantees and APG grantees were more likely than others to report hours as an input (13% and 11%, respectively).
Unfortunately, input data is some of the hardest data to obtain through secondary sources, and it will be important for program leaders to better track their inputs in order to understand the level of resources dedicated to their program activities.
The majority of program leaders did not report how many people participated in their program, and fewer than half were able to tell us how many new editors created accounts for their programs.
Regarding participation, program leaders were asked to report:
Total number of program participants
Number of participants that created new user accounts during the program
Program leaders were also asked to provide the dates and, if applicable, times of their program.
Only 39% of program reports included the total number of participants, and only 23% reported the number of new user accounts created during their program events. Through mining content and event pages, our team has worked to increase the number of known participants for tracking and reporting. We expect data access in this area to increase. In the meantime, watch each report for details on data access and the extent to which we have been successful in filling the gaps.
Most program leaders were able to tell us how much media was added during their program, but only a minority were able to report how many characters, articles, or pages were directly affected by their program events.
Regarding content production, program leaders were asked to provide various types of data about what happened during their program, depending on the level of data they were able to record and track. These data types were:
Total number of characters added (12% reported)
Number of photos/media added (88% of media events reported)
Number of Wikimedia articles created (27% of text editing events reported)
Number of Wikimedia pages improved (17% reported)
Importantly, for any program events with a known date, time, and user list, or a specific category or set of work, these data can be retrieved relatively easily using data tools like Quarry and Wikimetrics. For this reason, the team has also worked to fill in some of these important gaps. We will share updates on each program's data in the discussion of data access within each program report.
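As an illustration only, a sketch like the one below (in Python, using the public MediaWiki API's usercontribs list) could count edits made by a list of reported usernames within an event window. The wiki, usernames, and dates are placeholders rather than values from this report, and the team's actual extraction relied on Quarry and Wikimetrics.
 # Hypothetical sketch: count edits made by reported participants during an
 # event window, via the public MediaWiki API (list=usercontribs).
 # The wiki, usernames, and dates below are placeholders.
 import requests
 
 API = "https://en.wikipedia.org/w/api.php"  # placeholder wiki
 
 def edits_in_window(username, start, end):
     """Count a user's edits between start and end (ISO 8601 timestamps)."""
     params = {
         "action": "query",
         "list": "usercontribs",
         "ucuser": username,
         "ucstart": start,   # with ucdir=newer, start is the earlier timestamp
         "ucend": end,
         "ucdir": "newer",
         "uclimit": "500",
         "format": "json",
     }
     count = 0
     while True:
         data = requests.get(API, params=params).json()
         count += len(data["query"]["usercontribs"])
         if "continue" not in data:
             return count
         params.update(data["continue"])  # follow API continuation
 
 for user in ["ExampleParticipant1", "ExampleParticipant2"]:  # placeholder names
     print(user, edits_in_window(user, "2015-03-01T00:00:00Z", "2015-03-08T00:00:00Z"))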
Most program leaders were able to report the number of images uploaded during their program, and many reported how many were used in the projects after the program ended. However, most did not report on the use of content or on quality ratings of articles or images.
The survey also asked program leaders to report on the quality of the content produced during the program. Programs focused on text content reported:
Number of articles created or improved (27% reported)
Total number of good articles (8% reported)
Total number of featured articles (6% reported)
For media upload programs:
Number of media uploaded (88% reported)
Use count of media added that were being used in Wikimedia article pages (40% reported)
Number of Quality Images (69% reported)
Number of Featured Pictures (69% reported)
Number of Valued Images (67% reported)
Participant user status and content production data were extracted using Quarry and/or Wikimetrics, based on reported usernames or on activity measured on the content associated with the program event. In this data collection round, through additional extraction based on reported data, the reporting team was able to access additional quality and use measures for the majority of media events and for many on-wiki writing contests where the affected content and participants are publicly documented.
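For media events specifically, one way such use counts could be approximated is with the GlobalUsage API on Wikimedia Commons, which lists the wiki pages that embed a given file. The sketch below is an assumption about how this might be done; the file titles are placeholders, and it is not necessarily the method the reporting team used.
 # Hypothetical sketch: check which uploaded Commons files are in use on any
 # Wikimedia page, via prop=globalusage. File titles below are placeholders.
 import requests
 
 COMMONS_API = "https://commons.wikimedia.org/w/api.php"
 
 def files_in_use(file_titles):
     """Return the subset of file_titles used on at least one wiki page."""
     params = {
         "action": "query",
         "prop": "globalusage",
         "titles": "|".join(file_titles),  # up to 50 titles per request
         "gulimit": "500",
         "format": "json",
     }
     data = requests.get(COMMONS_API, params=params).json()
     return [page["title"]
             for page in data["query"]["pages"].values()
             if page.get("globalusage")]  # non-empty list means the file is used
 
 uploads = ["File:Example photo 1.jpg", "File:Example photo 2.jpg"]  # placeholders
 used = files_in_use(uploads)
 print(f"{len(used)} of {len(uploads)} uploads are in use")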
Note: We cannot make a direct comparison to reporting captured in the previous reports; because there was previously so little data to mine, we did not collect it in the same fashion, making comparison invalid.
Note: Budgets are often aggregated across multiple events, making it difficult to isolate the budget specific to one event.
Participation, recruitment and retention
Based on username data, we were able to assess how many participants remained active following their program participation, for nearly all program events that documented usernames.
Tools like Wikimetrics make this possible, which means tracking usernames is important for learning about retention. For editathons and workshops, the majority of those reported on did not retain new editors six months after the event ended. A retained "active" editor was one who averaged five or more edits per month.
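To make that definition concrete, the small sketch below shows one way the "five or more edits per month on average" rule could be applied to a participant's monthly edit counts over the follow-up window; the counts are invented for illustration, and this is an interpretation of the stated rule rather than the report's own code.
 # Hypothetical illustration of the retention rule described above: an editor
 # counts as retained ("active") if they averaged five or more edits per month
 # across the follow-up window. The sample counts are invented.
 def is_retained(monthly_edit_counts, threshold=5):
     """True if the editor averaged >= threshold edits/month over the window."""
     months = len(monthly_edit_counts)
     return months > 0 and sum(monthly_edit_counts) / months >= threshold
 
 print(is_retained([12, 8, 4, 0, 6, 3]))  # six months, average 5.5 -> True
 print(is_retained([2, 0, 1, 0, 0, 0]))   # six months, average 0.5 -> False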
We looked at recruitment and retention by splitting participant usernames into two groups, new and existing users, in order to learn retention details for each cohort. This is important because Wikimedia programs often attract both new and experienced editors. That is especially true for editathons, editing workshops, and photo events, and less true for most on-wiki writing contests, which generally target existing contributors, and for the Wikipedia Education Program, which generally targets new editors.
Recruitment and retention data were extracted using Quarry and/or Wikimetrics, based on usernames reported by program leaders, usernames listed on public event pages, or activity measured from the direct editors of content improved through the program event. Through this additional data extraction based on program leader reported data, the reporting team was able to access retention follow-up data for nearly all program implementations for which usernames were reported.
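As an illustration of how usernames might be split into new versus existing editors, the sketch below compares account registration dates (from the MediaWiki API's users list) to the event start date. The usernames, wiki, and date are placeholders; the report itself relied on Quarry and Wikimetrics for this step.
 # Hypothetical sketch: split reported participants into "new" vs "existing"
 # editors by comparing registration dates (list=users) to the event start date.
 # Usernames, wiki, and event date below are placeholders.
 import requests
 
 API = "https://en.wikipedia.org/w/api.php"  # placeholder wiki
 
 def split_new_existing(usernames, event_start):
     """Return (new_users, existing_users) relative to event_start (ISO 8601)."""
     params = {
         "action": "query",
         "list": "users",
         "ususers": "|".join(usernames),  # up to 50 names per request
         "usprop": "registration",
         "format": "json",
     }
     data = requests.get(API, params=params).json()
     new_users, existing_users = [], []
     for user in data["query"]["users"]:
         registration = user.get("registration")
         # Accounts registered on or after the event start count as "new";
         # very old accounts may have no recorded registration timestamp.
         if registration and registration >= event_start:
             new_users.append(user["name"])
         else:
             existing_users.append(user["name"])
     return new_users, existing_users
 
 new, existing = split_new_existing(["ExampleParticipant1", "ExampleParticipant2"],
                                    "2015-03-01T00:00:00Z")
 print(len(new), "new;", len(existing), "existing")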
Note: This was measured at the furthest available follow-up window for each program cohort: 30 days, three months, six months, and, in some cases, twelve months following the program start date.
Note: Some exceptions were made due to privacy considerations and lack of consent for user tracking, and others due to data access limitations, given the amount of time it would take to extract usernames from globally distributed program outputs. Where that is the case, it is noted in the Limitations section of the report.
Replication and shared learning
We wanted to learn whether program leaders believed their program(s) could be recreated (or replicated) by others. We also wanted to know whether program leaders had developed resources such as booklets, handouts, blogs, press coverage, guides, or how-tos regarding their program. We asked if the program:
had been run by an experienced program leader who could help others do the same;
had brochures and printed materials developed to tell others about it;
had blogs or other online information written to tell others about it (published by yourself or others);
had a guide or instructions for how to implement a similar project.