The Current Initiative
In November 2012, the Wikimedia Funds Dissemination Committee (FDC) proposed that we discuss "impact, outcomes and innovation, including how to measure, evaluate and learn" across the Wikimedia movement.
This request raised some important questions, such as:
- What will lead to a significant improvement in our overall community health?
- How can we better learn from each other and adapt our programs?
To address these questions and help measure the impact of our programs, the Wikimedia Foundation’s Program Evaluation and Design initiative started in April 2013, with a small team and a community call to discuss program evaluation. In the first few months, the team worked to identify the most popular Wikimedia programs and collaborated with a first set of program leaders to map the program goals and potential metrics. Then, the team invited a broader community of program leaders to share feedback about their capacity for evaluation through an online survey. We wanted to explore what programs were out there, what was important for program leaders and what they were measuring.
Initial survey results indicated that many program leaders were already tracking quite a bit of data about their programs. Informed by these survey results, we launched the first Round of Data Collection in September 2013 and completed our first Evaluation Report (beta). This high-level analysis started to answer many of the questions raised by movement leaders about key programs and their impact. The report was well received by our communities and generated many discussions about the focus of these programs, their diversity and the data they collected. But it still left room for improvement.
- Program evaluation workshops, blogs, and portal
Since releasing the initial beta reports, the Program Evaluation and Design team has worked to grow evaluation capacity and awareness across the movement. The team has:
Led 11 in-person workshops in conjunction with community gatherings, providing over 35 hours of workshop time, including:
- 5 multi-hour workshop sessions
- 3 regular conference sessions
- 2 full-day pre-conference workshops
- 1 Learning & Evaluation (L&E) Poster Session
Organized 24 virtual meet-up opportunities offering more than 30 hours of direct training and support on learning and evaluation, including:
- 16 recorded virtual meet-up sessions led by L&E for Wikimedia movement leaders' development and coordination (recorded via YouTube and available for viewing after each event)
- 8 IRC office hour sessions offering support on global metrics, SMART target development, and reporting.
Shared more than 15 learning- and evaluation-focused blog posts and connected with over 100 program leaders to develop design resources and learning patterns for programs.
In July 2014, we conducted a second annual survey of evaluation capacity, the 2014 Evaluation Pulse Survey. In almost every case, a majority of program leaders reported using each of these resources, and self-reports of evaluation tracking, monitoring, and reporting demonstrated a shift in community capacity and engagement in learning and evaluation.
- Community norms and capacity are shifting toward impact and effectiveness in programs.
Of the 90 program leaders who responded to this year's Evaluation Pulse survey, conducted before the implementation of global metrics in late August 2014, most reported that they were already tracking many key data points for understanding program inputs, including: date/time of program, input hours, program budget/costs, and donated resources.
Most were also tracking program outputs and outcomes: participant user names, number of new accounts created, gender of participants, number of media uploads, number of new articles, number of articles edited, content areas improved, and lessons learned.
- For the most part, self-reports met or exceeded the L&E target of a 125% increase over our 2013 baseline, except in a few cases where a ceiling effect emerges as self-reported tracking of program dates, usernames, and articles edited nears 100% (see graph below).
Comparison of program inputs being tracked by program leaders 2013 to 2014
Comparison of program outputs and outcomes being tracked by program leaders 2013 to 2014
In addition to tracking more inputs and output counts for their programs, most program leaders also reported tracking the content areas improved (87%) and their lessons learned (94%).
70% of survey respondents reported having accessed direct consultation with WMF team members. When asked what direct mentoring and consultation they had accessed:
- 35 leaders reported evaluation tools and resources consultation
- 29 leaders reported evaluation strategy consultation
- 20 leaders reported survey development consultation
- 16 leaders reported data analysis consultation
Many respondents had also used a number of Portal resources.
Reach of Virtual Meet-ups (Minutes viewed).
Virtual meet-up videos have been viewed by program leaders in 86 countries. Countries are shaded green based on minutes viewed; the darker the green, the more time viewed.
When asked what tools they were using to understand programs and their impact:
- 65% reported Wiki pages
- 50% reported conversations
- 45% reported attendance logs
- 40% reported feedback forms
- 37% reported spreadsheets
- 32% reported Wikimetrics 
When asked how they were monitoring participant outcomes, program leaders, as in the 2013 survey, most often reported monitoring what participants contributed at program events, followed by what participants contributed after the events (see graph below). Notably, and consistent with the increasing number of requests to our team for survey tool access, program leaders were more likely to report using participant follow-up surveys in the 2014 Evaluation Pulse than at baseline in 2013.
Program leaders were more likely to report they were using participant follow-up surveys in the 2014 Evaluation Pulse survey than in the previous year.
In addition to reporting increased program and impact monitoring, nearly twice as many program leaders reported identifying measurable targets for their program goals in 2014, up from only 39% in 2013 (see graph below).
Finally, the majority of program leaders were feeling prepared to move ahead with most aspects of evaluation:
- 67% reported being "mostly" or "completely" prepared to document their programming
- 54% reported being "mostly" or "completely" prepared to articulate their program strategy
- 52% reported being "mostly" or "completely" prepared to track and monitor their program accomplishments
- 44% reported being "mostly" or "completely" prepared to measure their program impact
Program leader self-ratings of evaluation capacity illustrated a shift toward higher perceptions of preparedness, as can be seen by taking a closer look at the response distributions from year to year.
Notes:
- 130% increase over the 2013 capacity survey
- 114% increase over baseline
- 325% increase in staff-hours tracking and 168% increase in volunteer-hours tracking over baseline
- 187% increase over baseline
- Not asked at baseline
- 115% increase over baseline
- 185% increase over baseline
- 300% increase over baseline
- 119% increase over baseline
- 120% increase over baseline
- 153% increase over baseline
- 167% increase over baseline
- 140% increase over baseline
- 167% increase from 52% at baseline
- 140% increase from 67% at baseline
- Most often Wiki Loves Monuments (31%), the overview (23%), Editathons (23%), and the Wikipedia Education Program (20%); 48% reported having read at least one report, while 18% had referred to one or more in planning their programs
- Most often Wikimetrics (29%), followed by Program Evaluation Basics (23%)
- All had read at least one pattern in the library, while 15% had endorsed, 13% had written, and 9% had edited one
- 7% reported other Wikilabs tools (4% GLAMorous, 2% CatScan)
- 134% increase over the 50% at baseline
- 126% increase over the 43% at baseline
- 121% increase over the 43% at baseline
- 152% increase over the 29% at baseline
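A note on reading these figures: the "% increase" values in the notes appear to express the 2014 rate as a percentage of the 2013 baseline (a ratio), rather than an added increase on top of it. For example, "167% increase from 52% at baseline" matches the 87% reported for content-area tracking above, since 0.52 × 1.67 ≈ 0.87. A minimal Python check of that reading, using only figures stated in this report:

```python
# Each tuple: (2013 baseline %, stated ratio %, 2014 reported %).
# Values are taken from the notes and percentages reported above;
# the interpretation (2014 = baseline x ratio) is an assumption.
checks = [
    (52, 167, 87),  # content areas improved
    (67, 140, 94),  # lessons learned
    (50, 134, 67),  # prepared to document programming
    (43, 126, 54),  # prepared to articulate program strategy
    (43, 121, 52),  # prepared to track accomplishments
    (29, 152, 44),  # prepared to measure program impact
]
for baseline, ratio, reported in checks:
    # Round the computed 2014 value and compare with the reported figure.
    assert round(baseline * ratio / 100) == reported
```

Under this reading, every stated ratio reproduces the corresponding 2014 percentage to the nearest whole point.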