In 2013, the Wikimedia Foundation began an initiative to better understand the remarkable work that international Wikimedia organizations and individual volunteers are doing around the world to increase content on Wikimedia projects and its use for educational purposes. The movement's evaluation initiative is unique in that it not only gathers data for these reports, but also aims to develop capacity in different communities to build a shared understanding of what learning and evaluation mean.
What will lead to a significant improvement in our overall community health?
How can we better learn from each other and adapt our programs?
To address these questions and help measure the impact of our programs, the Wikimedia Foundation's Program Evaluation and Design initiative started in April 2013, with a small team and a community call to discuss program evaluation. In the first few months, the team worked to identify the most popular Wikimedia programs and collaborated with a first set of program leaders to map program goals and potential metrics. The team then invited a broader community of program leaders to share feedback about their capacity for evaluation through an online survey. We wanted to explore what programs were out there, what was important to program leaders, and what they were measuring.
Initial survey results indicated that many program leaders were already tracking quite a bit of data about their programs. Informed by these results, we launched the first round of data collection in September 2013 and completed our first Evaluation Report (beta). This high-level analysis started to answer many of the questions raised by movement leaders about key programs and their impact. The report was well received by our communities and generated many discussions about the focus of these programs, their diversity, and the data they collected. But it still left room for improvement.
Program evaluation workshops, blogs, and portal
Since releasing the initial beta reports, the program evaluation and design team members have conducted activities to grow evaluation capacity and awareness in the movement. The team has:
Led 11 in-person workshops in conjunction with community gatherings, providing over 35 hours of workshop time.
Hosted 8 IRC office hour sessions offering support on global metrics, SMART target development, and reporting.
Shared more than 15 blog posts focused on learning and evaluation, and worked with over 100 program leaders to develop design resources and learning patterns for programs.
In July 2014, we conducted a second annual survey of evaluation capacity, the 2014 Evaluation Pulse Survey. In almost every case, a majority of program leaders reported using each of these resources. Self-reports of evaluation tracking, monitoring, and reporting demonstrated a shift in community capacity and engagement in learning and evaluation.
Community norms and capacity are shifting toward impact and effectiveness in programs.
Of the 90 program leaders who responded to this year's Evaluation Pulse survey, fielded before the implementation of global metrics in late August 2014, most reported that they were already tracking many key data points for understanding program inputs, including: date/time of program, input hours, program budget/costs, and donated resources.
They were also tracking program outputs and outcomes: participant user names, number of new accounts created, gender of participants, number of media uploads, number of new articles, number of articles edited, content areas improved, and lessons learned.
Self-reports, for the most part, met or exceeded the L&E target of a 125% increase over our 2013 baseline, except in a few cases where we are approaching a ceiling effect as self-reported tracking of program dates, usernames, and articles edited nears 100% (see graph below).
Comparison of program inputs being tracked by program leaders 2013 to 2014
Comparison of program outputs being tracked by program leaders 2013 to 2014
In addition to tracking more inputs and output counts for their programs, most program leaders also reported tracking the content areas improved (87%) and their lessons learned (94%).
70% of survey respondents reported having accessed direct consultation with WMF team members. When asked what direct mentoring and consultation they had accessed:
35 leaders reported consultation on evaluation tools and resources
When asked how they were monitoring participant outcomes, program leaders, as in the 2013 survey, most often reported monitoring what participants contributed at the program events, followed by what participants contributed after the events (see graph below). Notably, program leaders were more likely to report using participant follow-up surveys in the 2014 Evaluation Pulse than in 2013. Consistent with the increasing number of requests our team received for survey tool access, self-reports also suggested that surveys were used more in 2014 than at baseline.
In addition to reporting increased program and impact monitoring, nearly twice as many program leaders reported identifying measurable targets for their program goals in 2014, up from only 39% in 2013 (see graph below).
Finally, the majority of program leaders were feeling prepared to move ahead with most aspects of evaluation:
67% reported being "mostly" or "completely" prepared to document their programming
54% reported being "mostly" or "completely" prepared to articulate their program strategy
52% reported being "mostly" or "completely" prepared to track and monitor their program accomplishments
44% reported being "mostly" or "completely" prepared to measure their program impact
Most often, Wiki Loves Monuments (31%), the overview (23%), Editathons (23%), and the Wikipedia Education Program (20%); 48% reported having read at least one report, while 18% had referred to one or more in planning their programs
Most often Wikimetrics (29%), followed by Program Evaluation Basics (23%)
All had read at least one pattern in the library, while 15% had endorsed, 13% had written, and 9% had edited at least one
7% reported other Wikilabs tools (4% GLAMorous, 2% CatScan)