Talk:Learning and Evaluation/Evaluation reports/2013/Wikipedia Education Program

From Meta, a Wikimedia project coordination wiki

Hello everyone!

As many of you may remember, the Program Evaluation and Design team invited program leaders, such as yourselves, to voluntarily participate in our first Data Collection Survey. Thank you for your participation! As a pilot reporting round, it has provided us insight into what you are tracking regarding data in the programs you implement, and how we can better support you in order to evaluate the impact your programs are making.

We have been looking at that data for our current focus programs: edit-a-thons, Wikipedia editing workshops, GLAM content donations, on-wiki writing contests, Wiki Loves Monuments and other photography initiatives, and the Wikipedia Education Program. When the survey closed, we had responses from 23 program leaders from around the world, who reported on 64 total programmatic activities. Our team also collected data from 54 additional programmatic activities, to fill out areas where we had gaps in reporting.

Your feedback is important - assume good faith and let's evaluate together!

We're excited to announce that we have our last beta report ready for you to read. It includes data reported and collected about the Wikipedia Education Program.

This report is not the final goal for reporting, and we would like to improve it over time. Since this is our first time doing a report like this, we want your feedback. Questions are welcome and encouraged here on the talk page.

After you take the time to read the report, we would like your feedback on the talk page about:

  • Whether the report is understandable and written in a way that makes it useful to you
  • What kind of information you would like to see more of / less of
  • What we could do to improve the collection of data
  • Any other feedback you have.

Anonymity of reporters


(Question from Wikimedia-l: "For the overall WEP report could you please spell out on the Wiki page exactly what programs you are talking about, and link each to their specific report? I'm having a hard time figuring out exactly what is being reported as part of the WEP, what projects are affected, and which programs have more participants.")

Program leaders who self-reported were assured their data would be reported without their program-name identifiers. With this low a report count, preserving anonymity is very difficult even without program names listed. The implementations reported here represent program activity in the Arab world program, Czech Republic, Mexico, Nepal, Quebec, and the US/Canada. The data reported at the bottom have unique "Report ID" numbers that can be matched across the last three tables, so you can regenerate the dataset missing only event names (see Appendix heading "More Data" for the complete input, output, and outcome data used in the report). Those data include the instructor classroom count, number of program weeks, and participant counts for each implementation reported. In the future we plan to ask program leaders what level of identifiability in this reporting they are comfortable with, and will include identifiers in cases in which reporters volunteer to share it publicly.
However, as there is some expressed interest in comparing programs, I must restate the need for caution at this early stage in the reporting. With such a small number of implementers reporting (potentially less than 10%), we are aware that the data do not represent all programming, and that the data are too variable to support statistical comparisons between programs. Further, where classroom counts vary widely across implementations, aggregate reporting of more than one hundred classrooms is not directly comparable to the reporting of a single classroom, since summary statistics from a larger number of observations regress toward the mean and do not make for a one-to-one comparison.
These issues, as well as any other comments and/or suggestions, are welcome on this talk page. JAnstee (WMF) (talk) 15:19, 7 April 2014 (UTC)
I do not understand why such an assurance was granted to the programs; the majority of this information is strictly numeric data that should be freely available. In fact, the cost data (both monetary and temporal) is extremely important, and use of limited movement resources should be well-detailed. It's unhelpful for other programs that wish to learn from those who are successful, or want to avoid the pitfalls that others experienced. Hundreds of volunteers participated in these programs, and thus the volunteer editor community should be able to assess independently whether or not each of these programs has received an appropriate level of volunteer support. You've still not made clear what *type* of education programs you're talking about, or which projects were the beneficiaries of these programs; the WEP page indicates that the scope of the program includes a wide range of activities, from edit-a-thons to long-term university-based programs. I'm very disappointed in the lack of transparency in the data (yes, it makes a difference - a hundred new pages on a project with fewer than 100,000 articles is a much bigger deal than a hundred new pages on a project with over a million articles), and even more concerned that there seems to be very little analysis of whether or not the individual projects are deemed a success, what worked and did not work, what learning has been derived that will be applied to future projects, and so on. I have no sense at all from this report whether or not any of these projects was considered by the WEP to have been a success or a failure. One of your colleagues speaks quite eloquently of the benefits of trying something even if it fails, and it's a cultural element that needs to be encouraged; however, if nobody's willing to admit something's failed, then we lose all the learning benefits from those failures. The same is true in obfuscating whether or not something has succeeded. Risker (talk) 16:06, 7 April 2014 (UTC)
Thank you for your feedback. I understand your concerns and share your interest in developing our learning capacity. Please remember that, at this time, this reporting is both voluntary and a pilot for the high-level programs reporting. To clarify, here we have included programming specific to the Wikipedia Education Program involving classroom editing activities, not stand-alone edit-a-thons or other programs as identified in this series of beta reports. The program implementations reported here are, to the best of my knowledge, all classroom-based student programs. However, there are many education programs that are operating with different delivery models. In addition, the Global Education team is currently examining the variety of program models that are being implemented worldwide to determine their consistency and/or creative divergence from the large-scale US/Canada model. I am sorry for the confusion there. If you have a suggestion for how I can make that clearer, please share it.
As far as anonymizing reported data, it is very much our intention to continue to allow for the choice of anonymity in reporting, so as to access as much reporting data as is out there. Many of these programs are volunteer-run with no staff or monetary resources and no required accountability. At the current time, capacity for such reporting is very limited, but we have begun approaching systematic reporting through these initial beta reports. If you read the overview to these reports, you will see that we are unable to draw many conclusions at this time because we are only at the beginning of data tracking and collection, which limits our ability to analyze return on investment. Most program implementers have not systematically tracked, and are therefore unable to report, the very things we need to base overall assessment on (i.e., budget, hours, content production, and other outcomes in the way we are modeling here).
One of our tasks, as the program evaluation and design team, as we increase reporting and are able to make valid comparisons across settings and implementations, is to identify those points of success and failure and examine them more closely to share those learnings more broadly. We will work to identify models of promising practice, as well as learning patterns to avoid common pitfalls, as we build the knowledge base. Currently, we are for the first time presenting a set of systematic lenses for evaluating, and producing a first baseline of the range of inputs, outputs, and outcomes associated with some of the key metrics that are accessible. In addition, we are working out the logistics of multi-level collaborative reporting and ways to ease the burden and increase the capacity for reporting, so that together we can answer those questions and make determinations of what is success vs. failure (i.e., What is a reasonable or "appropriate level of volunteer support"? What is the value of the content and participation garnered by these programs in terms of overall impact toward the mission of the Wikimedia projects?). At this time, there is no gold standard or threshold of "success"; that is the very thing we are working with program leaders to define, identify, and investigate further. We are certainly not trying to obfuscate failure, but to pave the path for systematic and honest reporting of Wikimedia programs. As I said, here and with reference to all the current series of beta reports, in future rounds of reporting we will ask program leaders to indicate the level of identifiability they are comfortable with in our reporting, include identifiers in cases in which reporters volunteer to share their reporting publicly, and only provide anonymous report lines in cases where they do not. JAnstee (WMF) (talk) 18:46, 7 April 2014 (UTC)

The 15% retention case


Note 29 is interesting, but half of the text is composed of repetitions. --Nemo 11:14, 26 May 2014 (UTC)