Learning and Evaluation/Archive/Program Evaluation and Design/Resources/Pilot Reporting Items

From Meta, a Wikimedia project coordination wiki
Icebreaker session about the positive outcomes of program evaluation in the Wikimedia movement at the first Program Evaluation workshop

Survey participant feedback

Please share your feedback with us after taking the survey!

Some feedback on spreadsheet "data-gathering prep sheets by program"

tl;dr: We do need a helpful template to support planning and data-gathering but a spreadsheet such as this 2013 PE&D one is not it. It needs to be redesigned to meet the needs of Wikimedians.

Improvement needed

The context of such a data-gathering spreadsheet is process quality improvement (as opposed to improving, say, the quality of articles or images). That is, data-gathering is about improving the quality of the processes that support the main business. The first questions, then, are:
a) what is the process supposed to do? and
b) who is the process (and associated documents) supposed to serve?

My own answers to these questions are that data-gathering processes and documents should:
a) help make planning, gathering data and reporting easy for Wikimedians; and
b) help WMF receive useful data to aggregate and determine trends.

Whether or not they actually help is judged by the user/customer.

Customer service

So – this spreadsheet has two sets of customers with similar requirements for data gathering and one of those sets of customers (that is, Wikimedians) needs it to help with planning as well. Above all, criteria for success are that it ought to save us time and effort. However, as it stands, this spreadsheet fails to meet those criteria and fails to support Wikimedians’ needs. Instead it is focused almost entirely on the receiver of the data and does not help with planning.

We really need a good spreadsheet/template to help us both plan and gather data easily but this is not it. This spreadsheet is daunting; it is not helpful to a volunteer Wikimedian and it would not be easy to complete.

It will take a bit of trial and error and feedback to make the spreadsheet useful and used. It needs redesigning. I offer some suggestions below. There are more that can be contributed.

Suggestions for improvement


Design it so that it can be used while planning or preparing for a program. Evaluation needs to be built into the initiation and planning of a program/event, so thinking about data-gathering at the end of an activity is too late.


Clarify its purpose and make it serve its customers. That is, work on making it serve the planning needs of Wikimedians as well as the data gathering and reporting needs of both Wikimedians and WMF (rather than only the latter group).


Some questions are about preparation, some are about outputs, some are about outcomes. These all happen at different times in a program. This item sequence does not help the person on the ground – the Wikimedian. For example, an impact measure such as "number of editors still active three months after" is not part of a "prep sheet" as "prep" would be understood by a program planner. Such figures obviously could not be collated before, or even on, the day of a program, which means that as it is, the whole form would be put aside (and probably forgotten) until everything was well and truly over.

Planning support

It would be good to recognise that the outcomes were likely to have been goals. So putting them at the beginning as well might help people to think through goals, likely outcomes and the associated likely data. For example, if the template was organised by timeframe it might be easier to complete and achieve both planning and data gathering goals. That is, there could be headings for activities/data that are relevant to before or after the event (numbers attending, etc.). Or it could be organised by resource requirements (printed materials, labour). Inputs such as volunteer time (and other resources) could be presented earlier in the document and short-term outputs later (bytes created, etc.).

Short-term versus long-term data

Perhaps longer-term outcome data could be separated to make it easier to return to them later.

Date formatting

I suggest any dates be formatted as they are in Wikipedia: YY/MM/DD. Then all the dates line up automatically in any list or spreadsheet when sorting. Also, where I live, a more straightforward DD/MM/YY format is used in day-to-day work, so using a mixed-up MM/DD/YY format is sure to create errors.
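
The sorting point above can be illustrated with a minimal Python sketch. The event dates are invented for illustration, and the sketch assumes a four-digit year written first (ISO 8601 style, YYYY-MM-DD), since a two-digit year would still mis-sort across centuries.

```python
from datetime import date

# Hypothetical event dates, for illustration only.
events = [date(2013, 10, 9), date(2012, 12, 1), date(2013, 9, 26)]

# Year-first text: a plain alphabetical sort is also a chronological sort.
year_first = sorted(d.strftime("%Y-%m-%d") for d in events)

# Month-first text: the same plain sort puts the dates out of order,
# because the year is compared last instead of first.
month_first = sorted(d.strftime("%m/%d/%y") for d in events)

print(year_first)
print(month_first)
```

In the month-first list, the December 2012 event ends up after both 2013 events, which is the kind of error a spreadsheet sort would introduce silently.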

Currency conversions

No one is going to make conversions to US dollars as they go. It has to be recognised that this is not a need of the Wikimedia program planner. Perhaps any currency conversions could be built into a WMF data analysis program that is activated on receipt of the data.
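
As a rough sketch of the idea of converting on receipt rather than asking program leaders to do it, the Python below keeps amounts in the reported currency and converts them in one place at analysis time. The rate table, the `to_usd` helper and the sample reports are all hypothetical; real rates would need to come from a source the analysts trust, ideally looked up for the date of the expense.

```python
from decimal import Decimal

# Hypothetical exchange rates to USD, for illustration only.
RATES_TO_USD = {
    "USD": Decimal("1"),
    "GBP": Decimal("1.60"),
    "AUD": Decimal("0.94"),
}

def to_usd(amount, currency):
    """Convert a locally reported amount to USD, rounded to cents."""
    usd = Decimal(str(amount)) * RATES_TO_USD[currency]
    return usd.quantize(Decimal("0.01"))

# Program leaders report in whatever currency they actually spent.
reports = [("edit-a-thon", 500, "GBP"), ("workshop", 200, "AUD")]

# Conversion happens once, when the data is received for analysis.
converted = [(name, to_usd(amount, cur)) for name, amount, cur in reports]
print(converted)
```

Keeping the original currency in the report also means the conversion can be redone later if a different rate is judged more appropriate.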

Whiteghost.ink (talk) 04:12, 9 October 2013 (UTC)

Pre-Survey Feedback

Hi everyone!

The Program Evaluation & Design Team has been busy - thank you for your help so far in helping us better understand evaluation in the Wikimedia movement!

We have been gathering information and feedback from program leaders through surveys and reaching out personally to some of you about Program Evaluation. Thank you for your participation and feedback thus far! It has been great for us to learn what is going on with these programs, their desired impact(s), and the types of data you’re gathering and/or looking to gather regarding evaluation of your programs.

All of this information gathering has allowed the Program Evaluation & Design team to see what types of measures and data gathering goals program leaders may be most capable of reporting and have in common across the programs we’ve been looking at since our team came together (Edit-a-thons, Editing workshops, GLAM content donations, Wiki Loves Monuments, Wiki Takes, WikiExpeditions, Wikipedia Education Program, on-wiki contests).

What’s happening next

In the upcoming week, we will invite program leaders to participate in reporting about their programs so that we may both test out this potential set of reporting items as well as begin systematic program evaluation efforts for this initial set of programs. Reporting this data will be voluntary, of course.

These efforts will help a wide range of stakeholders (program leaders, grantmakers, and the movement at large) to better understand these programs and their impacts. Through this exploration we will begin to answer some of the key questions that we all have about these programs, their impact(s), and how they might choose programs that are effective toward these common programming goals (Participation, Content Production, Increasing Quality, Recruitment and Retention).

Your feedback is critical

We’d like community feedback on this set of questions and have created a document which details a question matrix. The question matrix features all of the potential reporting items that program leaders may be asked, by program type. Specifically, each potential instruction and question appears on a separate row (or line), and there is a column for each program type which indicates whether the item will be included in that program’s reporting. For example, if you want to see what data will be requested about edit-a-thons, you’ll look at the column of questions and be able to know if each one applies to edit-a-thons based on whether an “x” appears in the box for that question beneath the column heading “Edit-a-thons.”

You can review it here:


Our apologies for the need to use a Google spreadsheet, but as the matrix is rather complex, it is the best way we could share the full data and ensure its integrity.
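
For readers who find the matrix layout hard to picture, the structure described above (one row per item, one column per program type, an "x" where the item applies) can be sketched in a few lines of Python. The item wording, the "x" marks and the `items_for` helper are made up for illustration and are not taken from the actual spreadsheet.

```python
# Hypothetical rows: item wording and "x" marks are illustrative only.
matrix = {
    "Number of participants": {
        "Edit-a-thons": "x", "GLAM content donations": "",
    },
    "Number of media files uploaded": {
        "Edit-a-thons": "", "GLAM content donations": "x",
    },
    "Program start date": {
        "Edit-a-thons": "x", "GLAM content donations": "x",
    },
}

def items_for(program):
    """Return the items marked with an "x" under the given program column."""
    return [
        item
        for item, columns in matrix.items()
        if columns.get(program, "").strip().lower() == "x"
    ]

print(items_for("Edit-a-thons"))
```

Reading down one column in this way gives the full list of items that a leader of that program type would be asked to report.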

Please share your feedback on the bottom of this wiki page and make sure to reference any specific line number if you are providing feedback on a particular item.

Think about things like language (if English is not your first language, are you comfortable with the wording?), question relevance, and specificity. Please share your feedback with us by the end of day Thursday, September 26th as we hope to send survey invitations out by the end of the week.

Thanks everyone!

Sarah, Jaime and Frank, Program Evaluation and Design team


Please share your feedback, with your signature, here!

Gender and ethnicity?

Interesting. We will think on this and send considered responses. Nothing about ethnicity or gender? Jon Davies WMUK (talk) 08:01, 25 September 2013 (UTC)

Thanks, Jon, we will look forward to your further feedback. We have left an open essay reporting box for each of the four reporting areas (participation, content production, quality, and recruitment/retention) contained within the survey for people to report any additional data of interest that they happen to have (e.g. gender and diversity could be an observation included in that response category for participation and/or retention depending on what type of data one has collected). However, according to the evaluation capability status survey we collected last month, very few program leaders have been tracking that sort of demographic and diversity data up to this point. Similar to targets for increased skill or interest which are also common, but currently absent, some target objectives will require survey inquiry or other further tool development in order for systematic collection and reporting. Those potential measures still need additional evaluation capacity developed for the majority of program leaders and are not included for this initial reporting round. Still, we will encourage program leaders to share data on those additional goal areas that they have targeted and tracked data for their specific programming impacts beyond this basic set of counts. Hope that makes sense. JAnstee (WMF) (talk) 15:51, 25 September 2013 (UTC)
JAnstee, thanks for this reply - I'm having trouble parsing it though. You say "Similar to, also absent but common, targets for increased skill or interest, some objectives will require survey inquiry, and those potential measures need additional evaluation capacity developed in order for systematic collection and reporting" - could you rephrase that sentence please? Richard Symonds (WMUK) (talk) 11:43, 26 September 2013 (UTC)
Revised, hope that helps. JAnstee (WMF) (talk) 17:25, 26 September 2013 (UTC)

More suggestions

Perhaps some of these random items might be useful additions...

  • Was there/where is the Wikipedia page about the event/project (useful for lurkers and gawkers)?
  • Was any outside media coverage generated, and how can that be measured?
  • What was spent on PR?
  • Were there in-kind donations by the GLAM?
  • Did the GLAM host do anything to cross-promote?
  • Was there swag or a reward offered to participants?

In the US, institutions understand/LOVE reporting, so if we as volunteers are armed with reporting tools we can showcase to our host/partners, the likelihood of Wikipedia events being taken more seriously, seriously increases ... (*_*) Bdcousineau (talk) 15:44, 25 September 2013 (UTC)

Hi Bettina. These are good suggestions, however, we didn't see them as being common reporting items across the movement in the last survey we did (that you took). We'll be reporting our findings about that this week in a blog. I track a lot of these things, well, in my head, when I've worked with GLAMs, so I do think there are some good suggestions, however, like I said, we noticed they weren't that common. To remedy that, we're putting blank comment boxes for each type of program we're gathering data about (e.g. GLAM content donation, edit-a-thon, etc.) so you can share data like this (e.g. "Amount of in kind donation by GLAM - $500 worth of catering for edit-a-thon") by filling it in yourself. And, if we do see many people reporting your suggestions, then we will know we need to have them in the next data collection survey. There is one question about press: "This [Program Name] project/event has blogs or other online information written to tell others about it (published by yourself or others)." where respondents share links or whatever they want to share related to that. I just track it on the event page or in my case study. Usually it's not really overwhelming press-wise, so I can maintain it easily. You can also use that section to share what you and the GLAM did to promote.
Jaime might have some more thoughts, but, similar to what she said above, if more people are saying they are reporting about in kind donations and swag, then we'll include it in the survey next. Do let us know if you see any concerns with the verbiage being used. Thanks for your thoughts so far! SarahStierch (talk) 16:23, 25 September 2013 (UTC)
Thank you for your input, Bettina! Yes, unfortunately, program level budget tracking is one of those areas that is weak currently. However, the evaluation questions related to your suggestions around promotion as well as budget allocations and resources are definitely something that will be included in any follow-up inquiry into any highly efficient or impactful programs that we identify through this initial process. Further, we are still considering whether including at least an indicator of budget categories may still be manageable at this phase (i.e., check box for in-kind donations, check box for whether awards were given) and which would be most informative for this purpose. JAnstee (WMF) (talk) 18:47, 26 September 2013 (UTC)


Hi guys! Many thanks for doing this. We have not had time to construct any detailed feedback - indeed, I haven't had time to set up a proper meeting about this at WMUK. As a result, this is a hastily worded feedback note to say that I personally have a few concerns:

  • You've asked for feedback on this report with a short turnaround time - just two days for feedback, right in the middle of the work we're doing for the FDC bid. We at Wikimedia UK really want to give some detailed feedback on this, but we don't have the staff time to do so if we are to get our bid in by Monday. I hope, therefore, that future feedback will have a later deadline - perhaps a fortnight, or a month. I know that this is a long turnaround, but we simply don't have the resources to write feedback at two days' notice.
  • Secondly, we've chatted about this informally in the office, and are very concerned at the length of the "Data Report Request Item Matrix by Program". The sheer amount of feedback you are requesting will mean that you will only get feedback from larger chapters on their events - a volunteer will not fill in a form of this length. I am worried that by asking for people to fill in this form, you will not only get very few completed responses, but you will actively discourage volunteers and smaller thorgs from doing anything at all. I would recommend that you limit your feedback to, at most, 25% of the current length.
  • Don't apologise for using Google Spreadsheets - they are perfect in this situation, and good metrics are much more important than shoehorning things onto wikis.
  • Try and simplify your language. Naming the spreadsheet 'Data Report Request Item Matrix by Program' rather than 'Feedback Questions' is a case in point! Row 28 in particular has extremely complex language: it scores just 19.5 on the Flesch Reading Ease test: it's about as complex to read as the Harvard Law Review. You should, I think, be aiming for a readability of about 50 or 60.

I appreciate the complexity of today's reporting standards, but the spreadsheet and this page need to be much simplified if you are to have any real community input on these... I want this to succeed as much as you guys do :-) Richard Symonds (WMUK) (talk) 14:29, 27 September 2013 (UTC)
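
For anyone wanting to check readability scores like the one quoted above, the standard Flesch Reading Ease formula is easy to compute once the words, sentences and syllables have been counted. The sketch below is only an illustration: the counts are hypothetical, the function name is an assumption, and counting syllables reliably is the hard part that online tools handle in different ways, so scores will vary a little between tools.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Hypothetical counts for one long 40-word sentence with 68 syllables.
one_sentence = flesch_reading_ease(words=40, sentences=1, syllables=68)

# The same words split into two sentences: shorter sentences score higher.
two_sentences = flesch_reading_ease(words=40, sentences=2, syllables=68)

print(round(one_sentence, 1), round(two_sentences, 1))
```

The formula shows why breaking a long sentence in two raises the score even when no words change: only the words-per-sentence term moves.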

Hi Richard! Thanks for your feedback. First, we understand it was a short period of time. We did promote it heavily to the community who is interested in evaluation and we've noticed a trend: we rarely receive feedback when we ask for it. So, that means we're doing something right, or people don't care ;) (I'm joking, sort of!) But, in the future we will allow more time. Sadly, we're also under pressure to get this survey up and out in a specific period of time so we can have it for the community to look at before the end of November, or we would have had this up earlier or at a different, more laid-back time. Since this is the first time, next time we do this we'll provide more feedback time. Lesson learned!
You stated that only program leaders from larger chapters will respond - that's making an assumption, something I have learned is one of the most important things not to do in evaluation. When we release this survey, and we see in the next week and a half that only large chapters have replied, then we will learn from that. But, perhaps that won't be the case. We were pretty amazed by the response to the last survey (learn more here) - we had a high reply rate with 32 chapters, three affiliates and eight individuals replying. We also learned that the majority of respondents had no current staff support in the work they were doing, meaning, voluntary. So, we'll see what the response is like post survey. But anyway.. :)
I'm always paranoid, as someone who supports free and open software, about using Google Docs in the movement. Glad it's OK with you!
I agree also about the complexity of the language. We worked with multiple community members who use English as a second language to look at each question on the actual survey itself, and modified the language. I do agree the matrix thing is confusing, absolutely. It's a challenge for us to meet in the middle often, we don't want to dumb things down, but we also want to tone down the academic language. Do you have a link to an online tool that provides the score you produced? I'd love to use it in the future. Thanks for the feedback! This is the first time we've done this, so we'll see how it goes. Jaime might reply here shortly, too. SarahStierch (talk) 16:30, 27 September 2013 (UTC)
Thank you for sharing your concerns. We realize that reporting is an additional burden and have designed the survey collector so that program leaders may report on as little as one program and programming event, up to a handful. Although some may not have time to share as much in reporting at this time, we are hoping enough will be able to share data to allow us to try out these indicators through program leader self-evaluation and we might begin to surface data about the level of impact one might expect from each of these types of programs to help you all direct your programming to successfully meet your individual/chapter goals and targets. Only the most common and core targets for which program leaders reported already having tracked data are those which we have included at this time. As we move toward systematic evaluation and developing guidance on metrics, we will need to proceed through several iterations and request program leader support in testing out these metrics by actually taking the time to pull the data. We appreciate all the support we have received so far, and we are optimistic about continued cooperation as we work together to define what self-evaluation will look like for Wikimedia programs.
Thank you for catching that complex sentence (I broke it into two, and that cell's text now reads at 50.4 for reading ease, although it now contains a passive sentence as well). As Sarah said, we have had several non-native English speaking program leaders for the different programs review each of the actual items and have adjusted the language where we have identified points of confusion. Thank you for that additional flag on wording there. If you have more specific points for language clarification, please let us know.
As for the timing, faced with our current reporting and decision-making demands we are on a very tight timeline for this first round of data collection and reporting. As we move forward and establish a more consistent reporting flow, we will try to ensure more review time when we request feedback. We apologize for the brief window, but we thought a brief window is better than none, and took the risk of this complaint. Thank you for taking the time that you have. JAnstee (WMF) (talk) 18:03, 27 September 2013 (UTC)


I am wondering what is the intended application of this report? Is it listing suggestions for future metrics (and so we can choose the relevant ones that fit into our priorities and start recording now), or is it something we will be asked to complete in the coming weeks while looking retrospectively on the past 12 months? Daria Cybulska (WMUK) (talk) 16:17, 27 September 2013 (UTC)

Yes, Daria, we are actually doing both. Since so many program leaders reported having collected some of these more basic data points on their last year's programs, we are trying out some of these more basic metrics (where available and people are willing to share) to (1) see how well they help to inform us about the impacts that are commonly seen in each of these programs so that we may further prioritize for future reporting guidelines (this is only a beginning and only looking at a small set of potential metrics) and (2) begin to collect some normative data so that program leaders can compare their impacts to those that are seen more generally. So these are also the actual measurements we will be requesting program leaders to report in the pending reporting request, however, that does not mean they are the final recommendations for regular reporting. There is a new blog post about it. JAnstee (WMF) (talk) 17:46, 27 September 2013 (UTC)