Grants:IdeaLab/Redesigning Global Metrics & its support/Outcome
- 1 Background
- 2 Executive Summary
- 2.1 Grant metrics instead of Global Metrics
- 2.2 Timeline
- 3 Next steps
- 4 Details on community feedback and rationale behind changes
- 5 Appendix
This project was undertaken jointly by the Community Resources and Learning & Evaluation teams at the Wikimedia Foundation, as both of these teams have played a central role in the creation, implementation and support of Global Metrics over the last two years.
The goal of updating Global Metrics was to create a set of metrics that responded to the extensive feedback and suggestions we received through the Global Metrics retrospective and this consultation.
Given the range of feedback, we identified a set of principles up front so that our ideas and decisions would be both responsive and consistent. We used these design principles to assess the strengths and weaknesses of each proposed change, and again to guide our decisions as we synthesized and incorporated feedback into the final updates to Global Metrics:
- We would not introduce new metrics unless easy-to-use tools were available to collect them; agreed-upon new metrics would be introduced only once such tools existed.
- We would identify places where we could iterate, understanding that not everything can be done now, but it’s better to do it well even if it’s done slowly.
- We would think holistically about metrics, acknowledging that grant metrics, organizational metrics, and program metrics might overlap but are not necessarily the same thing. All of these metrics need to complement, not supersede, one another.
- The Metrics Library received strong support from the grantees and grant committee members who gave feedback. The creation of this centralized resource will become one of the core projects for Amanda Bittaker from Learning & Evaluation in the 2016-2017 fiscal year.
- There was no consensus among respondents on whether Wikimetrics should become the single tool for calculating Global Metrics. Because Wikimetrics has significant front-end and back-end problems, Sati Houston from Community Resources will undertake a deeper project in the 2016-17 fiscal year to investigate whether building a new tool or improving an existing one is the best solution to the issues around data collection.
- There was strong support from respondents for simplifying the Global Metrics requirements, particularly by changing the structure to “Proposal 2: 3 shared measures + 2 grantee-defined measures”.
- For the four metrics proposed, respondents provided significant feedback on the importance, usefulness, and definition of each metric, as well as on how easy or difficult it would be to collect the information. This feedback was integrated into the final set of metrics, and potential mitigations are offered for the risks and concerns raised. For additional information, see the sections on Participation, Content, and Community Building below.
- All respondents indicated that Community Building was an important and useful outcome to capture, but gave widely divergent answers to the question “What is community building?”. Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear whether surveys, storytelling, or some other approach was the right fit for capturing it. As such, Community Building will not be included as a shared measure at this time, but we hope to learn more about successes and challenges in this area throughout the upcoming year.
- Update Collecting Global Metrics learning pattern - August 12th
- Update grant proposal and reporting templates - Will vary by grant program, according to the dates the grant rounds start.
- Begin scoping the Metrics Library project - August 15th
- Begin scoping the tool for shared metrics project - September 5th
Details on community feedback and rationale behind changes
Overall, most grantees and grant committee members interviewed preferred “Proposal 2: 3 shared measures + 2 grantee-defined measures”. This structure integrates Global Metrics and other metrics into a single set of grant metrics. Those who favored this structure indicated that it balanced the need for consistency, through a set of shared metrics, with the need for flexibility, by allowing grantees to highlight their specific outcomes and achievements. However, the overall success of this new structure depends heavily on the Metrics Library, as the library will be the main mechanism for grantees to discover new metrics, online resources, and potential tools.
There were three primary concerns raised about shifting to a structure that integrates Global Metrics and other grant metrics. To address these concerns, we have identified potential mitigations where applicable:
- This new structure will diminish the distinction between the metrics required by WMF and those identified by the grantee.
- It is true that this structure merges the metrics required by WMF with the other metrics identified by the grantee. However, the respondents who supported Proposal 2 highlighted this as a strength of the structure, since all of these metrics together represent the outcomes of a grant.
- Requiring at least two grantee-defined metrics might increase the burden on smaller grants and grantees.
- Mitigation: The Rapid Grant program is designed to reduce grant approval time and the overall burden of reporting, and will not follow this structure. It will instead keep its current, lighter-weight requirements.
- Because these grantee-defined metrics will vary by grant, adequate support (e.g. tools, online resources) may not be available to the grantee.
- Mitigation: The Learning & Evaluation team has spent the last three years documenting common program metrics through program toolkits and learning patterns, which includes some documentation on available tools. These resources will be integrated into the Metrics Library before its launch, to help with identifying, calculating, and reporting “other grant metrics”. However, a longer-term solution is still needed to ensure that other grantees can contribute their own metrics to the Metrics Library (as well as the associated tools and resources), and to ensure that common metrics are easily identified.
Based on this decision to bring Global Metrics and other metrics into a single set of “grant metrics”, updates will be made to the proposal and reporting templates for Project and Annual Plan Grants to facilitate the integration. Template changes will also aim to make clear that these three shared metrics should only be required when relevant to the goals of the grant.
Of the two participation metrics proposed (Individuals Involved and Editors Retained), respondents indicated that both were useful and important, but that they covered different outcomes and applied in different situations. While there was support for making Individuals Involved the only primary participation metric, there was equal support for allowing grantees to choose between Individuals Involved and Retention, as applicable to their grant (see voting data).
Respondents also indicated that the definition of each proposed metric needed to be refined, with the following being the primary changes suggested to the definition:
- For Individuals Involved, the definition requires more specificity. While there are many groups of people “involved” in a grant, there are three main groups that are typically included in this metric:
- Participants - these are people who attend events, either in person or virtually. For instance, people whose usernames are collected during the program are participants.
- Organizers - these are the volunteers or staff members who are responsible for organizing the event.
- Large audiences that are the target of outreach activities - these are the large groups of people reached (primarily) through mass communication, where the goal is to raise awareness about the Wikimedia movement, a specific project or event, etc. Examples include newsletter recipients, social media followers, mailing list members, and people who visit a Wikimedia booth at a fair.
- While respondents indicated that the distinction between these groups isn’t perfect or comprehensive (e.g. Where would donors or partner organizations be captured? What about participants who become organizers?), disaggregating these three groups would make the definition more specific and more useful overall. Two respondents went further, adding that the number of volunteer organizers is really a measure of “volunteer engagement” and the size of large outreach audiences is really a measure of “raising awareness”; while both relate to participation generally, they measure different outcomes than “total participants”.
- For Editors Retained, more respondents indicated the metric should instead focus on New Users and New User Retention. Respondents supported expanding the definition of retention to 1 edit in any namespace in any Wikimedia project, but indicated that:
- To measure retention, it’s important to know the total number of new users.
- To measure retention well, it must be tracked over multiple time periods (e.g. 1, 3, 6, or 12 months).
- To know what qualifies as “good” retention, there need to be baselines available.
- Respondents indicated that while the engagement of the existing contributors is critical for many activities, the outcomes of those activities are better captured qualitatively - focusing on feelings of motivation, connectedness to their community, etc. - rather than through a numeric retention metric.
However, while all of these participation metrics pose data-collection challenges, New User Retention poses the most significant ones. To our knowledge, no existing tool can capture retention over multiple time periods using the suggested definition. Moreover, Single User Login was introduced after the original Global Metrics were defined, so it will likely be necessary to change how “New User” is technically defined in each tool (e.g. distinguishing between “those who are creating a username for the first time” and “those who are existing contributors, but new to a specific language project”), as well as how “New User Retention” is technically defined. Although there are many retention-focused tools, created both by WMF and by volunteers, each would need small to large improvements. (This is based on the tools we know of; if we have missed one, please let us know!) As such, the current “Newly registered user” metric should remain until tools can be updated to support the new definitions and functionality.
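To make the suggested definition concrete, the following is a minimal Python sketch of New User Retention tracked over several follow-up periods. It is not an existing tool and not an agreed WMF definition: the `Edit` record, the `new_user_retention` function, and the rule that a user counts as "retained" at N months if they made at least one edit in the 30-day window starting 30 × N days after the program ends are all hypothetical assumptions for illustration. It does reflect the points respondents raised: the total number of new users is the denominator, several periods are tracked, and an edit in any namespace on any Wikimedia project counts.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical record: one edit by one (global, SUL) account on one Wikimedia project.
@dataclass(frozen=True)
class Edit:
    username: str
    project: str    # any Wikimedia project counts, so this is not filtered on
    namespace: int  # any namespace counts, so this is not filtered on either
    day: date


def new_user_retention(new_users, edits, program_end, months=(1, 3, 6, 12)):
    """Return {months: share of new users retained} for each follow-up period.

    Assumption (illustrative only): a new user is retained at N months if they
    made at least 1 edit during the 30-day window that starts 30*N days after
    program_end.
    """
    total = len(new_users)  # denominator: the total number of new users
    results = {}
    for n in months:
        start = program_end + timedelta(days=30 * n)
        end = start + timedelta(days=30)
        # A set, so a user with several edits in the window is counted once.
        retained = {
            e.username
            for e in edits
            if e.username in new_users and start <= e.day < end
        }
        results[n] = len(retained) / total if total else 0.0
    return results


# Usage: four new users, three edits after a program ending on 1 January 2016.
new_users = {"Alice", "Bob", "Chen", "Dana"}
edits = [
    Edit("Alice", "en.wikipedia", 0, date(2016, 2, 5)),
    Edit("Alice", "commons.wikimedia", 6, date(2016, 4, 20)),
    Edit("Bob", "www.wikidata", 0, date(2016, 2, 10)),
]
print(new_user_retention(new_users, edits, program_end=date(2016, 1, 1)))
# prints {1: 0.5, 3: 0.25, 6: 0.0, 12: 0.0}
```

Even with such a calculation in place, a baseline would still be needed, as respondents noted, to judge whether figures like these represent "good" retention.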
Most respondents strongly endorsed the proposed content metric: Content pages created or improved, disaggregated by Wikimedia project.
This new definition resolves some of the issues identified in the retrospective, particularly that the current “pages new or improved” metric was interpreted either as “Wikipedia articles new or improved” or as pages aggregated across Wikimedia projects (e.g. Wikidata items, Wiktionary entries and Wikisource pages). Respondents suggested few improvements to the definition, but they did raise three concerns. To address these concerns, we have again identified potential mitigations where applicable:
- The quality of content is not addressed.
- Given quality is currently primarily assessed through community processes (e.g. Featured Article process on English Wikipedia, Featured Image process on Commons), or specific processes & rubrics of an event or contest, it is highly contextualized and would be difficult to define in a centralized way. While the adoption of automated tools might be an option in the future, for now it isn’t feasible to include as a shared metric.
- This metric might be easily “gamed” or manipulated, e.g. through creating stub articles or making small edits to as many pages as possible.
- While this is a valid concern, it is unlikely that the grant system will ever have sufficient oversight to ensure gaming doesn't happen.
- The definition of an “improvement” differs by Wikimedia project.
- Potential mitigation: More detailed examples of what counts as an improvement could be added to the metric documentation in the Metrics Library.
While this disaggregated content metric provides more specificity about content added or improved across the various Wikimedia projects, no available tools can currently collect this detailed information without some technical expertise in languages such as SQL. Wikimetrics can report “content pages added, disaggregated by Wikimedia project” and “aggregated content pages created and improved”, but not “content pages improved, disaggregated by Wikimedia project”. (Again, the assessment that no tools currently exist is based on the tools we know of; if we have missed one, please let us know!)
Given this tool limitation, this metric cannot become a shared grant metric yet. As such, the current “pages created or improved” metric will remain, with slight improvements to the definition. Once the tool issues are addressed, the updated metric (with the disaggregation by Wikimedia project) will replace the current one.
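Once per-revision data is available, the disaggregation itself is simple; the gap is in tooling rather than arithmetic. The sketch below is a hypothetical illustration, not how Wikimetrics or any other existing tool works. It assumes the grantee already has a list of revisions made by program participants, pre-filtered to content namespaces, each tagged with its Wikimedia project, a page identifier, and whether the revision created the page (the `Revision` record and `content_pages_by_project` function are invented for this example). A page counts as "created" if a participant created it and as "improved" if participants only edited it; any edit is treated as an improvement, which, as noted above, says nothing about quality.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: one revision by a program participant, already
# filtered to content namespaces.
@dataclass(frozen=True)
class Revision:
    project: str            # Wikimedia project, e.g. "en.wikipedia" or "www.wikidata"
    page_id: int            # page identifier, unique only within its project
    is_page_creation: bool  # True if this revision created the page


def content_pages_by_project(revisions):
    """Count distinct pages created and improved, disaggregated by project."""
    created = defaultdict(set)  # project -> page ids created by participants
    touched = defaultdict(set)  # project -> all page ids participants edited
    for r in revisions:
        touched[r.project].add(r.page_id)
        if r.is_page_creation:
            created[r.project].add(r.page_id)
    # A page both created and later edited is counted only as "created".
    return {
        project: {
            "created": len(created[project]),
            "improved": len(pages - created[project]),
        }
        for project, pages in touched.items()
    }


revs = [
    Revision("en.wikipedia", 1, True),
    Revision("en.wikipedia", 1, False),
    Revision("en.wikipedia", 2, False),
    Revision("www.wikidata", 7, False),
]
print(content_pages_by_project(revs))
# prints {'en.wikipedia': {'created': 1, 'improved': 1}, 'www.wikidata': {'created': 0, 'improved': 1}}
```

Counting distinct page identifiers per project, rather than edits, keeps a page that was edited many times from being counted more than once, although it does not prevent the stub-creation gaming that respondents were concerned about.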
Respondents strongly indicated that Community Building was an important and useful outcome to capture, but gave widely divergent answers to the question “What is community building?”. Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear whether surveys, storytelling, or some other approach was the right fit for capturing it.
As such, Community Building will not be included as a shared measure. While important, it needs more investigation and experimentation to clarify its definition, how the information would be collected, and how it would be used.
However, we will continue to investigate the work that has already been done, or is currently being done, around Community Building, including the definitions currently in use. Depending on the interest of conference organizers, we could present these findings at a movement conference in 2017.
Demographics of respondents
The summarized feedback below is based on interviews and survey data collected from former and current grantees, WMF staff, grant committee members, and a WMF board member.
- 5 WMF staff from the Community Resources team
- 34 grantees
- IEG: 4
- PEG: 14
- APG: 10
- Simple APG: 4
- Unknown: 2
- 8 Grant committee members
- FDC members: 3
- GAC members: 3
- SAPG committee members: 2
|Topic||Feedback|
|Reporting||Need to update the forms to make sure that the relevancy question is clearly answered - i.e. when are these relevant and when not|
|Reporting||The grant framework is not conducive to capturing longer-term outcomes; it is better suited to capturing shorter-term outputs. Given the clear feedback received about the need to simplify reporting, grant reports are not the right medium for reporting these longer-term outcomes.|
|Simplification||This update simplifies collecting Global Metrics|
|Simplification||Will make comparisons more difficult|
|Simplification||Less information about understanding impact|
|Simplification||There is a loss of nuance in the proposed solutions|
|Simplification||Quantitative metrics create quantitative bias|
|General||Need to communicate how WMF is using this information more broadly|
|Topic||Feedback||Count|
|Proposal 1||Endorsements for Proposal 1||6|
|Proposal 1||Other metrics will be included anyways; other metrics shouldn't be required||3|
|Proposal 1||Keeps the distinction between Global Metrics and other metrics||2|
|Proposal 1||Proposal 2 may be too much for smaller communities, unless the metrics are clearly defined and have identified tools||1|
|Proposal 2||Endorsements for Proposal 2||16|
|Proposal 2||Gives organizations flexibility/freedom to pick and choose which outcomes are most relevant to their work||7|
|Proposal 2||Need to start looking at those metrics beyond Global Metrics, given diversity of programs||1|
|Proposal 2||Good combination of cross-cutting metrics, uniform data-gathering, and other/local metrics that are diverse||2|
|Proposal 2||Good balance between consistency and flexibility||1|
|Proposal 2||Gives a chance to acknowledge things/outcomes a grantee thinks brought value||1|
|Proposal 2||The opportunity to see what others are measuring (i.e. sharing) could lead to the opportunity to collaborate / build together new measures||2|
|Proposal 2||Provides space for those measures of success that the grantee has already identified||2|
|Proposal 2||Provides space for those locally relevant challenges and achievements||3|
|Proposal 2||Going to collect other metrics whether it is required by WMF or not||1|
|Proposal 2||Would be difficult for the "extra" metrics to be cross program ones; these would be difficult to identify and collect||2|
|Proposal 2||Will not be able to aggregate these "extra" metrics||1|
|Proposal 2||"Other" metrics need to be consistent year to year||1|
|Proposal 2||Dependent on the breadth and rollout of the Metrics Library||1|
|General||Would be difficult (and additional burden) to report metrics by programme||1|
|General||Need to be clear in the update that a low number won't be held against a grantee||1|
|General||Need to emphasize the importance of other metrics and that “Global Metrics + Other metrics” is what tells the full story||1|
|General||Defining a broader set of metrics, with good definitions and tools would also be an alternate solution||1|
|General||Outreach events have very different metrics - press mentions, social media, people actively reaching out to partners||1|
|General||Having only 3 standard metrics is good as long as the combination of shared and other metrics is sufficient to "measure impact and enable people to learn from success/mistakes"||1|
|General||"Proposal 1 allows for homogeneity, which is a good for mapping outcomes, but Proposal 2 allows us to gain insights into challenges and conditions of grantees that will have a huge impact on the movement."||1|
|General||Need to test these metrics & structure and incrementally develop further over the years||1|
|Metric||Category||Feedback||Count|
|Individuals Involved||General||Endorsements for Individuals Involved (over Retention)||9|
|Individuals Involved||Positive||Metric reflects grantee goals, better than other metrics||1|
|Individuals Involved||Positive||Main metric for activity||3|
|Individuals Involved||Positive||Important metric for those events that only happen once a year (e.g. Art+Feminism, WLM)||2|
|Individuals Involved||Positive||Even though the definition of this metric is open and can easily change or be interpreted differently, it gives a good sense of reach||1|
|Individuals Involved||Positive||Able to share this externally, beyond the movement||1|
|Individuals Involved||Positive||Only way currently to capture offline activity||1|
|Individuals Involved||Positive||Easier to collect than retention||1|
|Individuals Involved||Positive||Individuals Involved is a useful metric||8|
|Individuals Involved||Concerns||Grantee anxiety when the number is small||1|
|Individuals Involved||Concerns||Unclear why WMF is interested in this metric / definition overall||1|
|Individuals Involved||Concerns||Easy to maximize this number||1|
|Individuals Involved||Concerns||Individuals Involved is not a useful metric||3|
|Individuals Involved||Concerns||Privacy concerns make tracking difficult||1|
|Individuals Involved||Concerns||Manual tracking is the only option||2|
|Individuals Involved||Concerns||Redundancy between different sign-in, sign-up mechanisms||1|
|Retention||General||Endorsements for Retention (over Individuals Involved)||4|
|Retention||Positive||1 edit threshold is good||1|
|Retention||Positive||Good to include any project||1|
|Retention||Positive||Good to include any namespace||1|
|Retention||Positive||“The definition is fine"||2|
|Retention||Positive||Good metric for longer-term outcomes||2|
|Retention||Positive||Retention is a useful metric||2|
|Retention||Concerns||Retention is not a goal for every activity||1|
|Retention||Concerns||Very online, editing focused||3|
|Retention||Concerns||Not a good fit for those one time contest participants, outreach||1|
|Retention||Concerns||Much more limited metric||1|
|Retention||Concerns||Is 1 edit meaningful? At what point does the number of edits become meaningful?||1|
|Retention||Concerns||A new editor retention metric will not capture activities focused on existing editor community||1|
|Retention||Concerns||Capturing both new and existing retention is important||1|
|Retention||Concerns||Selecting 30/90/12 months will not fit time periods like semesters well||1|
|Retention||Concerns||Retention is not sufficient to be the one metric||1|
|Retention||Definition||Existing editor retention is not useful||2|
|Retention||Definition||Should focus on new editor retention||2|
|Retention||Definition||Existing editor retention might be addressed through community building question||1|
|Retention||Definition||Need flexibility in the retention period||4|
|Retention||Definition||Grantee needs to define their retention period beforehand||1|
|Retention||Definition||Needs a great baseline, to contextualize "good" retention||3|
|Retention||Definition||Time periods might be set to fit the chapters reporting||1|
|Retention||Concerns||Short term retention isn't useful to track||1|
|Retention||Concerns||30 day retention isn't going to be useful to all grantees||1|
|Retention||Collection||Time intensive to collect and track||1|
|Retention||Collection||Need an automated system that tracks the retention of users||1|
|Retention||Collection||No system can capture the "retention" of volunteer organizers||1|
|Participation||General||Both Individuals Involved and Retention are necessary to see the full picture||1|
|Participation||General||Captures both output and outcomes||1|
|Participation||General||Allows for flexibility, given the different types of activities run||1|
|Content||Positive||Endorsements for the Content metric||13|
|Content||Positive||Disaggregation by Wikimedia project is good||7|
|Content||Positive||Information on content by project is useful||9|
|Content||Concerns||Doesn't capture everything related to content||1|
|Content||Concerns||Could be manipulated, e.g. by stubs||2|
|Content||Concerns||Number of "Pages improved" is not a good measure of quality||2|
|Content||Concerns||Need a measure of quality||1|
|Content||General||Make clear "project" is a Wikimedia project and not something else (e.g. WikiProject, or a funded project)||1|
|Content||General||Definition should have more detail - Wikipedia articles translated, Wikisource books proofread twice, Wikidata statements created||1|
|Content||General||Need to be clear what an "improvement" means - would vary by wiki project||1|
|Community Building||Collection||Survey community being served||3|
|Community Building||Collection||Measure engagement after an event||1|
|Community Building||Collection||Measure active editors||1|
|Community Building||Collection||Measure number of participants in community||1|
|Community Building||Collection||Answer a set of questions, to assess various dimensions||1|
|Community Building||Collection||Look for indicators, not direct causality||1|
|Community Building||Collection||Measure before and after the program, not a specific event, but the entire program||1|
|Community Building||Concerns||Cannot yet be systematically captured||1|
|Community Building||Concerns||Really hard to capture, particularly the qualitative side||5|
|Community Building||Concerns||Making the question specific would make it less applicable||1|
|Community Building||Concerns||Should not make it too resource intensive to collect and evaluate||1|
|Community Building||Concerns||Community Building might not be an outcome, but a prerequisite of the program||1|
|Community Building||Concerns||Proving causality will be difficult||2|
|Community Building||Concerns||Will be difficult to automate this data collection||1|
|Community Building||Concerns||Ambiguous definition||1|
|Community Building||General||Community Building is useful information||8|
|Community Building||General||Community Building is important information||15|
|General||General||Need to think about how the information will be collected in the field||1|
|General||General||Metrics are harder for things that are not timebound||1|
|General||General||Grantee needs to have a clear definition and be clear about why this metric is important to them||1|
|General||General||Grantee needs to demonstrate consistency between the goal and what they are measuring||1|
|General||General||WMF needs to address anxiety about low numbers||1|
|General||General||Global metrics show how programs are working around the globe; program-specific metrics are different||1|
|General||General||Should have metrics that are applicable from the smallest to largest grant; the comparison is still interesting||1|
|Topic||Feedback||Count|
|Metrics Library||Endorsement for Metrics Library||13|
|Metrics Library||Suggestions for features for the Metrics Library||16|
|1:1 Support||Create a position to help program managers do program evaluation||1|
|Tutorials||More online training about using and understanding Wikimetrics could be something to think about.||1|
|Tutorials||An online tutorial/masterclass once a month would be a nice way to resolve doubts and questions.||1|
|Current resources||Learning patterns and Idea Lab insufficient to inspire experimentation and new program design. So the metrics library seems like a good addition.||1|
|Current resources||Maybe there is already sufficient guidance but it is difficult to find. Not all grantees (even "experienced" grantees) knew about it.||1|