Metrics Brainstorm Session, WMCON 2014: Summary and Discussion
Page Summary
This report summarizes a discussion, Measuring Program Impact on Projects, that began at a brainstorm session during the Wikimedia Conference 2014 in Berlin. The goal of the session was to start a community discussion about the metrics used in the Evaluation Reports (beta) and to gather the community's feedback on them: what the metrics tell us, what they do not tell us, and how else we might capture other important measures of program impact. After a brief introduction, workshop participants split into two groups: the first brainstormed on programs that focus on producing text, the second on programs that focus on images. After brainstorming other important impact targets and possible ways of measuring them, each participant voted on their highest priorities.
Session Results
Please visit the discussion page to share your input!
Background
During Open Thursday on April 10 at the Wikimedia Conference 2014, the Wikimedia Foundation's Program Evaluation team, part of Grantmaking's Learning and Evaluation team, hosted a brainstorm session called Measuring Program Impact on Projects to get a sense of what the reports tell us about Wikimedia programs, what they do not tell us, and what else we could measure. Below is a summary of the brainstorm session; the appendices provide a complete list of the topics covered.
Evaluation Report Metrics
The Evaluation Reports (beta) were completed in early April 2014 and are an initial attempt to systematically assess the various programs that Wikimedia organizations and individual volunteers run around the world. These reports use a set of metrics to capture the goals of the programs as well as their inputs, outputs, and outcomes. At the beginning of "Measuring Program Impact on Projects," we reviewed the metrics used in the reports. Here, we present a comprehensive summary of the metrics:
Program Goals
Program Outputs
Program Outcomes
Text and Media Categories
For this brainstorm, we separated programs into two categories: those that focus on "text" and those that focus on "media". These surfaced as the two content targets among the programs analyzed in the Evaluation Reports (beta).
Brainstorm Structure
Duration | Activity |
---|---|
15 minutes | A review of the metrics listed above |
45 minutes | The group split into two rooms. One room discussed programs that focus on text content; the other discussed media/image programs. In each room, participants discussed and posted suggestions on a sticky wall. The three main categories for discussion were "What do they tell us?", "What do they not tell us?", and "How else might we measure?", where "they" refers to the metrics used in the reports. After topics and ideas were posted, participants had the opportunity to vote on them. |
30 minutes | The Media group joined the Text group in their room, where the Text group gave an overview of the topics it had covered, and the Media group got a chance to vote on the Text group's topics. The process was then repeated in the Media group's room. |
Voting Rubric
The voting process for each of the two groups was done differently:
Text programs
Each participant was allowed one sticky dot of each color: red (3 points), yellow (2 points), and green (1 point), as listed in the Appendix Key.
Media programs
Each participant was given two sticky dots to place on the two highest priority items, without differentiating among the colors.
Results
Participation
Although no formal head count was taken during the event, about 27 individuals attended, not including program facilitators: 13 to 15 in the Text group and about 12 in the Media group.
Text program priorities
The text programs' "points of interest" were grouped into themes, called "groups" in the appendices, because several points of interest shared an overall theme. The two highest-rated themes were Quality and "Social/Community impact." For the full details of the results, see the appendices at the bottom of this page.
The specific points of interest that were rated the highest were:
Points of interest | Total points |
---|---|
Measuring Quality | 81 |
Community Impact | 35 |
Needs Assessment | 27 |
Program Knowledge | 6 |
Importance of Impact | 5 |
Image program priorities
The rating system for the second group differed, but the main priorities for this group still surfaced. They included:
Points of interest | Total points |
---|---|
Quality (not specific enough) | 9 |
Use of Wikimedia images outside of projects | 9 |
Proportions of images used | 5 |
Geographic (and other) diversity, spread | 4 |
Number of image downloads | 3 |
Appendices
For your reference, we include below all the data and photographs we took at the event to help you understand the data collected and the results above.
Appendix Key
Category Number | Description |
---|---|
1 | What do they tell us? |
2 | What don't they tell us? |
3 | How else might we measure? |
For Appendix 1 (Text metrics votes) only:
Color Code | Point Value |
---|---|
Red | 3 points |
Yellow | 2 points |
Green | 1 point |
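The "Total points" column in Appendix 1 appears to be the dot counts weighted by the point values above. The following is a minimal sketch of that arithmetic, not the team's actual tally procedure; the function name `total_points` and the dictionary `POINT_VALUES` are hypothetical names introduced here for illustration.

```python
# Point values taken from the Appendix Key (Appendix 1 only).
POINT_VALUES = {"red": 3, "yellow": 2, "green": 1}


def total_points(red: int, yellow: int, green: int) -> int:
    """Return the weighted total for one point of interest."""
    counts = {"red": red, "yellow": yellow, "green": green}
    return sum(POINT_VALUES[color] * n for color, n in counts.items())


# Dot counts copied from the first three rows of Appendix 1.
print(total_points(red=9, yellow=3, green=2))  # item 13: 35
print(total_points(red=5, yellow=0, green=5))  # item 18: 20
print(total_points(red=0, yellow=8, green=3))  # item 34: 19
```

For item 13, for example, 9 red, 3 yellow, and 2 green dots give 9×3 + 3×2 + 2×1 = 35 points, which matches the table.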
Appendix 1: Text Programs
Number | Type | Category | Points of Interest | Group | Group (desc) | Red | Yellow | Green | Total points |
---|---|---|---|---|---|---|---|---|---|
13 | Text | 2 | Quality (and Return on Investment (RoI) in terms of quality) | 2 | Quality (Overall) | 9 | 3 | 2 | 35 |
18 | Text | 2 | Social/community impact | 3 | Community | 5 | 0 | 5 | 20 |
34 | Text | 3 | Needs assessment | 7 | Needs Assessment | 0 | 8 | 3 | 19 |
40 | Text | 3 | Pairing qualitative and quantitative information | 0 | Quality (Data) | 3 | 2 | 2 | 15 |
24 | Text | 2 | No offline volunteers included | 0 | Community | 0 | 2 | 4 | 8 |
25 | Text | 2 | Gender & diversity of participants | 0 | Quality (Diversity) | 1 | 1 | 3 | 8 |
33 | Text | 3 | Identify gaps (e.g., missing articles by unsuccessful search) | 7 | Needs Assessment | 1 | 1 | 3 | 8 |
35 | Text | 3 | Software that measures the complexity of language, readability | 0 | Quality (Readability) | 1 | 2 | 1 | 8 |
39 | Text | 3 | Talk page analysis | 0 | Quality (Engagement) | 0 | 3 | 1 | 7 |
26 | Text | 2 | How to improve processes | 4 | Process | 2 | 0 | 0 | 6 |
37 | Text | 3 | Number of references in certain articles | 6 | Quality (Sources) | 1 | 1 | 0 | 5 |
20 | Text | 2 | Community building | 3 | Community | 1 | 0 | 1 | 4 |
14 | Text | 2 | Quality assessment is limited (What does # of bytes tell us?) | 2 | Quality (Content) | 1 | 0 | 0 | 3 |
19 | Text | 2 | Fun or "engaged" | 3 | Community | 1 | 0 | 0 | 3 |
29 | Text | 3 | Pre/post survey on projects | 0 | Documenting | 0 | 1 | 1 | 3 |
8 | Text | 2 | Importance of articles (views) | 1 | Importance | 0 | 1 | 0 | 2 |
38 | Text | 3 | Encourage event documentation (record video like Wikipedia Zero) | 0 | Documenting | 0 | 1 | 0 | 2 |
1 | Text | 1 | Raw number increases | 0 | Documenting | 0 | 0 | 0 | 0 |
2 | Text | 1 | Sets a comparison point (baseline) | 0 | Documenting | 0 | 0 | 0 | 0 |
3 | Text | 1 | Program resource inputs lead to edits | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
4 | Text | 1 | Easier to collect data in a broader sense | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
5 | Text | 1 | Gives us relative proportion of people that make contribution | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
6 | Text | 2 | Importance of changes made (grammar vs. important content) | 1 | Importance | 0 | 0 | 0 | 0 |
7 | Text | 2 | Hard to see the categories of the articles (do they fill a gap?) | 1 | Importance | 0 | 0 | 0 | 0 |
9 | Text | 2 | Topics covered/diversity | 1 | Importance | 0 | 0 | 0 | 0 |
10 | Text | 2 | How a new participant learned from articles created or improved | 1 | Importance | 0 | 0 | 0 | 0 |
11 | Text | 2 | Level of article reorganization | 1 | Importance | 0 | 0 | 0 | 0 |
12 | Text | 2 | Quality of articles (most won't be good/featured) | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
15 | Text | 2 | Reverts and deletions (after a workshop) | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
16 | Text | 2 | Number decreases can be a good thing | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
17 | Text | 2 | Quality of edits made (Are they adding good stuff or just junk?) | 2 | Quality (Content) | 0 | 0 | 0 | 0 |
21 | Text | 2 | Do they feel welcome | 3 | Community | 0 | 0 | 0 | 0 |
22 | Text | 2 | Do people stay/increase their activity? | 3 | Community | 0 | 0 | 0 | 0 |
23 | Text | 2 | Relative to % (i.e., 5 new active editors but workshop had 50,000 attendees) | 3 | Community | 0 | 0 | 0 | 0 |
27 | Text | 2 | How to tweak our programs to get more active edits/editors | 4 | Program Knowledge | 0 | 0 | 0 | 0 |
28 | Text | 2 | Scalability | 0 | Program Knowledge | 0 | 0 | 0 | 0 |
30 | Text | 3 | Measure quality by traffic to articles? | 5 | Quality (Reach) | 0 | 0 | 0 | 0 |
31 | Text | 3 | Changes in traffic | 5 | Quality (Reach) | 0 | 0 | 0 | 0 |
32 | Text | 3 | Cross-reference information - intrawiki pagerank project (wikirank) | 6 | Quality (Reach) | 0 | 0 | 0 | 0 |
36 | Text | 3 | Gather numbers to be retweeted/facebook posted, google plus of certain articles | 0 | Needs Assessment | 0 | 0 | 0 | 0 |
41 | Text | 3 | Re-use of incoming links (like page rank) and outgoing links | 0 | Needs Assessment | 0 | 0 | 0 | 0 |
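The three largest theme totals reported in the results (Quality 81, Community 35, Needs Assessment 27) can be recovered by summing the "Total points" column over the "Group (desc)" labels. The sketch below assumes that every "Quality (…)" label rolls up into a single Quality theme; that rule and the helper names `theme_of` and `theme_totals` are assumptions made for illustration, not a documented method. The two smaller themes in the results table are left out because their labels do not map one-to-one onto the appendix labels.

```python
from collections import defaultdict

# (group description, total points) for the Appendix 1 rows with a nonzero total.
ROWS = [
    ("Quality (Overall)", 35),
    ("Community", 20),
    ("Needs Assessment", 19),
    ("Quality (Data)", 15),
    ("Community", 8),
    ("Quality (Diversity)", 8),
    ("Needs Assessment", 8),
    ("Quality (Readability)", 8),
    ("Quality (Engagement)", 7),
    ("Process", 6),
    ("Quality (Sources)", 5),
    ("Community", 4),
    ("Quality (Content)", 3),
    ("Community", 3),
    ("Documenting", 3),
    ("Importance", 2),
    ("Documenting", 2),
]


def theme_of(group_desc: str) -> str:
    """Assumed rule: drop a parenthesised qualifier, e.g. 'Quality (Data)' -> 'Quality'."""
    return group_desc.split(" (")[0]


def theme_totals(rows):
    """Sum total points per theme."""
    totals = defaultdict(int)
    for group_desc, points in rows:
        totals[theme_of(group_desc)] += points
    return dict(totals)


totals = theme_totals(ROWS)
for theme in ("Quality", "Community", "Needs Assessment"):
    print(theme, totals[theme])
```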
Appendix 2: Media Programs
Number | Type | Category | Points of interest | Total points |
---|---|---|---|---|
42 | Media | 1 | Categorizing makes it easy to count (and show) | 1 |
43 | Media | 1 | Proportions of images used | 5 |
44 | Media | 1 | Use of images on projects | 0 |
45 | Media | 1 | Number contributors (new and active) | 0 |
46 | Media | 1 | Additional materials for project | 0 |
47 | Media | 2 | Number new instances of an image (e.g. new monuments with images after WLM) | 0 |
48 | Media | 2 | Number views | 2 |
49 | Media | 2 | Average number images per contributor | 0 |
50 | Media | 2 | Usage of images outside of Wikimedia projects | 9 |
51 | Media | 2 | Geographic (and other) diversity, spread | 4 |
52 | Media | 2 | Number downloads | 3 |
53 | Media | 2 | Quality (not specific enough) | 9 |
54 | Media | 2 | Negative impact (i.e. images that take up volunteer time to delete) | 3 |
55 | Media | 2 | Contributions from museums | 1 |
56 | Media | 2 | Amount of coverage in national news | 0 |
57 | Media | 3 | Quality - use jury ratings | 0 |
58 | Media | 3 | Use of knowledge materials | 2 |
59 | Media | 3 | GLAMorous tool to include ability to search by timeframe | 3 |
60 | Media | 3 | Split up "Quality" | 2 |