User:The Land/Thinking about the impact of the Wikimedia movement
This is an essay. It expresses the opinions and ideas of some Wikimedians but may not have wide support. This is not policy on Meta, but it may be a policy or guideline on other Wikimedia projects. Feel free to update this page as needed, or use the discussion page to propose major changes.
The Wikimedia movement does not often think about how to measure or describe our impact.
Consider the following different scenarios:
- An article about the life of a celebrity, written based on extensive media coverage in the same language, attracts millions of pageviews when the celebrity dies
- Someone uploads an image of a cultural artifact that is very interesting to scholars of a particular period of history, but in which almost no-one outside academia is interested
- The Wikipedia page on a political event in a small developing country becomes the best source of unbiased information on that event available in that country's language
In each of these cases the Wikimedia vision of providing the sum of human knowledge to everyone (for free) is fulfilled, so they all represent part of the impact of the Wikimedia projects. However, they differ in the nature of the knowledge being transmitted, in the number of people who receive it, and in the context in which they read and use it. Yet the Wikimedia movement almost never discusses which of these scenarios is more important, or indeed whether they are all equally important.
The lack of discussion of impact is a challenge to the development of movement strategy, and to the WMF's role as a grantmaker. How can the WMF decide which goals to focus on without an understanding of what's important? How is the Funds Dissemination Committee meant to prioritise spending between dozens of non-comparable projects from dozens of different organisations? Without explicit answers (or even conversations) about the subject, some implicit answers have grown up and can be found among the WMF's existing priorities, metrics and grantmaking criteria.
This essay aims to set out and explore those implicit definitions of impact, and others we should bear in mind, in a reasonably systematic way, as a starting point for further discussion. This issue isn't easy - there are too many differences of philosophy, culture, and standpoint in the world for anyone to find simple answers. However, the discussion itself can be enlightening.
- 1 TLDR version
- 2 Where we're at and why
- 3 What should we measure?
- 3.1 Primary impact - characteristics of content
- 3.2 Secondary impact - communities and technology
- 4 Further Reading
- 5 Notes
TLDR version
To date, the Wikimedia movement has developed metrics (and views of impact) from two directions: off-wiki, looking at big-picture numbers to assess the overall performance of the WMF or affiliates, and on-wiki, looking at definitions of article quality (and sometimes importance). These two pictures rarely coincide.
The amount and reach of the content on Wikimedia projects are vital to understanding impact, as are the quality and importance of that content. Arguably we should also consider the diversity of topics and perspectives in that content, and perhaps also the uniqueness of the content in terms of providing a resource that is not otherwise available.
Secondarily, the things that make our content (and thus impact) possible are: volunteers, technology, partnerships, reputation and public policy. Working to support, increase or develop any of those is also likely to improve our impact.
Where we're at and why
To date, there have been two ways to look at measuring the impact of what we do, worked on from two different directions: a project-led approach to assigning quality and importance scores to individual articles (though very inconsistently across projects), and a WMF-led approach of measuring broad-scale statistics. These two approaches have only rarely met in the middle.
The WMF has typically tracked the "big numbers", like the number of editors contributing and the number of pageviews across projects, e.g. in quarterly reports or through the project statistics. These are at least an attempt to look at the aggregate impact of the Wikimedia projects, and some of them (e.g. editor numbers) are also used to monitor the health of the communities that build them. The WMF has also attempted to make these figures useful in the context of grantmaking through the global metrics, which essentially measure how successful WMF affiliates are at contributing to those very big-picture numbers. This proved controversial among the affiliates who apply for FDC grants, since they have their own measures of success for the projects they are working on. The WMF grantmaking process has since formally acknowledged this by inviting affiliates to set their own criteria for success.
On Wikimedia projects, project communities have developed various tools to assess the quality and/or importance of a particular article or image. "Featured" articles or images represent the very best work on a particular project, while very new and short articles are classified as "stubs". The English Wikipedia takes things a step further with a rating system that rates articles on a 7-point quality scale and a 4-point importance scale. The quality criteria are consistent across all areas, though they may be inconsistently applied, as only the higher quality levels require detailed community scrutiny. The "importance" scale is considerably weaker, in my view, because importance ratings are assigned by the WikiProject working in a particular area and so are only relative to that area. An article identified as "top importance" in the domain of Dungeons & Dragons or Beekeeping is unlikely to be as important as a "top importance" article in the domain of English Literature or Biology, for instance.
The English Wikipedia's quality and importance systems grew out of the planned "Wikipedia 1.0", an offline version of Wikipedia that would include the best-quality and most important articles. The actual selection of articles for offline English Wikipedia releases was based partly on the on-wiki quality and importance ratings, and partly on the number of pageviews for, and internal links to, each article. This rating system can also be used to calibrate a machine learning tool that assesses the quality of edits, the Objective Revision Evaluation Service (ORES), though principally as a way of helping volunteers review edits rather than as a way of measuring impact.
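To make the idea of a combined selection score concrete, here is a minimal Python sketch of how quality ratings, importance ratings, pageviews and internal links might be folded into a single number. The point values, the log scaling and the weights are all hypothetical assumptions chosen for illustration. This is not the formula the Wikipedia 1.0 project actually used, and the article names and counts are invented.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical point values for the English Wikipedia's 7-point quality scale
# and 4-point importance scale; NOT the values used by the real Wikipedia 1.0 tools.
QUALITY_POINTS = {"FA": 500, "A": 400, "GA": 300, "B": 200, "C": 150, "Start": 100, "Stub": 50}
IMPORTANCE_POINTS = {"Top": 400, "High": 300, "Mid": 200, "Low": 100}


@dataclass
class Article:
    title: str
    quality: str            # a key of QUALITY_POINTS
    importance: str         # a key of IMPORTANCE_POINTS
    monthly_pageviews: int
    incoming_links: int     # internal links from other articles


def selection_score(article: Article) -> float:
    """Add rating points to log-scaled reach signals (hypothetical weights)."""
    # log10(x + 1) damps huge traffic spikes and is defined for zero counts
    views = 50 * math.log10(article.monthly_pageviews + 1)
    links = 100 * math.log10(article.incoming_links + 1)
    return QUALITY_POINTS[article.quality] + IMPORTANCE_POINTS[article.importance] + views + links


def select_for_release(articles: List[Article], limit: int) -> List[str]:
    """Return the titles of the `limit` highest-scoring articles, best first."""
    ranked = sorted(articles, key=selection_score, reverse=True)
    return [a.title for a in ranked[:limit]]


if __name__ == "__main__":
    candidates = [
        Article("Narrow topic", "FA", "Low", monthly_pageviews=10, incoming_links=1),
        Article("Broad topic", "C", "Top", monthly_pageviews=1000, incoming_links=100),
    ]
    print(select_for_release(candidates, 1))  # ['Broad topic']
```

With these made-up weights, a modest C-class article rated top importance outranks a Featured Article rated low importance. Shifting the weights changes the answer, which is exactly the kind of value judgment that usually stays implicit.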
What should we measure?
Ultimately, impact is about the knowledge we are sharing, and the uses people make of that knowledge. As a result, the best measures of our impact will be about the amount and nature of the content on the Wikimedia projects, and the number of people who make use of that content, and what it means for them.
Secondarily, we care about the things that make it possible to maintain and extend that impact. Thriving communities, effective technology to create and distribute content, and (increasingly) institutional partnerships all fall into this category.
Of course, neither of these areas is easy or obvious to describe, let alone measure, because of a tonne of practical, philosophical and cultural difficulties. As a result, this essay has more questions than answers. But here are my suggestions (most of which are already used, either explicitly or implicitly):
Primary impact - characteristics of content
Amount of content
One of the simplest measures of impact is "how big are the Wikipedia projects". We can usually agree that starting a new article, extending an existing one, or uploading an image is a useful contribution. Many of the big numbers we celebrate, like Wikipedias reaching a million or five million articles, are based on size. So are half of the WMF's global metrics.
In spite of its relative popularity, quantity is an obviously crude metric. I don't think anyone seriously believes that the Wikipedia article on Donald Trump is exactly as worthwhile as that on Minnesota State Highway 371, for all kinds of reasons.
Another issue with quantity of articles or bytes is that a particular new contribution might not be helpful - but we're pretty sure that poor-quality material is usually promptly removed, so lasting size and number of contributions might be a better measure. (Though projects or initiatives which result in large numbers of low-quality contributions do create a burden on volunteer time for cleanup work, and in some circumstances might be a net negative).
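A minimal Python sketch of the "lasting size" idea, assuming a simplified history in which each contribution has a byte count, a timestamp and (if it was removed) a removal time. The `Contribution` record and the 30-day grace period are hypothetical. Real tooling would need to track which specific text survives later edits, rather than treating each contribution as all-or-nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Contribution:
    bytes_added: int
    made_at: datetime
    removed_at: Optional[datetime] = None  # None while the text is still on the page


def raw_bytes(contribs: List[Contribution]) -> int:
    """Naive size metric: every byte ever added counts."""
    return sum(c.bytes_added for c in contribs)


def lasting_bytes(contribs: List[Contribution], as_of: datetime,
                  grace: timedelta = timedelta(days=30)) -> int:
    """Count only bytes that stayed on the page for at least `grace`.

    Contributions made less than `grace` before `as_of` are skipped,
    because it is too early to tell whether they will last.
    """
    total = 0
    for c in contribs:
        if as_of - c.made_at < grace:
            continue  # too recent to judge
        end = c.removed_at if c.removed_at is not None else as_of
        if end - c.made_at >= grace:
            total += c.bytes_added
    return total


if __name__ == "__main__":
    start = datetime(2020, 1, 1)
    history = [
        Contribution(1000, start),                              # kept
        Contribution(5000, start, start + timedelta(hours=2)),  # reverted within hours
        Contribution(300, datetime(2020, 5, 25)),               # too recent to judge
    ]
    print(raw_bytes(history), lasting_bytes(history, datetime(2020, 6, 1)))  # 6300 1000
```

The quickly reverted 5,000 bytes inflate the raw figure but not the lasting one, which is the distinction the paragraph above draws.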
Reach
Another fairly obvious measurement of our impact is reach: how many people are viewing, or acting on, our content? Clearly, if 2 billion people are accessing the Wikimedia projects then that's much better than 1 billion, or 1 million, or 1,000 - and again, reach often features in WMF metrics reports and presentations.
One could argue that, all other things being equal, adding material to a high-traffic article is more impactful than adding a similar amount (and quality) of material to an article that is hardly ever viewed. Indeed, reach is also sometimes used as a proxy for "article importance", which seems like a logical extension of the same idea; for instance, pageviews were part of the scoring system by which Wikipedia 1.0 articles were selected.
However, the more one looks at reach, the more problematic a measure it seems. Looking at the English Wikipedia's weekly traffic reports, it often appears that the most popular articles each week are about politicians or celebrities who have done something controversial, died, or both. If high-reach contributions are particularly important, that would suggest the movement should focus its resources on celebrity articles. These are probably not the most important subjects in the world, just as "the sum of human knowledge" is not the same concept as "the sum of what everyone is googling today".
One final observation about reach: it is fundamentally linked to language. English has 942 million speakers while Swedish has 8.7 million. Going by the numbers, that would suggest that an article on the English Wikipedia has over 100 times the potential impact of a similar article on the same subject in Swedish. Would we really be happy to draw that conclusion? Should the WMF heavily fund projects working on the few biggest Wikipedias, starting with Chinese, and neglect languages with only a couple of million readers or fewer?
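The arithmetic behind that ratio, using the speaker counts above (and treating every speaker as a potential reader, which says nothing about actual pageviews):

```latex
\frac{\text{English speakers}}{\text{Swedish speakers}} = \frac{942 \times 10^{6}}{8.7 \times 10^{6}} \approx 108
```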
Importance
If one accepts that the article on Donald Trump isn't entirely equivalent in value to the one on Minnesota State Highway 371, it's probably because of importance. It isn't difficult to agree that the current President of the United States is more important than an undistinguished 100-mile road. Adding verifiable facts about what President Trump is doing adds more to our projects' impact than adding further information on the nature of the tarmac on that road.
However, I suspect it's difficult to agree on much more than that. One can probably find a consensus that certain topics are important, but after the first couple of hundred articles I suspect that consensus would very quickly unravel, to be replaced by purely subjective argument. Who is to say whether Beyonce is more important than Battleship, or than Basal cell carcinoma, or even than Bulbasaur? Is an extensive list of "articles by importance" really possible to produce without a whole lot of cultural baggage attached?
Diversity and content gaps
The Wikimedia projects have a set of biases which reflect the mindset of the predominantly well-educated, male, white and Western communities that produce them.
Are contributions that aim to address these biases particularly valuable? For instance, we have remarkably good coverage of battleships, and very poor coverage of feminist cinema. Does this mean that a new article of 500 words on a relatively obscure film only of interest to connoisseurs of feminist cinema is more valuable than a new article, just as long, on a relatively obscure battleship only of interest to true devotees of military history? Currently the WMF's answer seems to be "yes", either implicitly or explicitly, in its grantmaking process. However, if the answer is "yes", does that also extend to projects in languages that are poorly represented currently? What methods can we use to identify and fill in such gaps?
Number of other resources accessible on a subject
Is a Wikipedia article more valuable if it is the only serious resource on a particular subject (or at least the only serious one online)? In the early days of the English Wikipedia we frequently referred to Execution by Elephant as a good example of a subject that only Wikipedia covered. However, if there were a dozen similar online resources on the same subject, would the impact of that article be any the less?
Or, alternatively, is Wikipedia content particularly valuable when there is a vast quantity of coverage of a subject? For instance, there is a vast quantity of comment and analysis about Donald Trump, much of it shared frenetically on social media. Virtually all of it is politically contested, and a certain proportion of it is politically motivated fiction produced by one extreme or the other. Are Wikipedia articles particularly valuable when there is so much material about their subjects on the internet that it's difficult to know where to look for objective, neutral commentary?
Similarly, dealing with media files, do we value a media upload to Commons any less if the same file was already available online in similar quality and format? That is to say, does copying files across from Flickr have the same impact as uploading them directly?
Quality of content
What does "quality content" mean on Wikimedia projects? Actually, this is one of the areas where on-wiki processes are best equipped to give an answer, at least a partial one. Looking at the English Wikipedia Featured Article Criteria we see the answer (at least, for that Wikipedia) is that a high-quality article is well-written, comprehensive, and well-researched (with citations in support). It follows that improving the breadth and depth, the prose, and the referencing of an article improves the quality.
"Quality" seems to appear infrequently in grantmaking discussions, and is almost never referred to by the WMF. The only reason I can think of for this is that on-wiki quality recognition processes are community-driven and entirely outside the control of the WMF or chapters, so these bodies haven't set targets for things they can't control. Nonetheless, this seems to me like the biggest missing piece of the picture.
What are the limitations of quality as a criterion? The main issue with the current community-driven quality review process is that it is far easier to write a "high-quality" article on a narrow topic. The perfect topic on which to write a Featured Article is small, focused, uncontroversial, and can only be approached in one intellectual discipline. There will only be a certain number of authoritative secondary sources, and it's possible to consult all or most of them and distil them down into a great article. By contrast, sprawling topics that require multidisciplinary approaches are far more difficult to move even slightly up the on-wiki quality ladder, because it could take years of study to master even part of the subject. It's no accident that USS Iowa has not one but two Featured Articles on English Wikipedia, while the much more general topic of War languishes at C-class.
Secondary impact - communities and technology
If the content on Wikimedia projects is the most important factor in impact, then the ability to create and maintain that content is obviously a key contributing factor.
Volunteers
Volunteers create the Wikimedia projects, so anything that gets more people joining in on-wiki will tend to contribute to our impact. Similarly, getting a previous contributor to come back, or increasing editor retention, will tend to increase impact.
(I say "tend to contribute" because there have been occasions, now thankfully rare, where projects to get new contributors have resulted in large numbers of ill-trained editors making unhelpful contributions requiring extensive cleanup work which might not have been a net positive.)
Of course, not all contributors are equal: correcting one typo is not equal to writing a dozen featured articles. But assessing the impact of a contributor suffers from all the problems described above about assessing the impact of content, plus the challenge that many contributors spend as much effort on vital maintenance, cleanup, review and dispute resolution as on generating content. The most commonly used measure of "contributor quality" is still edit count, despite the many well-known flaws of "editcountitis".
Naturally, steps that help contributors do a better job contribute to our impact. Again, "better" is something that is difficult to define, and probably even more difficult to measure (how can one unpick the benefit of a particular tool or project from the vast number of other factors, without a large-scale randomised controlled trial?). But one could expect, for example, better access to sources to feature here, alongside any initiative that helps community health.
As an aside, two observations:
- Many projects have set goals related to recruiting new editors or retaining them. These have often proved challenging to deliver, in particular keeping people editing after an introductory event such as an editathon.
- "Things that make the community healthy and productive" may not be entirely the same as "things community members will bid for in grants or wishlists" - has this question ever been addressed?
Technology
Technology clearly plays a critical instrumental role in delivering content and enabling its creation. The WMF has tended to focus (correctly in my view) on the reading and editing experience, although only relatively recently has the WMF recognized that it's also important to work on technology aimed at the "power users" who make up the core of the editing community.
It is important that technology development remains impact-driven, but this faces the same challenges of measurement. For instance, Visual Editor should make a big difference to the experience, happiness and productivity of new contributors. Is there a framework to assess how well it does this?
Equally, technology can readily have unintended consequences - for instance, it appears that automated counter-vandalism tools also resulted in a decline in participation from new users.
Partnerships
A growing part of the work of the Wikimedia movement (and a particular focus of movement affiliates) involves partnerships - in particular partnerships with cultural and educational institutions of one form or another. Many of these institutions have considerable amounts of content currently under lock-and-key, and significant expertise that could be used to improve the Wikimedia sites. Every time any of that material is added to the Wikimedia projects, our impact grows.
Naturally, measuring the impact of a partnership has all the same challenges as trying to measure the impact of content, with the additional problem that partnerships are relatively slow-moving and unpredictable - one can invest several years into developing a relationship and still find that a brilliant proposal is vetoed by a senior stakeholder at the end of it.
Public understanding & reputation
Improving public understanding of Wikipedia, and the reputation of the Wikimedia projects, also doubtless affects our impact. If no-one had any trust in Wikipedia, then no-one would believe the contents of our articles - so we wouldn't be fulfilling our mission. (And, naturally, no-one would give us any money and the site would go offline). However, we are very far away from that at present: we have an extremely well-known brand and vast traffic. How much do we care about having good stories in the media about what Wikipedia (or the WMF or Wikimedia affiliates) are doing, given that everyone uses Wikipedia all the time anyway?
Public policy
We also care about the legal framework in which the Wikimedia projects operate. Most often, this is about copyright. If copyright is continually strengthened and extended, it becomes far more difficult for us to share knowledge. By contrast, if governments legislate for (or otherwise promote) open access to state-funded data, content and publications, then that makes it far easier for us to share that particular sliver of the sum of human knowledge. The impact of this can be vast in the long run. Equally, we care about issues like access to connectivity, net neutrality, and the ability of our volunteers to do their work without interference.
With public policy, however, there is always the question of how much this is our role specifically. There are several other organisations more specialised in dealing with open licensing or open journals, and many that work to protect free speech. Does this make a difference to what priority we give this kind of advocacy work?
Further Reading
- "A new metric for Wikimedia", English Wikipedia Signpost article by Denny Vrandečić (User:Denny) from August 2014. Proposes an importance-reach view of impact.
- "The Keilana Effect", WMF blog post describing the use of ORES to monitor the impact of an initiative to improve articles on women scientists in terms of article quality.
- "Use and impact of cultural heritage images on Wikimedia Commons and Wikipedia" by James Morley, looking at methods of assessing the impact of cultural heritage media in a more sophisticated manner than just pageviews
Notes
- As it happens, I first started thinking about this issue in answering some questions when I was a candidate for the FDC.
- The lack of discussion and measurement is probably because Wikipedias have developed as amazingly open volunteer projects, where people come along to contribute on whichever subject and in whichever way they feel most enthusiastic about. In the context of spontaneous self-organisation, the idea of prioritisation and resource allocation has often seemed at least irrelevant, probably harmful, and possibly obsolete. With an ambitious mission to share the whole sum of human knowledge and no identified goals along the way, anything that moved us toward that goal has looked valuable - possibly equally valuable. This is a big part of the reason that the Wikimedia movement hasn't had many big conversations about impact - we haven't had to.
- It's worth noting that by comparison to the rest of the Internet we scarcely care about reach at all. Most of the rest of the Internet is in the business of increasing pageviews and clicks at all costs - we're not.
- A good summary is at the English Wikipedia page on Systemic Bias
- Even though Wikipedia is highly collaborative, most Featured Articles are in fact driven forward by one individual editor - and editors appear reluctant to take on huge projects.
- Halfaker, Aaron; Geiger, R. Stuart; Morgan, Jonathan T.; Riedl, John (2012-12-28). "The Rise and Decline of an Open Collaboration System: How Wikipedia’s Reaction to Popularity Is Causing Its Decline" (PDF). American Behavioral Scientist. doi:10.1177/0002764212469365.