Grants:IEG/Learning from article revision histories
What is the problem you are trying to solve?
Wikipedia editors donate their time to improving articles. But what kinds of edits lead to the greatest improvements? Editors agree on the value of most edits, but occasionally there is disagreement, which often leads to reversions of edits. Too much reverting is discouraging to new editors and detrimental to productivity, but reversions are also critical to the continued and gradual improvement of articles (Zhang and Zhu 2006; Halfaker, Kittur, and Riedl 2011). In particular, reversions are a key part of the BOLD, revert, discuss cycle (BRDC) for collaboratively writing wiki pages. The purpose of this project is to assess the effectiveness of the BRDC strategy, and discover ways to improve it.
Assessing the effectiveness of the BRDC requires modeling Wikipedia as an evolutionary system. The theory of evolution is, at a general level of description, an algorithm or strategy for incremental change that can be applied as much to genetic evolution as to Wikipedia articles (Dennett 1995). The key requirement for evaluating these “evolutionary systems” is a reliable way of measuring success, that is, a way to compare different versions of an article or an organism. For biological organisms, success is measured as reproductive fitness. For Wikipedia articles, the measure of success is the quality of the article as defined by the Wikipedia community. The BRDC is a strategy for ensuring that the best edits always survive. The role of the BRDC in the evolution of Wikipedia articles is analogous to the role of natural selection in Darwinian evolution, but instead of selecting on the basis of reproductive fitness, the BRDC selects edits on the basis of improvements in quality. In evolutionary terms, the BRDC is how the Wikipedia community applies selection pressures for ever-increasing article quality.
Thinking about the editing process in evolutionary terms helps to see that there are many paths to great Wikipedia articles. BOLD editing encourages innovative editing in any form and from anyone. But it's the ability to revert in the BRDC that ensures that articles always improve. The efficiency of this process in evolutionary terms is the likelihood that surviving or extant articles have reliably improved at every generation of their reconstructed evolutionary history. Evolutionary histories must be reconstructed in order to turn lists of revisions into the tree structures more closely associated with evolutionary systems and their analysis. Once the evolutionary history of an article has been reconstructed, the effectiveness of the BRDC is the extent to which the series of edits (excluding reversions) leading from the start of the article to its present form is what would be expected if objective measures of edit quality had been applied at every iteration of the BRDC.
Articles that have successfully applied the BRDC and have reliably improved over generations can be compared to those that have not in order to better understand the strategies of effective editing. For example, do efficiently evolving articles tend to have more editors involved, or is there a point at which more editors no longer improves editing efficiency? Other beneficial strategies to identify concern the practices of individual editors. Reversions are sometimes interpreted as devaluing the work of others, which can lead to conflict and inefficiency. Luckily reversions are not required in order to observe incremental improvements in article quality. Some editors always make good edits, and therefore reversions are not required for an efficient application of the BRDC. What is required is knowing whether an article's absence of reversions is a sign of effective writing or a lack of editorial oversight. What, if anything, do edits that reliably improve article quality have in common that those that might get reverted do not? For example, do edits of a particular size tend to improve article quality more reliably than others? An intuitive example is that making large edits to multiple sections of an article at the same time may not be the most effective way to engage in BRDC editing since it might be harder to reach consensus on large changes. But is that intuition true, and if so what is the best sized edit for reaching consensus? Can this information be used to promote the types of edits that are the most likely to reach consensus? A formal analysis of this problem in evolutionary terms is the calculation of the optimal "mutation rate" for articles.
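The edit-size question above can be made concrete with a small sketch. It groups edits into size bins and reports the share of edits in each bin that were not reverted. The input format (a size in bytes plus a reverted flag), the bin edges, and the function name `survival_by_size` are assumptions made for illustration only; they are not part of the proposed pipeline.

```python
import bisect
from collections import defaultdict


def survival_by_size(edits, bin_edges):
    """Share of non-reverted edits per size bin.

    edits: iterable of (size_in_bytes, was_reverted) pairs (hypothetical format).
    bin_edges: ascending upper bounds; the last bin is open-ended.
    Bins that contain no edits are left out of the result.
    """
    labels = [f"<={bin_edges[0]}"]
    labels += [f"{lo + 1}-{hi}" for lo, hi in zip(bin_edges, bin_edges[1:])]
    labels.append(f">{bin_edges[-1]}")

    total = defaultdict(int)
    survived = defaultdict(int)
    for size, was_reverted in edits:
        # bisect_left returns the index of the first edge >= size
        label = labels[bisect.bisect_left(bin_edges, size)]
        total[label] += 1
        if not was_reverted:
            survived[label] += 1
    return {label: survived[label] / total[label] for label in labels if total[label]}


# Made-up example data: (bytes changed, was the edit reverted?)
example_edits = [(20, False), (80, False), (50, True), (400, False),
                 (700, True), (5000, True), (2500, True)]
rates = survival_by_size(example_edits, [100, 1000])
```

A bin with a clearly higher survival share than its neighbors would be a candidate for the "best sized edit", although a real analysis would also need to control for article and editor effects.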
Analyzing the efficiency of BRDC editing and identifying the characteristics of efficient article writing depend on objective measures of edit quality. Providing objective measures of edit quality at scale is the function of the Objective Revision Evaluation Service (ORES). The ORES makes it possible to predict the quality of an edit automatically. The specific problem solved by this project is using the ORES to answer questions about the efficiency of the editing process, in particular the likelihood that articles always improve as a result of the BRDC. Given that the predictions of the ORES are likely to only improve as the size of the training data available through Wiki labels campaigns increases and as the algorithms themselves improve, the analysis of the efficiency of the BRDC is implemented as a pipeline that can be used just as easily now as in the future. But the key to using this pipeline to learn from article revision histories is to provide a way to see the results of efficient editing, which is why my proposed solution centers around visualizing the evolutionary history of Wikipedia articles.
What is your solution?
My solution is to build a research pipeline to facilitate learning from article revision histories. This pipeline is based around an interactive data visualization app called 'wikivision'. 'wikivision' is an app for viewing the evolutionary history of articles. Evolutionary histories are assembled from an article’s revision history to reveal the tree structure embedded within the linear sequence of edits to an article. Although the article contents are hidden in the directed graph, interactivity is provided to view the text version of the article at a particular node or the difference between any two versions.
'wikivision' is not the first attempt to visualize the tree structure of article revision histories. Ekstrand and Riedl (2009) created a system for visualizing tree structures in the margins of the traditional list view of article revisions. One of the challenges to doing this sort of analysis is determining how to draw the edges. The simplest method is to compare versions of the articles based on cryptographic hashes of the article text. Reversions are easy to spot by this method, but other sorts of edits, such as partial reversions, are harder to visualize. Ekstrand and Riedl (2009) attempted alternative methods of measuring graded inheritance in the tree structure, such as by comparing word counts, but these methods were more computationally demanding and in some cases resulted in unintuitive layouts. Drawing on these results, the default layout for evolutionary histories in 'wikivision' will be calculated from text hashes, but the more general point is that the same article revision history can be drawn in multiple ways, depending on the method used to determine inheritance.
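As a minimal sketch of the hash-based method, the code below turns a chronological list of revisions into a tree: a revision whose text hash has been seen before is treated as a reversion to the earlier node, and any other revision becomes a new child of the current node. The `Node` class, the function names, and the choice of SHA-1 are assumptions for illustration and do not describe the actual 'wikivision' implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One distinct version of the article text."""
    sha1: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    revision_ids: List[int] = field(default_factory=list)


def build_tree(revisions):
    """Build a revision tree from (rev_id, text) pairs in chronological order.

    Returns (root, tip, reverts), where reverts lists the ids of revisions
    that restored the text of an earlier version.
    """
    nodes = {}
    root = tip = None
    reverts = []
    for rev_id, text in revisions:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        node = nodes.get(digest)
        if node is None:
            # New text: branch off from the current version.
            node = Node(sha1=digest, parent=tip)
            nodes[digest] = node
            if tip is None:
                root = node
            else:
                tip.children.append(node)
        elif node is not tip:
            # Text seen before: a reversion to an earlier node.
            reverts.append(rev_id)
        node.revision_ids.append(rev_id)
        tip = node
    return root, tip, reverts


def surviving_lineage(tip):
    """First revision id of each node on the path from the root to the tip."""
    path = []
    node = tip
    while node is not None:
        path.append(node.revision_ids[0])
        node = node.parent
    return path[::-1]
```

For a toy history in which revision 3 adds spam and revision 4 restores the text of revision 2, `build_tree` reports revision 4 as a reversion and the surviving lineage skips revision 3. Partial reversions produce new hashes and so appear as ordinary children, which is the limitation noted above.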
The most interesting and novel layout of article evolutionary histories available in 'wikivision' is based on scores from ORES edit quality models. 'wikivision' will feed diffs from article revisions to three edit quality models and use their outputs to render alternative layouts. The three model outputs can also be averaged to create a single “optimal” layout, which corresponds to the evolutionary history the article would have had if the highest quality version were selected at each branch. The difference between this “optimal” layout and the actual revision history of the article is a measure of the effectiveness of BRDC editing.
'wikivision' supports learning from article histories to the extent that it provides new metrics for highlighting the successes of healthy articles and the areas to address in struggling articles. Once healthy articles can be identified, a number of basic questions about what they have in common that less efficient articles do not can be answered. For example, is there an optimal edit size for reaching consensus in the BRDC? Do groups of editors of some sizes work more efficiently than others?
An additional level of analysis is separating the types of articles that are more likely to evolve efficiently from those that struggle to improve in quality. The quality of article content is not proportional to reader demand (Warncke-Wang and Ranjan 2015), and the popularity of an article may strongly moderate the effectiveness of BRDC editing. 'wikivision' is designed to facilitate the incorporation of these alternative metrics for selecting articles to explore, so that the evolutionary history of a popular but efficient article can be compared to that of a popular but inefficient one. The same filtering mechanism can be used to compare the evolutionary histories of Featured Articles to other article classes, and to find examples of efficient articles according to the predictions of the BRDC strategy.
In summary, my solution to the problem of learning from article revision histories is to implement an analysis pipeline as a web app that can be used to interact with individual article revision histories. The data that powers the visualization (the assembly of an article's revision history into a tree of edits) can be analyzed offline for millions of articles to assess the efficiency of the BRDC editing process. The results of this analysis can be fed back to the web app for highlighting the most efficient practices of Wikipedia editors directly in the revision histories.
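One way to quantify the gap between the “optimal” and actual histories is sketched below. At each branch point, the candidate edit with the highest mean model score is taken as the “optimal” choice, and efficiency is the fraction of branch points where the edit that actually survived matches it. The data layout, the model names `model_a`, `model_b` and `model_c`, the scores, and the assumption that a higher score means a better edit are all hypothetical; the sketch does not call the ORES service.

```python
from statistics import mean


def mean_quality(scores):
    """Average the outputs of the edit quality models for one edit."""
    return mean(scores.values())


def brdc_efficiency(branch_points):
    """Fraction of branch points where the surviving edit had the best mean score.

    branch_points: list of dicts with
      'candidates': {rev_id: {model_name: score}}
      'survivor': rev_id of the edit that remained in the article
    """
    if not branch_points:
        raise ValueError("at least one branch point is required")
    agreements = 0
    for point in branch_points:
        candidates = point["candidates"]
        best = max(candidates, key=lambda rev_id: mean_quality(candidates[rev_id]))
        if best == point["survivor"]:
            agreements += 1
    return agreements / len(branch_points)


# Made-up scores for two branch points.
example = [
    {"candidates": {3: {"model_a": 0.1, "model_b": 0.2, "model_c": 0.3},
                    5: {"model_a": 0.8, "model_b": 0.9, "model_c": 0.7}},
     "survivor": 5},
    {"candidates": {7: {"model_a": 0.9, "model_b": 0.8, "model_c": 0.9},
                    8: {"model_a": 0.4, "model_b": 0.5, "model_c": 0.6}},
     "survivor": 8},
]
efficiency = brdc_efficiency(example)
```

In this toy input the survivor matches the best-scored candidate at the first branch point but not at the second, so the efficiency is one half. A simple unweighted mean is only one way to combine the models; weighting them differently would give a different “optimal” layout.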
Research the evolution of Wikipedia
Human cultural evolution has been likened to a ratchet to emphasize that improvements to cultural products accumulate, but the extent to which improvements in cultural products can actually be measured and predicted is an open research question. The difficulty lies in determining which parts of cultural change are random and which are sensitive to selection pressures (Mesoudi, Whiten, and Laland 2004; Bentley, Hahn, and Shennan 2004). Wikipedia is a unique case study for the study of cultural evolution because it is a context in which we should see clear evidence for selection pressures. These selection pressures are explicitly stated in the Wikipedia mission and community standards, and are carried out at the hands of the thousands of people who donate their time to making Wikipedia better. The question of whether Wikipedia articles evolve can be answered by demonstrating that Wikipedia editors apply consistent selection pressures across articles, even when the specific selection pressures are being carried out by independent groups of editors.
Demonstrate that articles always improve
Is there an upper limit to article quality? When Jimmy Wales was asked on his Talk page whether Wikipedia was getting better, his response was that the best test for whether Wikipedia is getting better is to compare the current version of an article to its past versions.
My favorite way of checking this is to "click random article" on 10 articles, and go back and look at them a year ago, 5 years ago, 10 years ago. Every time I have tried, it's unambiguous: Wikipedia is getting better by this test.--Jimbo Wales (talk) 08:28, 7 September 2015 (UTC)
This example comes from Smallbones, who used it to kick off a series of analyses addressing this seemingly simple question. Smallbones found much evidence in favor of the "Wales test" for ever-increasing article quality, but much depends on how article quality is measured. Article quality as measured by article category (such as 'Featured Article') is arguably more a measure of completeness than it is a measure of editorial improvements to the text itself, the basic difference being that even complete articles can still be improved with continued editing.
This project utilizes measures of edit quality rather than article quality in the assessment of whether Wikipedia is getting better, and a goal of the project is to contribute these analyses to the ongoing discussion of whether Wikipedia articles can improve indefinitely. If true, these findings would further support the comparison of Wikipedia to an evolutionary system like biology, where reproductive fitness also does not have an upper limit. In fact, even after evolving E. coli for 50,000 generations in a constant lab environment, continued improvement in competitive fitness was still observed and projected to continue (Lenski et al. 2015).
Visualize article revision histories
Any visualization researcher faced with data as big as Wikipedia needs to accept that no visualization of Wikipedia is complete. Even a single article's revision history, which can run to thousands of edits, is impossible to view all at once, making interactivity essential for exploratory analysis. A goal for this project is to combine searching and filtering with intuitive visual representation in order to understand how articles evolve. For example, users will be able to view the evolutionary history of an entire article, or they can focus just on the edits that have been made to an individual section or paragraph. These filters dramatically reduce the number of nodes to show, and facilitate fine-grained analysis of editorial efficiency at the level of individual edits.
Analyze editing strategy effectiveness
The purpose of the data visualization is to depict article revision histories in a form that can be used to analyze the effectiveness of the BRDC strategy for collaborative editing. But of course the actual analyses can be performed offline and at scale for millions of articles. A goal for the project is to use the pipeline to analyze the efficiency of the BRDC across a wide range of articles.
- Prototype rendering article revision histories using D3.js.
- Solicit feedback from the Wikimedia research team on the proposed pipeline architecture.
- Verify the proposed analyses with ORES team.
- Set up continuous integration services for deployment, testing, and documentation.
- Build out the 'wikivision' dashboard to allow for viewing article text alongside the revision trees.
- Add in additional visual attributes, such as coloring nodes by editor and scaling edges by the size of the diff.
- Integrate the app with ORES to allow for visual representations of edit quality predictions.
- Solicit feedback from potential users.
- Perform ORES analyses of article revision histories at scale.
- Analyze the data with respect to the efficiency of the BRDC editing strategy and identify the most effective editing techniques.
- Write up the results for distribution and presentation.
I calculated my salary during this project based on what I would be making as a Research Assistant on a similarly sized project at my University. The Annual Basis Full Time Rate for a Research Assistant position at the University of Wisconsin-Madison is $43,297.00. I'm budgeting a 50% appointment for the 6 month duration of the grant, resulting in a total requested salary of $21,650.00.
For computing resources, I'm requesting a budget of $1,000 during development and testing. I will communicate with the Wikimedia research team to get recommended workflows and cloud service providers and go with their recommendations.
- Salary: $21,650.00
- Computing: $1,000.00
Total requested: $22,650.00
- all project code will be open source, contributors welcome.
- promoted via the OSF
- The project is implemented as an analysis pipeline so that it can keep up with ongoing changes to articles and can be shut down and restarted at any time or as cost allows.
- The analyses rely on ORES, so as ORES improves, so will 'wikivision'.
- I've learned the lessons of doing things by hand or without safety nets the hard way, and have a sustainable workflow for web app and data science development including automated deployments and testing.
- I engage in open science and open source software practices in part because I realize that community support is key to sustainable software and research projects.
Measures of success
- Run scripts for automated deployment and testing of the 'wikivision' app (infrastructure as code).
- View the revision history of the first section of a randomly selected article.
- Compare an article's actual revision history to what would have been expected if objective measures of edit quality had been applied during each iteration of the BRDC.
- Make a graph of the efficiency of the BRDC strategy across all Featured Articles.
This project would really benefit from collaborators and feedback. Please see the talk page for specific questions and comments. If you are an editor interested in talking about the editing process, or you would be interested in testing the various portions of the web app as they are completed, please let me know.
I am a graduate student in the Psychology Department at the University of Wisconsin-Madison. I’m interested in studying Wikipedia as a "model organism" for the study of cultural evolution, and I hope to build research tools that both help the Wikipedia community and also advance the field of cultural evolution. I intend to disseminate and publish the outcomes of this research project, and to include it in my PhD dissertation.
I engage in open science practices, and have made all of my past experiments and data available for download from github and the Open Science Framework. I use open source software in my research.
A relevant project in my portfolio is the Telephone app, which was a web app I built to study language evolution, in particular the emergence of words from sounds.
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. (Other constructive feedback is welcome on the talk page of this proposal).
- Pusle8 (talk) 16:45, 10 April 2016 (UTC) I endorse this project, accessing the view on revision histories and making researchable is a very topical problem for further developing and understanding Wikipedia. From the background it seems that the applicant is quite capable in getting the results and motivated to bring this through. The practical results are something for which I personally might be a consumer. In general development of tools to aid in studying Wikipedia is something that should be high up in the priorities of the community. Thanks to the applicant for proposing this good idea!
- This project would provide data and potentially a model to predict when an article has improved. The data could be use to understand the factors that affect quality judgements. The model could be use to automatically flag articles that have unexpectedly lower their quality. 18.104.22.168 20:48, 11 April 2016 (UTC)
- I think this proposal addresses an important problem in an interesting way that could become very useful to editors. While the focus is on the English Wikipedia, the approach would be interesting for other languages and other Wikimedia projects as well, some of which may even be easier to handle, e.g. Wikidata, Commons, Wikisource. The applicant is laudably familiar with open science workflows but will need input from people more familiar with Wikimedia workflows and communities. Judging from his reactions to feedback received so far, I am confident he will incorporate it constructively into the project. I am thus in favour of the project being supported. -- Daniel Mietchen (talk) 08:32, 12 April 2016 (UTC)
- very good proposal. Loretoff (talk) 08:37, 12 April 2016 (UTC)
- We know a lot about how Wikipedia maintains quality by removing undesired content, but much less about how it develops and retains good quality content. This proposal aims to bring a new perspective to the table and I would like to see what can come out of that. The author is responsive to community feedback, and as D. Mietchen points out familiar with open science workflows, suggesting that this will be a nicely transparent project. Regards, Nettrom (talk) 19:26, 12 April 2016 (UTC)
- Great idea -- very badly needed to have visualizations to help editors and resarchers understand article revision history.Jodi.a.schneider (talk) 21:53, 12 April 2016 (UTC)
- It's interesting, and I believe there is much valuable knowledge in article history. ShiyueZhang (talk) 02:07, 13 April 2016 (UTC)
- Good idea! Chelhel (talk) 11:48, 16 April 2016 (UTC)
- This project is great. The visualization idea is going to be very helpful for both community and wiki editor. Titipata (talk) 17:10, 17 April 2016 (UTC)
- This project would provide data and potentially a model to predict when an article has improved. The data could be used to understand the factors that affect quality judgements. The model could be used to automatically flag articles that have lower their quality unexpectedly. Interesting throughout. Dankodaniel (talk) 15:44, 18 April 2016 (UTC)
Dennett, Daniel C. 1995. “Darwin’s dangerous idea.” The Sciences 35 (3). Wiley Online Library: 34–40.
Ekstrand, M D, and J T Riedl. 2009. “rv you’re dumb: identifying discarded work in Wiki article history.” In Proceedings of the 5th International Symposium …. http://dl.acm.org/citation.cfm?id=1641317.
Halfaker, A, A Kittur, and J Riedl. 2011. “Don’t bite the newbies: how reverts affect the quantity and quality of Wikipedia work.” … Of the 7th International Symposium on Wikis …. http://dl.acm.org/citation.cfm?id=2038585.
Lenski, Richard E, Michael J Wiser, Noah Ribeck, Zachary D Blount, Joshua R Nahum, J Jeffrey Morris, Luis Zaman, et al. 2015. “Sustained fitness gains and variability in fitness trajectories in the long-term evolution experiment with Escherichia coli.” Proceedings. Biological Sciences / The Royal Society 282 (1821): 20152292–9. doi:10.1098/rspb.2015.2292.
Warncke-Wang, M, and V Ranjan. 2015. “Misalignment Between Supply and Demand of Quality Content in Peer Production Communities.” ICWSM 2015: Ninth …. http://www-users.cs.umn.edu/~morten/publications/icwsm2015-popularity-quality-misalignment.pdf.
Zhang, X, and F Zhu. 2006. “Intrinsic motivation of open content contributors: The case of Wikipedia.” Workshop on Information Systems and …. http://ebusiness.mit.edu/wise2006/papers/3A-1_wise2006.pdf.