Research:Wiki Ed student editor apparent quality contribution

From Meta, a Wikimedia project coordination wiki
Duration:  2016-05 – 2016-07
This page documents a completed research project.


We use ORES to evaluate the apparent quality of Wiki Ed students' pages before they start and once they finish. The beginning and ending quality form a quality transition, and we examine how students compare to general editors. We find that Wiki Ed students contribute 2% of the low-to-high quality leaps (moving from Empty, Stub, or Start to B, GA, or FA) that occurred during the spring semester.


  1. Where are the articles our students choose to work on typically located on the quality scale?
  2. Where do those articles end up once our students have finished with them?


Quality rating

The quality rating used here differs from the standard Wikipedia assessment scale in two minor ways. First, the A class rating is dropped and merged into GA. This is based on Aaron Halfaker's work with ORES indicating that the two classes are nearly indistinguishable from each other. Second, we create an additional Empty class as a new lowest quality rating for articles that have not yet been created. This gives us the final rating scale, in order of descending quality:

  1. FA
  2. GA
  3. B
  4. C
  5. Start
  6. Stub
  7. Empty
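
As a minimal sketch of how this scale could be represented for the analysis (the names `SCALE`, `RANK`, `normalize`, and `compare` are illustrative assumptions, not taken from the project's code), one might fold A into GA and assign each class a rank:

```python
# Ordered from lowest to highest quality. "Empty" is the added class
# for pages that have not been created yet.
SCALE = ["Empty", "Stub", "Start", "C", "B", "GA", "FA"]
RANK = {label: i for i, label in enumerate(SCALE)}


def normalize(label):
    """Fold the A class into GA; leave every other label unchanged."""
    return "GA" if label == "A" else label


def compare(before, after):
    """Positive if quality rose, negative if it fell, zero if unchanged."""
    return RANK[normalize(after)] - RANK[normalize(before)]
```

With this ordering, a Stub to GA transition is a jump of four classes, and any A rating is treated exactly like GA.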


ORES is the Objective Revision Evaluation Service put out by Wikimedia. It uses a machine learning algorithm to approximate quality ratings for any revision. The version used for this analysis is available at{0}


First we gather student revisions from the spring semester. This is done by identifying all pages that students worked on, then finding the first and last student revisions of each page that were not deleted. The last student revision serves as the right endpoint of our interval, while the parent revision of the first student revision serves as our left endpoint. Additionally, we record the number of student edits made between these two revisions as well as the total number of edits made. We then filter out any pages where fewer than 50% of those edits were student edits.
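
The interval selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual code: the `Rev` record, its field names, and the choice to count edits from the first to the last student revision inclusive are all hypothetical.

```python
from typing import NamedTuple


class Rev(NamedTuple):
    # Hypothetical record for one revision of a page.
    rev_id: int
    parent_id: int        # 0 when the revision created the page
    by_student: bool
    deleted: bool = False


def student_interval(revs, min_share=0.5):
    """Return endpoints and edit counts for one page, or None if filtered out.

    `revs` must hold a single page's revisions in chronological order.
    """
    live = [r for r in revs if not r.deleted]
    student_idx = [i for i, r in enumerate(live) if r.by_student]
    if not student_idx:
        return None
    first, last = student_idx[0], student_idx[-1]
    window = live[first:last + 1]
    student_edits = sum(r.by_student for r in window)
    # Drop pages where students made fewer than half of the edits.
    if student_edits / len(window) < min_share:
        return None
    return {
        "left": live[first].parent_id,   # state before students started
        "right": live[last].rev_id,      # state after students finished
        "student_edits": student_edits,
        "total_edits": len(window),
    }
```

For a page whose first student revision also created the page, `left` is 0, which matches the Empty case discussed later in this section.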

To gather a comparable data set for the general population we take a sample of pages that had been edited sometime during the spring semester. First we identify all pages edited, then we select for pages in the main namespace such that page_id % 10 == 0 . This sampling was necessary because queries selecting all pages in the main namespace were timing out before completing. We selected the revision just before the start of the semester (January 1) and just before the end of the semester (June 1) as our left and right endpoints.
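
The one-in-ten selection rule is simple enough to show directly. In this sketch the function name `in_sample` and the toy `pages` list are made up for illustration; the only facts relied on are the `page_id % 10 == 0` rule above and that MediaWiki numbers the main namespace 0.

```python
MAIN_NAMESPACE = 0  # MediaWiki's article (main) namespace


def in_sample(page_id, namespace, modulus=10):
    """Keep main-namespace pages whose id is divisible by `modulus`."""
    return namespace == MAIN_NAMESPACE and page_id % modulus == 0


# Toy (page_id, namespace) pairs, invented for the example.
pages = [(10, 0), (11, 0), (20, 1), (30, 0)]
sampled = [page_id for page_id, ns in pages if in_sample(page_id, ns)]
```

Because the rule depends only on the page id, the same pages are selected every time the query is run.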

This sampling of page edits provided us with a set of 353,269 pages. Since we kept one page in ten, we estimate that 3,532,690 mainspace pages were edited during the course of the semester. It would not be feasible to rate all 353,269 * 2 = 706,538 revisions in the sample, so we sample down to a set of 60,000 pages, giving us a total of 120,000 revisions.
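
The arithmetic behind these figures can be checked in a few lines. The write-up does not say how the 60,000 pages were drawn, so the seeded uniform random sample below is an assumption, and `downsample` is a hypothetical helper.

```python
import random

sampled_pages = 353_269
estimated_edited_pages = sampled_pages * 10   # one page in ten was kept
revisions_if_all_rated = sampled_pages * 2    # two endpoints per page


def downsample(page_ids, k, seed=0):
    """Draw k distinct page ids uniformly at random (assumed method)."""
    rng = random.Random(seed)
    return rng.sample(page_ids, k)


# Stand-in ids; the real ids would come from the sampling query.
subset = downsample(list(range(sampled_pages)), 60_000)
revisions_to_rate = len(subset) * 2
```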

We then began rating both student and general editor revisions. The revisions were rated by ORES in batches of 50 to approximate their quality. Revisions that were not found were given the quality rating of Empty; this was commonly the case with revision id 0, which is the parent of the first revision of a page. If a revision was found to be deleted during rating, which could occur at the left endpoint, it was recorded as deleted so that we could determine whether deletion would be a significant issue for the analysis. Rating was completed in about 3 hours.
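
A rough sketch of the batching logic follows. It deliberately does not call ORES: `score_batch` is a hypothetical callable standing in for whatever client was used, assumed to take a list of revision ids and return a dict of labels for the ids it found. Tracking of deleted revisions is left out for brevity.

```python
def batches(items, size=50):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rate_revisions(rev_ids, score_batch, size=50):
    """Rate revisions batch by batch; ids the scorer omits become "Empty"."""
    ratings = {}
    for batch in batches(rev_ids, size):
        found = score_batch(batch)  # hypothetical: {rev_id: label}
        for rev_id in batch:
            ratings[rev_id] = found.get(rev_id, "Empty")
    return ratings
```

Passing the scorer in as an argument keeps the batching and fallback behaviour easy to check with a stub before pointing it at a real service.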


General Editors

There are a couple of strong features that stick out with general editors. First, there is a very strong diagonal. This indicates that pages tend to stay at the quality they already have. This is not especially surprising: we only require an article to be edited a single time to be considered, so we would expect most articles to stay at about the same position they were in before.

Additionally we see a trend up and to the left which corresponds to greater activity in lower quality areas. Again, this makes sense since we are considering matters at the page level and there are many more stubs than featured articles.

Finally we see much more activity in the upper right than the lower left. This indicates a general trend towards increasing quality. On top of this we see that most of the quality increase happens at lower levels.
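
The features described here (the diagonal, and the balance between improvement and decline) can be read off a table of transition counts. The sketch below shows one way to build such a table; the function name `summarize` and any input passed to it are illustrative, not the project's data or code.

```python
from collections import Counter

SCALE = ["Empty", "Stub", "Start", "C", "B", "GA", "FA"]
RANK = {label: i for i, label in enumerate(SCALE)}


def summarize(transitions):
    """Count (before, after) pairs and split them by direction of change."""
    counts = Counter(transitions)
    totals = {"same": 0, "improved": 0, "declined": 0}
    for (before, after), n in counts.items():
        if RANK[after] > RANK[before]:
            totals["improved"] += n
        elif RANK[after] < RANK[before]:
            totals["declined"] += n
        else:
            totals["same"] += n   # the diagonal of the transition matrix
    return counts, totals
```

Here `counts[(before, after)]` is one cell of the transition matrix, and `totals["same"]` is the sum along its diagonal.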

Student Editors

We see several shared features between students and general editors. They have a similar strong diagonal, but it differs in that there does not appear to be a clear gradient along it, although there does seem to be one above it. This difference is currently unexplained.

We see the same trend towards quality improvement, although it appears much stronger with the upper right being significantly darker than the lower left. This makes intuitive sense since our students tend to focus on a specific page, rather than going around making many small edits to several pages.

Student Proportion

Using these two data sets we can approximate the portion of student edits going on in specific quality areas. We estimated that 3,532,690 articles received edits during the spring semester. Of these articles we examined 60,000, so when looking at proportions we need to scale up our general editor quality transition counts by about 59 (3,532,690 / 60,000). After making this transformation to the general editor data set, we divide the number of student pages making a given transition by the scaled-up number of general editor pages making the same transition.
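
As a worked version of this calculation (a sketch only; `student_share` and its handling of empty cells are assumptions, and no counts from the study are reproduced here):

```python
# Ratio of estimated edited pages to rated pages: about 58.9,
# which the text rounds to 59.
SCALE_FACTOR = 3_532_690 / 60_000


def student_share(student_count, general_sample_count, factor=SCALE_FACTOR):
    """Student pages divided by the scaled-up general editor pages.

    Returns None when the general sample has no pages for the transition,
    since the ratio is undefined in that case (an assumed convention).
    """
    if general_sample_count == 0:
        return None
    return student_count / (general_sample_count * factor)
```

For example, if a transition had 59 student pages and 100 general editor pages in the sample, a factor of 59 would give 59 / 5,900 = 1%.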

Here we can see a clear strong area for Wiki Ed, making transitions from exceptionally low quality articles to high ones. To be clear, transitions from low to high are not the most common thing we do, but they are one of the weird things we do. Articles are not very likely to make leaps and bounds in quality, but they are a lot more likely to do it while we are working on them. This is what allows us to command such a high percentage of these transitions. Our best transition (Stub to GA) sits at 4.7% while a more general low to high figure (Empty, Stub, Start to B, GA, FA) sits at 2%.


There are at least two noteworthy limitations to this work. First, ORES is a good system, but it is not a perfect system. It cannot take into consideration the complexity of a specific topic or the quality of sources in the same way that a human reviewer can. It provides reasonable approximations of revision quality in the general case, but may be wrong when certain topics have a unique standard for quality. Additionally, ORES has not been trained specifically against student work, which may affect the quality of the predictions.

Second, only 2/3 of student pages are pure student edits, while those that are not pure also tend to be those that received a greater number of student edits. In order to approach this work it was necessary to include some contributions that were not made by Wiki Ed students in the evaluation of quality.