I did some research on logged data about the nature of page revision comparisons. We logged how many revisions exist of the page concerned and the positions of both compared revisions. Based on that, we can see how far back people go and how many revisions a comparison spans. We used this information for the design of the RevisionSlider widget.
- Only 10% of comparisons span intermediate revisions; in 90% of cases a revision is compared directly with its preceding revision.
- About half of all comparisons do not involve the most current revision.
- The most frequent kinds of comparison can be done well via the RevisionSlider interface.
- There is a difference in use between users with the RevisionSlider enabled and disabled, but the reasons for it need more investigation.
Data description and processing
- Timeframe: 2016-07-27 – 2016-07-28
- Wiki: dewiki
- See https://github.com/addshore/dewiki_diffstats for the script which generated the data.
- The data was analyzed using R and visualized using Vega-Lite and ggplot. The data is descriptive; there was no experimental manipulation.
- The following data was obtained for each comparison of revisions:
- Intermediate revisions: the number of revisions between the two compared versions.
- Newer revisions: the number of revisions between the newer of the two compared revisions and the most current revision of the article.
- A true/false variable stating whether the RevisionSlider was enabled.
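As an illustration only (the field names below are mine, not the actual log schema), each logged comparison could be modeled like this:

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    """One logged revision comparison (illustrative schema)."""
    intermediate: int     # revisions between the two compared versions
    newer: int            # revisions between the newer compared revision
                          # and the article's most current revision
    slider_enabled: bool  # was the RevisionSlider enabled?

# A direct diff of the most current revision with its predecessor:
latest_diff = Comparison(intermediate=0, newer=0, slider_enabled=True)
```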
RESEARCH QUESTION: How many intermediate revisions do comparisons span?
Here is a histogram of how many data points lie in which range
OK, the values seem to be rather extreme: they span a wide range, but almost all are low. That makes visualization hard.
Let’s try a non-visual overview in quantiles:
R > quantile(diffData$intermediate,seq(0,1,by=0.1))
What this does is sort all values, as if on a long line ranging from the lowest value at 0% to the highest at 100%, and then show the values at certain positions on this line. You can also think of it as “n% of the values are smaller than or equal to m”; e.g. for the table below you could say “97% of all values are smaller than or equal to 4”.
OK, so that seems rather extreme. In fact, if we use percentiles (meaning we split our sorted data into 100 sections) and only look at the highest 10% (from 90% to 100%), we get:
So, 10% of the comparisons actually compare non-adjacent versions
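For readers more familiar with Python than R, the quantile computation above can be sketched as follows (`np.linspace` replaces R's `seq`; the data here is synthetic and skewed toward 0 like the real measurements, since the dewiki dataset itself is not part of this page):

```python
import numpy as np

# Synthetic stand-in for diffData$intermediate: mostly 0, long tail.
rng = np.random.default_rng(0)
intermediate = rng.geometric(p=0.9, size=10_000) - 1

# Deciles, equivalent to R's quantile(x, seq(0, 1, by=0.1))
deciles = np.quantile(intermediate, np.linspace(0, 1, 11))

# Zoom into the top 10%: percentiles from 90% to 100%
top_percentiles = np.quantile(intermediate, np.linspace(0.90, 1.00, 11))
print(deciles, top_percentiles)
```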
Can this range be accessed via the revision slider?
Yes. The range between the 90th and 99th percentile poses no problem for the interface, and the range slightly above 99% is also still OK. A bit more than 1% of the comparisons won’t be easy to do with the RevisionSlider.
RESEARCH QUESTION: How far back do comparisons go in time?
That looks like a familiar pattern – the values span a really wide range, but almost all are rather low.
This is not as extreme as the distribution of the intermediate values. Less than 50% of all comparisons involve the most current revision.
Can the range be accessed via the revision slider?
Yes. The range from the 1st to the 80th percentile is no problem for the RevisionSlider interface (80% of the measurements go back 8 or fewer revisions). The range above the 90th percentile is difficult (but those comparisons are not very easy to do in the conventional interface with the radio buttons either).
RESEARCH QUESTION: Is there a correlation between going back and comparing over intermediate versions?
Subsetting the data to remove all comparisons with both 0 newer revisions and 0 intermediate revisions, and creating a scatter plot with log-log scales (all values shifted up by +1, since log scales cannot represent 0):
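The subsetting and the +1 shift can be sketched like this (illustrative Python with made-up values; the real analysis was done in R):

```python
import numpy as np

# Hypothetical per-comparison measurements (see data description above).
intermediate = np.array([0, 0, 3, 1, 0, 12, 0, 5])
newer        = np.array([0, 2, 0, 4, 0,  7, 1, 0])

# Drop rows where BOTH values are 0, i.e. direct diffs of the
# most current revision with its predecessor.
keep = ~((intermediate == 0) & (newer == 0))

# Shift by +1 before taking logs, since log(0) is undefined.
log_intermediate = np.log10(intermediate[keep] + 1)
log_newer        = np.log10(newer[keep] + 1)
```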
It does not look very correlated. A brief test with Kendall’s rank correlation:
τ = -0.1822176 (1 is a perfect positive correlation, 0 is no correlation at all, -1 a perfect negative correlation)
Nope, no substantial correlation.
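For reference, Kendall’s τ counts concordant vs. discordant pairs. A naive O(n²) sketch of the tie-free variant (τ-a) looks like this; note that R’s cor.test(..., method = "kendall") uses the tie-corrected τ-b, so results differ when ties are present:

```python
import itertools

def kendall_tau_a(x, y):
    """Naive Kendall tau-a: (concordant - discordant) / (n choose 2)."""
    concordant = discordant = 0
    for i, j in itertools.combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # perfectly inverse ranking -> -1.0
```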
Are there differences between comparisons with and without the revision slider?
Significance of the difference between the RevisionSlider-enabled and RevisionSlider-disabled samples for the intermediate-revisions variable, via a Mann-Whitney U test:
p-value = 2.353e-16
However, parts of the sample are interdependent, which violates the independence-of-observations assumption of the Mann-Whitney U test. Meeting this requirement would need a more elaborate strategy (if there are any suggestions you have actually applied in another test already, please tell me).
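For intuition, the U statistic underlying this test (R’s wilcox.test) counts, over all cross-sample pairs, how often a value from one sample exceeds a value from the other, with ties counting half. A minimal sketch, without the p-value machinery:

```python
def mann_whitney_u(x, y):
    """U statistic of sample x vs. y; ties contribute 0.5 per pair."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# If every value in x is below every value in y, U is 0;
# if every value is above, U is len(x) * len(y).
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0
print(mann_whitney_u([4, 5, 6], [1, 2, 3]))  # 9.0
```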
Quantiles for the newer-revisions measurements
With Revision slider
Without revision slider
Instead of comparing quantiles, we can also try to quantify how much we can trust that there is a difference (or rather, test against the null hypothesis of no difference):
So we test the significance of the difference between the RevisionSlider-enabled and RevisionSlider-disabled samples of the newer-revisions variable via a Mann-Whitney U test:
p-value = 2.274e-10
Here, too: careful, the test assumption is violated because of interdependent observations...
Why do only 10% of comparisons span intermediate revisions, while about 50% of comparisons go back in time?
The “old”/standard watchlist has a link "difference". It leads to a comparison of the respective revision with its direct predecessor.
The collapsible/extended watchlist has two links, "current" and "previous" ↓
When you click on “current”, it compares the respective revision with the most current one. When you click "previous", it compares the respective revision with its direct predecessor, like the standard watchlist’s "difference" link.
So the hypothesis would be that the links "previous" and "difference" are used often. They go back in time (higher newerrev) but don’t span intermediate revisions (no intermediate), so their use would result in the observed pattern.
…But all this is just hypothetical and could be tested with click path analysis.
- This document was generated using knitr. It took the round trip from knitr to Google Docs (internal sharing, commenting) and from there to .docx to wikitext (since converting .docx via pandoc preserves the tables) before being shared here.
- I learned:
- a literate programming framework like knitr is really useful
- conversations about your report help you write it in a way that is easier to understand
- Percentiles are not very intuitive for others.
- Keep track of which data came from where.
- What I would like to do next time:
- Try to use dplyr and ggplot2, since they make for very readable code.