The Wikimedia Foundation’s web team introduced new instrumentation in 2017 to measure the time Wikipedia readers spend on a particular page (https://meta.wikimedia.org/wiki/Schema:ReadingDepth). It works by sending an event on page unload, with timer values that also account for how long the browser tab was actually visible while the page was open.
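As a rough illustration of the accounting described above, the following Python sketch derives total and visible time from a sequence of visibility changes. It is not the actual instrumentation code: the function name, the event format, and the output keys `total_ms` and `visible_ms` are hypothetical and do not claim to match the ReadingDepth schema fields.

```python
def summarize_page_view(opened_at_ms, unloaded_at_ms, visibility_changes):
    """Compute total and visible time for one page view.

    visibility_changes is a time-ordered list of (timestamp_ms, is_visible)
    pairs. The page is assumed (hypothetically) to be visible when opened.
    """
    visible_ms = 0
    visible = True
    last_change = opened_at_ms
    for timestamp_ms, is_visible in visibility_changes:
        if visible:
            # Close the visible interval that ends at this change.
            visible_ms += timestamp_ms - last_change
        last_change = timestamp_ms
        visible = is_visible
    if visible:
        # The tab was still visible when the page was unloaded.
        visible_ms += unloaded_at_ms - last_change
    return {
        "total_ms": unloaded_at_ms - opened_at_ms,
        "visible_ms": visible_ms,
    }


if __name__ == "__main__":
    # Tab hidden from 2 s to 7 s during a 10 s page view.
    print(summarize_page_view(0, 10_000, [(2_000, False), (7_000, True)]))
```

The point of keeping both values is that a tab left open in the background inflates total time but not visible time, so the two can be compared when vetting the metric.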
Initially, we wanted this as an additional metric that would help us understand user behavior after launching features (such as this year’s launch of Page Previews). However, we realized that this data could also give us fascinating insights into reading behavior in general.
So far, we have completed some exploration and vetting of the new metric in 2017 with an intern (working with Tilman), and some checks still need to be done (https://phabricator.wikimedia.org/T160492). The metric is also going to be used by the web team in an upcoming A/B test for a design change on the mobile website (Page Issues). That said, we have not yet had the opportunity to explore what this new data can reveal about general reading patterns:
Research and analysis questions and ideas
- What are the general patterns of time spent? Calculate averages and various percentiles, characterize the form of the distribution / histogram (cf. https://www.nngroup.com/articles/how-long-do-users-stay-on-web-pages/ )
- What distribution best models dwell times: Gaussian? Weibull? This might be different for different types of articles.
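The two questions above could be approached along the following lines. This is a minimal sketch using only the Python standard library, assuming dwell times are available as a list of positive numbers in seconds. All function names are hypothetical, the bisection bounds for the Weibull shape parameter are an assumption, and the demo data is synthetic rather than real reading-time data.

```python
import math
import random
import statistics


def percentile_summary(seconds, percentiles=(50, 75, 90, 99)):
    """Return selected percentiles of a list of dwell times."""
    cuts = statistics.quantiles(seconds, n=100)  # cuts[p - 1] is the p-th percentile
    return {p: cuts[p - 1] for p in percentiles}


def fit_weibull(seconds, lo=0.01, hi=10.0, iterations=80):
    """Maximum-likelihood Weibull fit; returns (shape, scale).

    Requires strictly positive values and assumes the shape lies in [lo, hi].
    """
    logs = [math.log(x) for x in seconds]
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)

    def score(k):
        # Likelihood equation for the shape: increasing in k, zero at the MLE.
        # Weights are rescaled by the largest value to avoid overflow.
        weights = [math.exp(k * (lx - max_log)) for lx in logs]
        weighted = sum(w * lx for w, lx in zip(weights, logs)) / sum(weights)
        return weighted - 1.0 / k - mean_log

    for _ in range(iterations):  # plain bisection on the shape parameter
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    scale = (sum(x ** shape for x in seconds) / len(seconds)) ** (1.0 / shape)
    return shape, scale


def weibull_log_likelihood(seconds, shape, scale):
    """Log-likelihood of the data under a Weibull(shape, scale) model."""
    return sum(
        math.log(shape / scale)
        + (shape - 1.0) * math.log(x / scale)
        - (x / scale) ** shape
        for x in seconds
    )


def normal_log_likelihood(seconds):
    """Log-likelihood under a Gaussian fitted by maximum likelihood."""
    variance = statistics.pvariance(seconds)
    return -0.5 * len(seconds) * (math.log(2.0 * math.pi * variance) + 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for real dwell times (scale 30 s, shape 0.7).
    sample = [x for x in (rng.weibullvariate(30.0, 0.7) for _ in range(5000)) if x > 0]
    shape, scale = fit_weibull(sample)
    print(percentile_summary(sample))
    print("Weibull shape/scale:", shape, scale)
    print("log-likelihood Weibull:", weibull_log_likelihood(sample, shape, scale))
    print("log-likelihood Gaussian:", normal_log_likelihood(sample))
```

Comparing the two log-likelihoods gives a first indication of which model describes the data better; both models have two parameters here, so the comparison needs no complexity penalty. A fitted shape below 1 would mean that the chance of leaving the page decreases the longer a reader has already stayed.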
Explore how the metric differs:
- Are there projects and languages where users tend to read longer? How does the metric differ by project?
- How does reading time differ within a browser session? Does the last pageview tend to be the longest?
- Does reading time differ over time? Are there patterns of weekly seasonality? Does reading time increase on holidays?
- Do longer pages have longer reading time? How does the metric differ by length of the page viewed (in bytes/characters/number of sections)? How does actual reading time compare to the estimated time needed to consume the page’s entire text (cf. https://help.medium.com/hc/en-us/articles/214991667-Read-time )?
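For the last question, one simple way to compare actual and estimated reading time is to divide the measured visible time by a word-count-based estimate. The sketch below assumes plain article text and a fixed reading speed; the function names are hypothetical, and the default of 265 words per minute is an assumed placeholder that would need to be chosen per language and audience.

```python
def estimated_read_seconds(text, words_per_minute=265.0):
    """Estimate the time needed to read the whole text at a fixed speed."""
    word_count = len(text.split())  # whitespace tokenization; a simplification
    return word_count / words_per_minute * 60.0


def reading_completion_ratio(visible_seconds, text, words_per_minute=265.0):
    """Ratio of measured visible time to estimated full-read time.

    Returns None for empty text, where no estimate is possible.
    """
    estimate = estimated_read_seconds(text, words_per_minute)
    if estimate == 0:
        return None
    return visible_seconds / estimate


if __name__ == "__main__":
    article_text = "word " * 530  # stand-in for real article text
    print(estimated_read_seconds(article_text))
    print(reading_completion_ratio(60.0, article_text))
```

A ratio well below 1 would suggest that a reader looked up only part of the page, while values near or above 1 are compatible with reading it in full. Whitespace tokenization does not work for languages written without spaces, so a character-based estimate would be needed there.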
Requiring more effort in data preparation:
- Are there subject areas more prone to quick-fact lookup and others more conducive to learning entire articles? (could potentially use the Scoring team’s draft topic library as in https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Citation_Usage/First_Round_of_Analysis#Dimensions_of_Analysis )
- How does the metric differ by country/geolocation? Are there countries where people tend to read longer? Do people in cities read less than people in rural areas?
More complicated in terms of data vetting:
- Desktop vs. mobile web. We take it for granted that people read longer on desktop. Is this true for all projects? This is particularly interesting given that mobile is now the dominant form of internet consumption and, for many users without access to desktop devices, the only one.
Do a literature review of existing research on web-based time spent / dwell time metrics and compare it with our data. Publications worth looking at might include:
- The Nielsen page linked above (about the Weibull distribution) and the Microsoft paper it refers to. The Microsoft paper fits Weibull distributions to dwell-time data and predicts the distribution parameters from web page properties (though their approach does not seem fully satisfactory).
- “Beyond clicks: dwell time for personalization”, RecSys '14: Proceedings of the 8th ACM Conference on Recommender Systems, pages 113-120, Foster City, Silicon Valley, California, USA, October 6-10, 2014, doi:10.1145/2645710.2645724 [by Yahoo researchers]
- Compare with data from other websites (e.g. https://en.wikipedia.org/wiki/Medium_(website)#Background )
- Compare with data we have from the Android Wikipedia app (“time_spent” field in https://meta.wikimedia.org/wiki/Schema:MobileWikiAppPageScroll )
Explore whether there are any results of the “Why we read Wikipedia” reader surveys that can be connected to this new data
Contribute to the development of a workflow or an automated tool that regularly informs the Readers team and the Wikimedia movement about how this metric is developing. This might simply be a dashboard (built with support from our analytics engineers and Tilman) or take another form.
- Research:Which parts of an article do readers read