Research:Study of performance perception on Wikimedia projects
The metrics we currently use to measure page load performance rest on assumptions about what users prefer. Even the assumption that faster is always better is taken for granted throughout the metrics used in this field, although academic research suggests that speed might not be the universal main criterion for assessing quality of experience (performance stability might be preferred) and that the criterion is likely to depend on the subject and the context.
We usually deal with two classes of metrics. The first is real user metrics (RUM), which we collect passively from users by leveraging the performance information that we can capture client-side. This data is usually very low-level, highly granular, and quite disconnected from the experience as the user perceives it. The second is synthetic metrics, where automated tools simulate the user experience and measure it. These get closer to the user experience by allowing us to measure visual characteristics of a page load. But both classes are far from capturing what the page load feels like to users, because neither involves any human input at measurement time. Even the modeling behind these metrics is often just the best guess of engineers. User sentiment was not part of the metrics' design, and only recently have studies looked at the correlation between those metrics and user sentiment.
In this study, we would like to bridge the gap between user sentiment about page load performance and the passive RUM performance metrics that are easy to collect unobtrusively.
In order to achieve this, we will run an on-wiki survey asking readers and editors about their perception of the page load's performance. We will then compare that data to the low-level RUM performance metrics.
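The comparison described above could be sketched as follows. This is a minimal illustration rather than the study's actual analysis: the join key (`page_view_id`), the yes/no answer encoding, and the sample values are all assumptions made up for the example, and `loadEventEnd` is used only as a representative low-level timing metric. The sketch pairs each survey answer with the metric recorded for the same page view, then computes a Pearson correlation against the binary answer (which is equivalent to a point-biserial correlation).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def join_by_page_view(survey, rum, metric):
    """Pair each survey answer with the metric from the same page view.

    Page views that lack either a survey answer or the metric are skipped.
    """
    pairs = []
    for page_view_id, perceived_fast in survey.items():
        record = rum.get(page_view_id)
        if record is not None and metric in record:
            pairs.append((record[metric], 1 if perceived_fast else 0))
    return pairs


# Hypothetical data: survey answers (True = "the page loaded fast enough")
# and RUM records in milliseconds, both keyed by an assumed page view ID.
survey = {"a": True, "b": True, "c": False, "d": False, "e": True}
rum = {
    "a": {"loadEventEnd": 900},
    "b": {"loadEventEnd": 1200},
    "c": {"loadEventEnd": 4800},
    "d": {"loadEventEnd": 3900},
    "f": {"loadEventEnd": 700},  # no survey answer, so it is ignored
}

pairs = join_by_page_view(survey, rum, "loadEventEnd")
r = pearson([metric for metric, _ in pairs], [answer for _, answer in pairs])
print(f"{len(pairs)} matched page views, r = {r:.3f}")
```

A negative coefficient here would mean that longer load times go with fewer positive answers. On real data, a rank-based measure may be a better fit, since timing metrics tend to be heavily skewed.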
Survey and data
The micro survey, based on QuickSurveys, is currently running on a small subset of article page views on the Catalan, French, and Russian Wikipedias, as well as on English Wikivoyage. The performance metrics are those already collected by NavigationTiming as part of normal Wikimedia performance RUM metric collection.
Main research questions
- How well correlated are the current RUM metrics we collect to users' perception of performance?
  - Attempt to build models combining different metrics, using machine learning if appropriate.
  - Review the best performing models and attempt to extract the underlying logic that makes them closer to human sentiment.
- Is performance perception different between wikis?
- Is performance perception different between article types (e.g., image-heavy)?
- Can we design new RUM metrics that outperform existing ones on user perception correlation?
- What new performance browser APIs could we propose to potentially improve the correlation?
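One way to picture "models combining different metrics" is a small logistic regression that maps several RUM metrics to the probability of a positive survey answer. The sketch below is only an illustration under stated assumptions: the choice of logistic regression, the two example metrics (`firstPaint` and `loadEventEnd`, in milliseconds), and the data are all hypothetical, and the study does not prescribe this model. It uses plain batch gradient descent so that it runs without third-party libraries.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        stds.append(std if std > 0 else 1.0)  # avoid dividing by zero
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, label in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = sigmoid(z) - label
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical page loads: [firstPaint, loadEventEnd] in milliseconds,
# labelled 1 when the reader answered that the page loaded fast enough.
rows = [
    [300, 900], [450, 1200], [500, 1500], [700, 2000],
    [1500, 3900], [2200, 4800], [1800, 5200], [2600, 6100],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

scaled, means, stds = standardize(rows)
weights, bias = fit_logistic(scaled, labels)


def predict(row):
    """Probability of a positive answer for a raw (unscaled) page load."""
    z = bias + sum(w * (v - m) / s for w, v, m, s in zip(weights, row, means, stds))
    return sigmoid(z)


accuracy = sum((predict(row) >= 0.5) == bool(label)
               for row, label in zip(rows, labels)) / len(rows)
fast = predict([400, 1000])
slow = predict([2400, 5500])
print(f"training accuracy = {accuracy:.2f}, fast = {fast:.3f}, slow = {slow:.3f}")
```

A linear model like this has the advantage that its weights can be read directly, which fits the aim of extracting the logic behind the best performing models. Training accuracy on a handful of made-up rows says nothing about real predictive power, which would need held-out survey data to assess.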
By the end of the project, we aim to:
- gain a deeper understanding of the perception of page load performance;
- develop a predictive model that can output the performance perception of any given page load;
- design new real-user metrics collected client-side that better approximate user perception;
- propose new browser APIs to measure performance metrics that could have a better correlation to user perception, based on our findings about existing metrics.
- A large-scale study of Wikipedia's quality of experience (Dario Rossi, 2019), the paper published as a result of this and subsequent research.
- T187299: The main task for this study (detailing the instrumentation and analysis).