Research:Module storage performance
In this study, we assess the impact of caching ResourceLoader modules in localStorage on page load time.
We explore experimentally how module storage affects performance in practice on Wikimedia wikis.
To do so, we ran a controlled experiment in which we randomly sampled 0.1% of browsers and randomly split them between "control" and "test" conditions.
- Control: module storage was disabled and users' browsers were expected to perform all caching
- Test: module storage was enabled
TODO: Ori discusses bucketing algorithm in detail
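The actual bucketing algorithm is not described here. Purely as an illustrative sketch of the sampling step, the following shows one way to draw a 0.1% sample and split it evenly between the two conditions. The function name `assign_condition` and the 50/50 split are assumptions made for this example, not the deployed implementation.

```python
import random

SAMPLE_RATE = 0.001  # 0.1% of browsers, as stated in the text


def assign_condition(rng):
    """Return "control", "test", or None (browser not sampled).

    Hypothetical helper: the even split between conditions is an
    assumption, not the algorithm used in the experiment.
    """
    # Most browsers fall outside the 0.1% sample.
    if rng.random() >= SAMPLE_RATE:
        return None
    # Sampled browsers are split at random between the two conditions.
    return "test" if rng.random() < 0.5 else "control"


if __name__ == "__main__":
    rng = random.Random(2013)
    counts = {"control": 0, "test": 0, None: 0}
    for _ in range(1_000_000):
        counts[assign_condition(rng)] += 1
    print(counts)
```

In a real deployment the assignment would also need to be stable across page views for the same browser, for example by storing the result client-side; that detail is left out of this sketch.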
Between 07:42 UTC on Nov. 20th, 2013 and 23:17 on Nov. 23rd, 2013, we gathered 1.49 million load timings for 381,860 unique readers using Schema:ModuleStorage.
To compare the performance of module storage against browser-based caches, we sought to measure both pre-cache performance (the first time that a reader loads the site) and post-cache performance in both experimental conditions. We assume that the first recorded page load during the experiment reflects pre-cache performance and that, from the second load onward, we are observing post-cache performance. However, we did not want to stop at the second page load: there are reasons to believe that performance might continue to improve after it, so we indexed and compared all subsequent page loads as well.
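The indexing of page loads can be sketched as follows. This is a minimal illustration, assuming each logged event is a hypothetical `(reader_id, timestamp, load_ms)` tuple; the actual Schema:ModuleStorage fields may differ.

```python
from collections import defaultdict


def index_page_loads(events):
    """Assign a per-reader page load index to each event.

    `events` is an iterable of (reader_id, timestamp, load_ms) tuples
    (a hypothetical layout). Index 0 is the reader's first recorded
    load (assumed pre-cache); indexes 1 and up are assumed post-cache.
    Returns a list of (reader_id, index, load_ms) tuples.
    """
    by_reader = defaultdict(list)
    for reader_id, timestamp, load_ms in events:
        by_reader[reader_id].append((timestamp, load_ms))

    indexed = []
    for reader_id, loads in by_reader.items():
        # Order each reader's loads chronologically before numbering them.
        loads.sort(key=lambda load: load[0])
        for index, (_, load_ms) in enumerate(loads):
            indexed.append((reader_id, index, load_ms))
    return indexed
```

Grouping the result by `index` then allows the first load to be compared against each later load separately in both conditions.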
Load time statistic
To measure differences in load time, we needed a statistic that gives a stable summary of the distribution of load timings. To find an appropriate statistic, we plotted the density of load timings split by the type of load. With the x axis log-scaled, the load time density figure shows two clear, overlapping log-normal distributions: one for the first, pre-cache page load (index=0) and one for the post-cache page loads (index=1-9).
This log-normal distribution of load timings suggests that a geometric mean would provide a solid, stable description of the distribution. Because the geometric mean is the exponential of the arithmetic mean of the logged values, it is less sensitive to the long right tail than the arithmetic mean.
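As a minimal sketch of the statistic, the geometric mean can be computed by averaging the logarithms of the load timings and exponentiating the result. The function name and the millisecond unit are assumptions made for this example.

```python
import math


def geometric_mean(load_times_ms):
    """Geometric mean of strictly positive load timings.

    Computed as exp(mean(log(x))), which avoids overflow from
    multiplying many large values together.
    """
    if not load_times_ms:
        raise ValueError("at least one load timing is required")
    if any(t <= 0 for t in load_times_ms):
        raise ValueError("load timings must be positive")
    log_sum = sum(math.log(t) for t in load_times_ms)
    return math.exp(log_sum / len(load_times_ms))


if __name__ == "__main__":
    timings = [100, 1000, 10000]
    # The arithmetic mean is 3700; the geometric mean is about 1000.
    print(geometric_mean(timings))
```

Python 3.8 and later also provide `statistics.geometric_mean`, which could be used instead of the hand-written function.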
Grouped analysis: which one is faster?
Module storage is faster.
Why the descending load timings?
Readers who load slower tend to browse less.
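A toy example may help show how this selection effect can produce descending timings even when no individual reader gets faster. The two readers and their timings below are invented for illustration and are not data from the experiment.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_mean_by_index(sessions):
    """Map each page load index to the geometric mean load time.

    `sessions` maps a reader to that reader's load timings in order.
    """
    by_index = defaultdict(list)
    for loads in sessions.values():
        for index, load_ms in enumerate(loads):
            by_index[index].append(load_ms)
    return {i: geometric_mean(v) for i, v in sorted(by_index.items())}


# Invented data: the slow reader leaves after two loads, while the
# fast reader keeps browsing. Neither reader's load time changes.
SESSIONS = {
    "slow_reader": [2000, 2000],
    "fast_reader": [500, 500, 500, 500],
}

if __name__ == "__main__":
    for index, mean_ms in geometric_mean_by_index(SESSIONS).items():
        print(index, round(mean_ms))
```

Here the mean is about 1000 ms at indexes 0 and 1 and about 500 ms at indexes 2 and 3, purely because the slow reader drops out of the higher indexes.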
Differences between browsers
Mobile doesn't benefit from caching as much or as consistently as non-mobile.
- Predictive optimization: "Chrome learns the network topology as you use it...the predictor relies on historical browsing data, heuristics, and many other hints from the browser to anticipate the requests."
- The higher the page view index, the more likely it is that there was a previous page view in the same browser session. This means that more page resources are likely to be available in RAM, that the load is less likely to be affected by TCP slow-start, and that a persistent connection is more likely to have been established before the current page view.
- Deployment announcement: http://www.gossamer-threads.com/lists/wiki/wikitech/403262#403262