Research talk:Module storage performance


Worklog: Saturday, Nov. 23rd

This morning, I'm getting the data together from the logging tables.

> select count(*) from ModuleStorage_6356853;
+----------+
| count(*) |
+----------+
|  1463364 |
+----------+
1 row in set (13.10 sec)

It looks like I have about 1.4 million events logged. That's just about perfect. First things first. I'd like to get the first page load timings from the set for each browser. I think that I can just look for the first timestamp corresponding to an "experimentId" in order to get a good observation.

CREATE TABLE halfak.ms_indexed_load
SELECT
    loads.*,
    0 AS load_index
FROM (
    SELECT
        event_experimentId,
        MIN(timestamp) AS timestamp
    FROM
        ModuleStorage_6356853
    GROUP BY 1
) AS first
INNER JOIN ModuleStorage_6356853 loads ON
    first.event_experimentId = loads.event_experimentId AND
    first.timestamp          = loads.timestamp;
CREATE INDEX id_timestamp ON halfak.ms_indexed_load (event_experimentId, timestamp);

I can use the above query to get all of the first page loads per user, but that's kind of limited. I can run a similar query to get the second page loads. I think that, instead, I'm going to write a quick script to iterate over the set and add an index field.
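
In rough shape, the script might look something like this (a hypothetical Python sketch; the real script isn't recorded here, and writing out the results is elided):

import pymysql  # any MySQL client would do; pymysql is just an assumption

# Walk the rows ordered by (event_experimentId, timestamp) and emit a
# 0-based load_index that resets whenever the experimentId changes.
conn = pymysql.connect(db="log", read_default_file="~/.my.cnf")
cursor = conn.cursor()
cursor.execute(
    "SELECT event_experimentId, timestamp FROM ModuleStorage_6356853 "
    "ORDER BY event_experimentId, timestamp")

last_id, index = None, 0
for experiment_id, timestamp in cursor:
    index = index + 1 if experiment_id == last_id else 0
    last_id = experiment_id
    # ... write (experiment_id, timestamp, index) to ms_indexed_load ...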

EpochFail 16:47, 23 November 2013 (UTC)


That took longer than expected.

> select event_experimentId, timestamp, load_index from ms_indexed_load where event_experimentId = "159073b12fa422" limit 3;
+--------------------+----------------+------------+
| event_experimentId | timestamp      | load_index |
+--------------------+----------------+------------+
| 159073b12fa422     | 20131120074237 |          0 |
| 159073b12fa422     | 20131120074457 |          1 |
| 159073b12fa422     | 20131120074542 |          2 |
+--------------------+----------------+------------+
3 rows in set (0.04 sec)

Sadly, I have to go have the rest of my Saturday now, but I'll pick this up again tomorrow morning. The next thing I want to do is produce a set of load time distributions based on load index. I'd also like to plot load time depending on the number of modules that needed to be loaded (miss count).

--Halfak (WMF) (talk) 17:55, 23 November 2013 (UTC)

Worklog: Sunday, November 24th

Back to hacking this morning. The first thing that I wanted to look at was how many of each load index we have.

> select load_index, count(*) from ms_indexed_load where load_index < 10 group by 1;
+------------+----------+
| load_index | count(*) |
+------------+----------+
|          0 |   381860 |
|          1 |   194407 |
|          2 |   126509 |
|          3 |    91286 |
|          4 |    69973 |
|          5 |    55928 |
|          6 |    46087 |
|          7 |    38510 |
|          8 |    32982 |
|          9 |    28468 |
+------------+----------+
10 rows in set (3 min 16.76 sec)

We're getting *way* higher load indexes than I expected. I had suspected that the vast majority of loads would be index 0, but we're getting a bunch of 1s and 2s as well. It will be nice to have this in a graph.

OK, but first, I need a sample. How am I going to create one? I'll want to stratify on a couple of interesting dimensions:

  • event_experimentGroup
  • load_index

So, I think that what I want to do is sample 5000 observations from the Nth load of each experimental group. That means I'm taking 20 samples @ 5000 = 100,000 observations. Alright R. Don't blow out my memory now.
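
As a sketch of that sampling step (in Python/pandas for illustration only; the actual sampling was done in R, and a loads data frame with event_experimentGroup and load_index columns is assumed):

import pandas as pd

# Assumed: loads is a DataFrame with one row per page load, carrying
# event_experimentGroup, load_index, and the load timing.
strata = loads[loads["load_index"] < 10]
sample = (
    strata.groupby(["event_experimentGroup", "load_index"], group_keys=False)
          .apply(lambda g: g.sample(n=5000, random_state=0)))
# 2 groups x 10 load indexes x 5000 = 100,000 observations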


Alright folks, it's time to let some pretty ggplot shine through.

First up is a density plot. This plot shows how common certain load times are. Note that I've log scaled the X axis. These distributions look fairly log-normalish.

Load time density. The density of load times is plotted by experimental group and load index.

Note the strong offset of the first load to the right. It looks like the first load takes about 4 seconds regardless of whether you have module storage enabled. All subsequent loads (1-9) are centered around 1-2 seconds.

Next I want to check to see if there are differences between the two conditions. Since the data looks log-normal, a geometric mean should be an apt statistic to pull.
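
(For reference, the geometric mean is just the exponential of the mean of logs; a quick Python sketch, since the R code isn't shown here:)

import numpy as np

def geometric_mean(load_times):
    # exp(mean(log(x))). For roughly log-normal data this tracks the
    # center of the distribution instead of being dragged around by the
    # long right tail the way an arithmetic mean would be.
    x = np.asarray(load_times, dtype=float)
    return float(np.exp(np.log(x).mean()))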


Geometric mean load time. The geometric mean load time is plotted by experimental group and load index.

The test condition consistently wins out. Sadly, I have to cut myself off again this morning. I'll be back soon to write up these results into the main page. --EpochFail (talk) 17:29, 24 November 2013 (UTC)

Descending load times after the first load

Descending page load times after index zero are weird. There are a few candidate explanations worth exploring here. One is that users who load more pages tend to have faster connections/computers. This can be checked by limiting the sample to readers who made at least 10 requests. This is my note to myself to look into this once I get a chance. --Halfak (WMF) (talk) 20:34, 25 November 2013 (UTC)

Fwiw, another possible explanation I've been thinking about: the higher the load index (i.e., the longer the user has been browsing), the higher the likelihood of having received module updates. In other words, one candidate explanation (among others, such as yours) is the effect of updating modules. In the numbers, this should show up as an increasing difference between control and test over time, but the actual numbers don't clearly show this trend.
For the record, during this week there were one MW deployment (1.23wmf4), five extension deployments (BetaFeatures, MultimediaViewer, VectorBeta, CommonsMetadata, OAuth), one partial extension deployment (Echo on dewiki and itwiki), and two lightning deployments. It should be noted that deployments during the experiment can influence all load indexes (depending on whether the user first visited the site before or after a given deployment), although they would probably influence load index 0 the most, then 1, etc.
~ Seb35 [^_^] 09:24, 13 December 2013 (UTC)

Notes: Friday, Nov 29

Technical context

Relevant changesets

  • Id2835eca4: Enable module storage for 0.05% of visitors w/storage-capable browsers
  • If2ad2d80d: Cache ResourceLoader modules in localStorage
  • Ib9f4a5615: Controlled experiment to assess performance of module storage

Descending load times after the first load

  • Predictive optimization: "Chrome learns the network topology as you use it...the predictor relies on historical browsing data, heuristics, and many other hints from the browser to anticipate the requests."
  • The higher the page view index, the more likely it is that there had been a previous page view in the same browser session, which means more page resources are available in RAM, a decreased likelihood of being affected by TCP slow-start, and an increased likelihood that a persistent connection had already been established prior to the current page view.
  • In short: not very mysterious.

--Ori.livneh (talk) 11:01, 29 November 2013 (UTC)

Worklog: Friday, November 29th

The first thing I'd like to do this morning is figure out if different browsers are affecting the declining load time after index 2.

To do that, I'll need to limit the user agents by OS and browser. Ori saved me a bunch of work by creating a table that simplifies the user agents, but there are still too many different browser versions and operating systems. For now, I'd like to deal with as few levels as possible.

> select left(browser, 6), left(platform, 6), count(*) from ms_indexed_load_with_ua group by 1,2 having count(*) > 1000;
+------------------+-------------------+----------+
| left(browser, 6) | left(platform, 6) | count(*) |
+------------------+-------------------+----------+
| NULL             | NULL              |    22591 |
| Amigo            | Window            |     1769 |
| Androi           | Androi            |    78927 |
| BlackB           | BlackB            |     4349 |
| Chrome           | Androi            |    31276 |
| Chrome           | Chrome            |     3652 |
| Chrome           | iOS 6             |     1320 |
| Chrome           | iOS 7             |     7528 |
| Chrome           | Linux             |     5550 |
| Chrome           | OS X 1            |    48247 |
| Chrome           | Window            |   582435 |
| Chromi           | Linux             |     1953 |
| Firefo           | Linux             |     9763 |
| Firefo           | OS X 1            |    16562 |
| Firefo           | Window            |   217513 |
| IE 10            | Window            |   102248 |
| IE 8             | Window            |    62455 |
| IE 9             | Window            |    51368 |
| Maxtho           | Window            |     1061 |
| Mobile           | Androi            |     1407 |
| Mobile           | iOS               |     1508 |
| Mobile           | iOS 4             |     1939 |
| Mobile           | iOS 5             |     9467 |
| Mobile           | iOS 6             |    31410 |
| Mobile           | iOS 7             |    82537 |
| Opera            | Androi            |     1704 |
| Opera            | Window            |    28644 |
| Safari           | OS X 1            |    60039 |
| Safari           | Window            |     1532 |
| Silk 3           | Linux             |     1491 |
| Yandex           | Window            |     6715 |
+------------------+-------------------+----------+
31 rows in set (4.97 sec)

Time to generate some new tables and re-draw my sample... EpochFail (talk) 16:35, 29 November 2013 (UTC)


OK. So I wrote a quick script to simplify the user agents down to the most common browsers and operating systems (a rough sketch of the script appears after the tables below). Here is the count of loads by browser:

> select simple_browser, count(*) from ms_load_simple_ua group by 1;
+-------------------+----------+
| simple_browser    | count(*) |
+-------------------+----------+
| NULL              |    31469 |
| Amigo             |     1769 |
| Android           |    78927 |
| Blackberry        |     4349 |
| Chrome            |   680431 |
| Firefox           |   244479 |
| Internet Explorer |   216537 |
| Maxthon           |     1083 |
| Mobile Safari     |   126949 |
| Opera             |    31302 |
| Safari            |    61734 |
| Silk              |     1829 |
| Yandex            |     6803 |
+-------------------+----------+
13 rows in set (2.46 sec)

And here it is by OS (platform):

> select simple_platform, count(*) from ms_load_simple_ua group by 1;
+-----------------+----------+
| simple_platform | count(*) |
+-----------------+----------+
| NULL            |    27801 |
| Android         |   114554 |
| Chrome OS       |     3652 |
| iOS             |   136358 |
| Linux           |    20416 |
| OS X            |   125618 |
| Windows         |  1059262 |
+-----------------+----------+
7 rows in set (1.86 sec)
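
As promised, the simplification step was roughly of this shape (a hypothetical Python sketch; the exact mapping rules aren't recorded here and are only inferred from the tables above):

def simplify_browser(browser):
    # Collapse versioned names ('IE 8', 'IE 10' -> 'Internet Explorer');
    # otherwise keep the browser family token. Illustrative mapping only.
    if browser is None:
        return None
    families = {"IE": "Internet Explorer",
                "Mobile": "Mobile Safari",
                "Silk": "Silk"}
    first_token = browser.split()[0]
    return families.get(first_token, first_token)

def simplify_platform(platform):
    # 'Windows 7' -> 'Windows', 'OS X 10.9' -> 'OS X', 'iOS 7' -> 'iOS'
    if platform is None:
        return None
    for family in ("Chrome OS", "OS X", "Windows", "iOS", "Android", "Linux"):
        if platform.startswith(family):
            return family
    return platform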

Now to figure out how to visualize the distribution of timing by browser/OS. --EpochFail (talk) 16:57, 29 November 2013 (UTC)


Lots of dimensions here even after I reduced them a bit. Some browsers very clearly benefit less from caching than others.

Effect on load time by browser. The density of load times is plotted by browser, experimental condition, and load index (0,1).

--EpochFail (talk) 17:40, 29 November 2013 (UTC)


For quick reference, here are the median load times (in milliseconds) for load_index 0 and 1 (median_0 and median_1) for each platform/browser/group combination with more than 100 observations.

    platform           browser   group median_0  n_0 median_1  n_1    diff
 1:  Android           Android control   5661.0  327   5333.0  315  -328.0
 2:  Android           Android    test   5809.0  342   5342.0  311  -467.0
 3:  Android            Chrome control   4657.0  125   2814.5  110 -1842.5
 4:  Android            Chrome    test   3673.0  139   2884.0  125  -789.0
 5:      iOS     Mobile Safari control  11668.5  600  10713.5  540  -955.0
 6:      iOS     Mobile Safari    test  11420.0  592  10695.0  559  -725.0
 7:     OS X            Chrome control   2139.0  139   1453.0  137  -686.0
 8:     OS X            Chrome    test   1884.0  111   1191.0  133  -693.0
 9:     OS X            Safari control   2273.0  216   1313.0  195  -960.0
10:     OS X            Safari    test   2228.5  196   1121.5  198 -1107.0
11:  Windows            Chrome control   3652.0 1728   1717.5 1916 -1934.5
12:  Windows            Chrome    test   3479.0 1768   1583.0 1880 -1896.0
13:  Windows           Firefox control   3887.5  698   1881.0  669 -2006.5
14:  Windows           Firefox    test   4196.0  666   1624.0  669 -2572.0
15:  Windows Internet Explorer control   3120.0  777   1500.0  743 -1620.0
16:  Windows Internet Explorer    test   3302.0  759   1234.0  743 -2068.0

Curiously, it looks like the drop from first to second load is smaller for the test condition in most cases, yet both the first and second loads are often faster for the test condition overall. That's puzzling, since I can't imagine why the initial load would be faster with module storage enabled.

--EpochFail (talk) 18:31, 29 November 2013 (UTC)


Now I want to know whether the declining trend after the first load can be attributed to either of two hypotheses:

  1. The type of user who tends to load more pages also tends to have a faster connection.
  2. Chrome's intelligent pre-fetching is improving performance over a set of page loads.

To check the first hypothesis, I plotted the geometric mean load times for the experimental groups, restricted to those users who completed at least 10 page loads.
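
(A sketch of the restriction, with the same assumed Python/pandas frame as in the earlier sketches:)

# Keep only readers for whom we recorded at least 10 page loads.
n_loads = loads.groupby("event_experimentId")["load_index"].transform("size")
frequent_loads = loads[n_loads >= 10]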

Load time by bucket for >= 10 loads. Geometric mean load times are plotted by the index of page load and experimental condition for users who loaded at least 10 pages.

The declining trend is still visible, but it seems to have been mostly washed out by enforcing this limitation. Hypothesis 1 seems like a good candidate explanation.

Next, I split load times by whether the browser was Chrome or not.

Load time by Chrome/other. Geometric mean load times are plotted by the index of page load and whether the browser was Chrome or some other software.

Chrome users tend to load pages faster (note that this may be due to Chrome users having faster internet connections, which seems like a likely explanation), but load times decrease over subsequent page loads for both Chrome and non-Chrome browsers.

--EpochFail (talk) 18:48, 29 November 2013 (UTC)

Saturday, November 30th

As I was thinking about this work last night, I realized that the dataset was small enough that, with a little pain, I could load the whole set of load timings into memory rather than sampling down. This allows some cool things.

The plot below breaks readers into quantiles (0%, 50%, 90%, 95%, and 99%) by the number of loads that we have recorded for them and then plots the geometric mean load times at each index. This gives a clear view into the "users who load more pages are faster" phenomenon.
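
(Roughly, the bucketing works like this; again a Python/pandas sketch of the R analysis, with the loads frame and a load_time column assumed:)

import numpy as np
import pandas as pd

# Bucket readers by how many loads we recorded for them, using the
# quantile cut points above, then take the geometric mean load time
# at each load index within each bucket.
per_reader = loads.groupby("event_experimentId").size()
cuts = sorted(set(per_reader.quantile([0.0, 0.5, 0.9, 0.95, 0.99])))
buckets = pd.cut(per_reader, bins=cuts + [np.inf], include_lowest=True)
loads["bucket"] = loads["event_experimentId"].map(buckets)
geo_means = (loads.groupby(["bucket", "load_index"])["load_time"]
                  .agg(lambda x: np.exp(np.log(x).mean())))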

Load times by quantile. The geometric mean load time is plotted for each load index by the quantile for number of recorded loads that the reader falls into.

There's one more thing that was bugging me. The plot above that breaks out browsers by their load times kinda sucks because some browsers don't have many observations in the sample. Let's fix that.

Load time density by browser (non-mobile). Load time density is plotted by browser on non-mobile platforms.
Load time density by browser (mobile). Load time density is plotted by browser on mobile platforms.

There's one really cool thing that you can pick out from this plot that you can't really see in the others. It doesn't look like browsers on mobile devices get any real performance boost from caching in either the control or the test condition. Wow! --EpochFail (talk) 17:19, 30 November 2013 (UTC)


OK one more thing before I start writing up these results in a consumable format. I wanted to see if it looked like caching was working AT ALL for mobile, so I replicated the load index plot but divided the observations between mobile and non-mobile browsers.
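
(The split itself is trivial; column names assumed as in the sketches above:)

# Label each observation as mobile or non-mobile; Android and iOS are
# treated as the mobile platforms here (an assumption on my part).
loads["mobile"] = loads["simple_platform"].isin({"Android", "iOS"})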

Load times by platform and condition. Load time density is plotted by browser on mobile platforms for each experimental condition.

OK. Time to start writing. --EpochFail (talk) 17:37, 30 November 2013 (UTC)

Impact of performance

Fantastic work, Aaron! This is really fascinating. I'm dumping a couple of links to blog posts summarizing findings from various companies about the impact of performance. The most robust study is from Aberdeen Group, who looked at over 160 organizations and determined that an extra one-second delay in page load times led to 7% loss in conversions and 11% fewer page views. link (paywall, yuck!). These two blog posts survey additional findings from Google and Bing:

--Ori.livneh (talk) 19:07, 30 November 2013 (UTC)

Differential updates

DynoSRC does something similar, but it decreases update size even further by diffing it against the old version... might be of interest. --Tgr (WMF) (talk) 21:49, 6 December 2013 (UTC)

Work log: Wednesday, January 15th

We're back after running a new experiment following a bug fix. I have a week's worth of new data, and it's time to see how different things look.

First of all, I took a sample of data last time. I'd like to not sample the data this time, so I think that I'm going to try to load all 3.6 million rows into R.

> SELECT count(*) FROM ModuleStorage_6978194 WHERE event_loadIndex <= 100;
+----------+
| count(*) |
+----------+
|  3688143 |
+----------+
1 row in set (1 min 29.97 sec)

--Halfak (WMF) (talk) 00:14, 16 January 2014 (UTC)


Still working to get the data into R. Things are shaped a little differently than they were last time (due to some improvements in our logging), so I needed to rework the data aggregation. One of the things I need to do is extract the OS and browser from the user agent. Luckily, Python has a nice library for it. See https://pypi.python.org/pypi/user-agents/.
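
Basic usage of that library looks like this (with an illustrative user-agent string):

from user_agents import parse

ua = parse("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36")
print(ua.browser.family)  # e.g. 'Chrome'
print(ua.os.family)       # e.g. 'Windows 7'
print(ua.is_mobile)       # False

--Halfak (WMF) (talk) 00:56, 16 January 2014 (UTC)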


It's getting late, so I'm going to dump some plots and state my take-away. First, I didn't end up pulling in the whole 3.6 million observations. It turns out that the user agent info extractor is super slow, so I randomly sampled 10k at each load index between 1 and 19.

Mean load time. Mean load time is plotted by event load index and experimental condition for all readers.
Mean load time (10 loads +). Mean load time is plotted by event load index and experimental condition for readers who browsed to at least 10 different pages.
Mean load time (Chrome). Mean load time is plotted by browser and event load index for each experimental condition.
Load time density. Load time density is plotted by experimental condition and load index.
Density of load times. Load time density is plotted by browser and event load index for each experimental condition.

It looks like module storage (the test condition) is outperforming the control. I still need to do some work to split up mobile and desktop browsers/environments, but the results look very clear. --Halfak (WMF) (talk) 02:54, 16 January 2014 (UTC)