Research talk:Reading time/Work log/2018-10-26
Friday, October 26, 2018
Investigating Timing Discrepancies
I previously observed a low, but significant rate of inconsistent timings in the server log. Jon asked If I would look into this a little more and provide some data on events where this occurs and surrounding events.
I plotting the distribution of these events over time reveals two interesting patterns:
- A very brief spike around [Date]
- A longer lived, normally distributed spike around [Time]
I created a table in Hadoop (nathante.neg_span_spike) to enable easier investigation of the spike.
I'll upload plots of the distributions of the incidence of the issue over time as well as distributions of the discrepancies.
One pretty crazy thing I think I might be seeing is that the page unloaded events are in December 2017, but the month and date fields are set to August and September.
Code for building the page timings table
---- the lengths in the record are in ms, but the dt are in s CREATE TABLE nathante.pageEventTimings LOCATION "/user/nathante/pageEventTimings" AS SELECT lo.pageToken AS pageToken, lo.dt_ts AS lo_dt_ts, ul.dt_ts AS ul_dt_ts, ul.dt_ts - lo.dt_ts AS EventSpan, (ul.dt - lo.dt) - totalLength/1000.0 AS errTotalLength, (ul.dt - lo.dt) - visibleLength/1000.0 AS errVisibleLength, ul.dt as ul_dt, lo.dt as lo_dt, ul.totalLength, ul.visibleLength, ul.domInteractiveTime, ul.event, ul.useragent, ul.wiki, ul.year, ul.month, ul.day, ul.hour, ul.webhost, ul.ip, split(ul.ip,'[\.]') AS ipBlock3, substr(ul.ip,0,1) AS ipBlock1, ul.ip LIKE "%.%.%.%" AS isipv4, ul.geocoded_data FROM (SELECT unix_timestamp(GetMediawikiTimestamp(dt),"YYYYMMddHHmmss") AS dt_ts, dt, event.pageToken as pageToken FROM nathante.cleanReadingData WHERE event.action = "pageLoaded") lo, (SELECT unix_timestamp(GetMediawikiTimestamp(dt),"YYYYMMddHHmmss") AS dt_ts, dt, event.pageToken as pageToken, event.totalLength AS totalLength, event.visibleLength AS visibleLength, event.domInteractiveTime AS domInteractiveTime, event, useragent, wiki, year, month, day, hour, webhost, ip, geocoded_data FROM nathante.cleanReadingData WHERE event.action = "pageUnloaded") ul WHERE lo.pageToken = ul.pageToken AND lo.dt_ts > 0