Research:Reader retention data exploration
The Wikimedia Foundation (WMF) has been working on a new metric that can help us better serve the needs of our readers, complementing pageviews as our longstanding but limited core readership metric. One aspect of user behavior that we want to capture is retention (“are readers coming back, and how often?”). Because of our privacy requirements, in particular the decision not to set unique reader cookies, various standard options to measure readership are not available. The unique devices dataset was conceived as a privacy-friendly alternative to the standard unique visitors metrics. WMF previously added a cookie to our webrequest data that records a user’s last access date. This also provides useful information about how long it takes users to return to the site. This project is about constructing a privacy-friendly retention metric without the use of any additional instrumentation.
Previous work during 2016/2017 constructed the necessary queries to extract and store the underlying last access data, and started to explore and vet possible metrics (including the average return time within 7 and 31 days and various percentiles). This project continues this work with additional exploration of how this data - now available for a timespan of over two years - might be susceptible to outliers and anomalies, how it differs across certain dimensions, and how it responds to seasonality and external impacts as well as internal site changes. The goal is to find a single metric that is both sensitive to changes in user behavior (such as those due to product changes) and robust against noise. For this exploratory work, which focused on identifying potential anomalies in the underlying data, we looked at the average first return time as an easy-to-calculate metric, but other options like the geometric mean might ultimately turn out to be a more suitable choice for product purposes.
In contrast to pageviews and unique devices that tell us how many people visit the site (reach), the metric on user returns will help tell us how many people are coming back to the site and how frequently (retention). We want to be able to answer questions such as: Did introducing a new feature make it more or less likely for users to return? How do return rates compare between different countries and Wikimedia projects? Was a change due to active users or infrequent users?
We used data extracted from the webrequest datasets, which contain logs of all the hits to the WMF's web servers. The extracted data included the project (language), project_class (wikipedia, mediawiki, wikisource, etc.), access method (desktop, mobile web, or mobile app), country, user agent type (operating system, browser family, etc.), view count, and the last access date.
During this exploration and vetting we found that malformed data in the WMF-Last-Access field can lead to large distortions of the calculated average return time, so we added a step to filter these out.
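A minimal sketch of such a filtering step is shown below. It assumes that the last access value is a date string in `DD-Mon-YYYY` form (for example `12-Dec-2016`); that format, the function names, and the 31-day cutoff are assumptions made for illustration and are not taken from the production query.

```python
from datetime import date, datetime
from typing import Optional

# Assumed format of the last access value, e.g. "12-Dec-2016" (an assumption for this sketch).
LAST_ACCESS_FORMAT = "%d-%b-%Y"


def parse_last_access(raw: str) -> Optional[date]:
    """Return the parsed date, or None if the value is malformed."""
    try:
        return datetime.strptime(raw.strip(), LAST_ACCESS_FORMAT).date()
    except (ValueError, AttributeError):
        return None


def return_days(raw: str, request_date: date, max_days: int = 31) -> Optional[int]:
    """Days between the last access and the current request, or None if unusable."""
    last = parse_last_access(raw)
    if last is None:
        return None
    delta = (request_date - last).days
    # Keep only returns between 1 and max_days days; everything else is dropped.
    return delta if 1 <= delta <= max_days else None
```

Values that parse but lie far in the past (or in the future) are dropped by the range check, so a single implausible date cannot inflate the average.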
We also reviewed only desktop and mobile web data for this metric, since retention on mobile apps is defined and tracked differently. Retention on mobile apps differs because of a higher barrier to entry: a user must successfully install the application on their first visit. Additionally, mobile apps come with a persistent device ID, so it is possible to recognize a returning user.
We explored the data by reviewing the following:
- Time series of the average next return time (within 7 days and 31 days) for a variety of countries and projects, using all the available data (from December 2016 to now), stacked by the following dimensions:
- whether the counted (return) request was a main page view or not.
- Daily histograms of return time for several (e.g. +/-3) days around a date where the average next return metric shows spikes or other anomalies. The histograms show the percentage of returns that occur on each day (between 1 and 31 days) following the last access date. We broke down each histogram by several dimensions (project, os, browser, and main page views).
We then reviewed the spikes and other anomalies identified in the time series plots of average user return time within 31 days to determine whether they were generated by an external event (e.g. holidays) or an internal site change.
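Both quantities used in this exploration, the average return time within N days and the daily percentage histogram, can be derived from counts of returns by number of days since the last access. The following is a minimal sketch under that assumption; the function names and the input layout (a mapping from days to counts) are hypothetical and do not reflect the actual queries.

```python
from typing import Dict


def percent_histogram(counts: Dict[int, int], max_days: int = 31) -> Dict[int, float]:
    """Percentage of returns on each day, restricted to 1..max_days."""
    kept = {d: c for d, c in counts.items() if 1 <= d <= max_days}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {d: 100.0 * c / total for d, c in sorted(kept.items())}


def average_return_time(counts: Dict[int, int], max_days: int = 31) -> float:
    """Mean number of days until return, restricted to 1..max_days."""
    kept = {d: c for d, c in counts.items() if 1 <= d <= max_days}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no returns within the requested window")
    return sum(d * c for d, c in kept.items()) / total
```

For example, with the made-up counts `{1: 50, 2: 20, 7: 20, 30: 10}`, the 31-day average is 5.3 days, while the 7-day average drops the day-30 returns and comes out at about 2.56 days.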
Seasonality and other external influences
Many of the spikes we identified in the 31-day average return time occurred several days before a holiday. Pageviews are generally lower on holidays, which increases the average return time for last access dates several days earlier.
- Across all Wikipedia projects, there are consistent spikes in the average user return time on desktop in late December in various countries, showing the influence of the Christmas holiday on users’ return behavior. Out of the subset of countries reviewed, these spikes were seen in the United States, Germany, France, Japan, the United Kingdom, and Spain. In these countries, the spike in average return time occurred on a last access date of either December 21st or December 22nd (three to four days prior to Christmas) in both 2016 and 2017.
- Below are the daily histograms of return time around the spike in average return time on December 21st on desktop from the United States and from France. The histograms show valleys or decreases in the distribution where Christmas and New Year's occur. Other valleys in the distribution are due to a decrease in active users returning on the weekends on desktop.
- A look at pageviews to English Wikipedia and German Wikipedia around December 2017 shows a drop in daily pageviews right after the last access date of December 21st through the Christmas holiday. This indicates that the spike in average user return time on December 21st is likely due to fewer readers returning one to four days after they access the site.
- In both Indonesia and Bangladesh, there are spikes in average user return time on desktop several days before Eid al-Fitr in June. In 2017, there is a spike on June 22 in both countries, prior to the start of Eid al-Fitr on June 25, 2017. In 2018, the spikes occur on June 8th in Indonesia and June 12th in Bangladesh, prior to the start of Eid al-Fitr on June 14, 2018.
- In Bangladesh, there are also spikes in August which occur prior to Eid ul-Adha. These spikes are seen on both desktop and mobile web returns. In 2017, the spike occurs on August 28th, three days prior to the start of Eid ul-Adha on August 31st. In 2018, the spike on desktop returns occurs on August 16th, four days prior to the start of Eid ul-Adha on August 21st.
- In Spain, there is a spike in average user return time in late March and early April around Semana Santa. There is a spike in average user return time on April 06, 2017, three days prior to the start of Semana Santa on April 09, 2017. In 2018, the spike occurred on March 22 (Thursday), three days prior to the start of Semana Santa on March 25, 2018.
- In Japan, there were a number of spikes in the average user return time, each occurring a few days prior to a holiday. There are consistent spikes in late April around Golden Week, a week from the 29th of April to early May containing a number of Japanese holidays. There is a spike in average user return time on April 28, 2017, one day prior to the start of Golden Week on April 29, 2017. In 2018, the spike occurs on April 27, two days prior to the start of Golden Week on April 29, 2018. There was also a spike on August 10 in both 2017 and 2018, which is a day prior to Mountain Day in Japan.
- Japan also has a December 22nd spike due to the Christmas holiday, followed by a second spike on December 27th. This is likely due to the widely observed Japanese New Year. Most businesses in Japan shut down from January 1 to January 3.
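The mechanism behind these holiday spikes can be illustrated with a small numeric sketch: when fewer readers return in the first few days after a given last access date, the remaining returns are weighted toward longer gaps and the mean rises. All counts below, and the assumption that early returns are halved, are invented for illustration.

```python
from typing import Dict


def mean_return(counts: Dict[int, float]) -> float:
    """Mean return time in days from a {days_until_return: count} histogram."""
    total = sum(counts.values())
    return sum(days * n for days, n in counts.items()) / total


# Hypothetical returns for one last access date in an ordinary week.
baseline = {1: 40, 2: 20, 3: 12, 4: 8, 7: 10, 14: 6, 28: 4}

# Same date shortly before a holiday: returns 1 to 4 days later are halved
# (an assumed effect size, chosen only to make the arithmetic easy to follow).
holiday = {days: (n / 2 if days <= 4 else n) for days, n in baseline.items()}

print(round(mean_return(baseline), 2))  # 4.14
print(round(mean_return(holiday), 2))   # 5.67
```

In this toy example, nothing changes about the readers who return after a week or more, yet the average for that last access date increases by about a day and a half.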
Weekly seasonality on all Wikipedia Projects (mobile vs desktop)
We also reviewed weekly seasonality trends for the average return within 31 days metric and compared it with seasonal trends seen in pageviews.
- Desktop: Readers on desktop with a last access date of Thursday or Friday typically have the highest average return time within 31 days, and readers with a last access date of Saturday have the lowest (i.e., those who visit on Friday take longer on average to return and those who visit on Saturday are the quickest to return). This makes sense as there are fewer desktop pageviews on Wikipedia on the weekend and more during the week.
- Mobile Web: Readers on mobile web with a last access date of Sunday typically have the highest average return time, and readers with a last access date of Friday have the lowest average return time within 31 days. This is also in line with the pageview trends typically seen on mobile web: higher pageviews during the weekend and fewer during the week.
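A weekly pattern like the one described above can be checked by grouping the daily metric by the weekday of the last access date. The sketch below assumes the input is a mapping from last access date to that day's average return time; the function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_by_weekday(daily: Dict[date, float]) -> Dict[str, float]:
    """Average the daily metric over all dates that fall on the same weekday."""
    groups = defaultdict(list)
    for day, value in daily.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        groups[WEEKDAYS[day.weekday()]].append(value)
    return {name: mean(values) for name, values in groups.items()}


# Made-up desktop values for two Fridays and two Saturdays in January 2018.
sample = {
    date(2018, 1, 5): 6.0,
    date(2018, 1, 6): 4.0,
    date(2018, 1, 12): 7.0,
    date(2018, 1, 13): 5.0,
}
print(mean_by_weekday(sample))  # {'Friday': 6.5, 'Saturday': 4.5}
```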
Influence of internal site changes on user return time
We also investigated the impact of changes to the user experience on user return time, such as the release of the page previews feature on English Wikipedia and the shutdown of the Wikipedia Zero program. A decrease in active users, or a spike in new or infrequent users, resulting from these internal site changes may lead to an increase in the average return time a day or two prior to the event.
Page Previews Rollout
On April 17, 2018, the page previews feature was rolled out on English Wikipedia on desktop. A/B tests indicated that this feature led to an expected decrease in pageviews of 3 to 5% per session. This decrease in pageviews might be expected to cause a slight increase in average user return time if fewer active users are returning. Despite the decrease in pageviews, there are no noticeable spikes in the average user return time within 31 days around that date.
The daily return histograms for five days before and after this rollout also don’t show any significant changes in the daily frequency of returns. This indicates that while the page previews feature decreased pageviews, it did not significantly impact readers’ return behavior.
Wikipedia Zero Shutdown
On June 29, 2018, the Wikipedia Zero program (WP0) was deactivated in Angola, leading to a decrease in the number of pageviews from about 20 million to 4 million per month. The time series chart below shows an increase in the average user return time within 31 days on mobile web from Angola starting a few days prior to the shutdown. This is likely due to a decrease in the number of active users returning to the site.
The increase begins on June 26, 2018, a few days prior to the day WP0 was shut down. It reaches a peak average of 7.5 days on June 28, 2018, one day prior to the WP0 shutdown. Unlike the spikes seen around holidays, the average return time does not go back to its previous level, indicating a sustained impact on user return behavior.
The histograms of daily returns for 5 days before and after the WP0 shutdown show a drop in user returns on the date of the shutdown. Prior to the shutdown on June 28th, the percentage of users returning in one day ranges from 30% to 40%. After the shutdown, only between 20% and 30% return in one day. We also broke down the average user return time within 31 days by browser in Angola to determine whether a bug fix or other change affecting a particular browser played a role. There were no anomalies or significant changes in any individual browser family, indicating that the changes in average return time were not due to an artifact.
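One way to make the distinction between a transient holiday spike and a sustained shift more concrete is to compare the typical level of the metric before and after the event while ignoring the days closest to it. This is only an illustrative sketch, not the method used in this analysis; the function name, the window lengths, and the sample series are all assumptions.

```python
from statistics import median
from typing import Sequence


def level_shift(series: Sequence[float], event_index: int,
                window: int = 7, skip: int = 3) -> float:
    """Median after the event minus median before it, skipping nearby days."""
    before_end = event_index - skip
    before_start = max(0, before_end - window)
    after_start = event_index + skip + 1
    before = series[before_start:before_end] if before_end > 0 else []
    after = series[after_start:after_start + window]
    if not before or not after:
        raise ValueError("not enough data around the event")
    return median(after) - median(before)


# Made-up daily averages: a one-day spike versus a lasting increase.
spike = [5.0] * 10 + [7.0] + [5.0] * 10
sustained = [5.0] * 10 + [7.5] * 11
print(level_shift(spike, 10))      # 0.0
print(level_shift(sustained, 10))  # 2.5
```

A value near zero suggests the metric recovered, as with the holiday spikes, while a clearly positive value corresponds to the kind of lasting change seen in Angola.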
Singapore Data Center
Around the end of March 2018, Wikimedia traffic started being routed through a new data center in Singapore. The switchover resulted in improvements to page load times in the impacted regions, and we wanted to determine whether that also affected the behavior of readers. We investigated the average user return time in the impacted countries to identify any changes to users’ return behavior resulting from the new data center (T184677). During this analysis, we did not identify any significant increases or decreases in the average user return time right around the dates of the switch to the Singapore data center. However, we did identify spikes in average user return time on other last access dates in Indonesia and Bangladesh. We further investigated these spikes to learn more about the behavior of the average return metric and to determine any potential events that caused the average to go up (T200111).
As mentioned in the previous section, the spikes in average user return time on Wikipedia in Indonesia occur several days before a holiday (Eid al-Fitr) in June. Daily histograms of return time for several days around these dates showed that the returns were not concentrated on any specific dimension (individual project, os family, or browser family), indicating that the increases in average return time are a result of real user behavior and not a data artifact.
In Bangladesh, there is a significant spike in average user return time that occurs on January 27, 2017 (13 days) on desktop. This spike does not appear to be associated with a holiday, and breakdowns of the daily return histograms by various dimensions were not concentrated on any individual project, os family, or browser family. Further investigation is needed to determine the cause of the increase in average return time that occurred around this date.
Other mobile web and desktop spikes in Indonesia occur in June and August. These are related to the holidays Eid al-Fitr in June and Eid ul-Adha in August, as described in the earlier section.
Differences between projects and platforms
Small versus large-sized Wikipedia projects
We reviewed a subset of both small (Cocos (Keeling) Islands, Falkland Islands (Malvinas), Trinidad and Tobago, Malata) and large-sized (United States, Spain, Germany, Japan, United Kingdom, and France) Wikipedia projects based on size. As seen in the time series chart below for Trinidad and Tobago, the small Wikipedia projects are “noisier” compared to large Wikipedia projects, i.e. they show larger daily fluctuations in average user return time, making it difficult to identify any trends in time series charts. This is analogous to the limitation identified for the unique devices dataset (“Domains with less than 1000 uniques devices daily are too “noisy” and show too much random variation for data to be actionable”). These smaller wikis may have too little activity for the metrics to be reliable.
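As a rough illustration of why small samples are noisier: if individual return times were independent with a standard deviation of sigma days, the standard error of a daily mean based on n returns would be sigma divided by the square root of n. The standard deviation of 8 days used below is an assumed value, not a measured one.

```python
import math


def standard_error(sd_days: float, n_returns: int) -> float:
    """Standard error of a mean over n independent observations."""
    if n_returns <= 0:
        raise ValueError("n_returns must be positive")
    return sd_days / math.sqrt(n_returns)


# Assumed standard deviation of 8 days for individual return times.
for n in (100, 10_000, 1_000_000):
    print(n, round(standard_error(8.0, n), 3))
```

Under these assumptions, a daily average based on 100 returns fluctuates by roughly 0.8 days from sampling noise alone, compared with 0.08 days for 10,000 returns.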
Desktop versus mobile web return time
The majority of spikes in average user return time within 31 days occur on desktop. In general, average return times on mobile web have fewer anomalies. Out of the sample of countries and projects reviewed, we identified only a couple of instances of spikes or other anomalies for mobile web, such as on mediawiki and in Tajikistan.
Average user return times within 31 days on mobile web and desktop were similar in most countries and projects reviewed, with a few exceptions: Mobile web return times were consistently higher than desktop in India, Taiwan, and on Wikidata and Wikinews. Mobile web return times were consistently lower than desktop on Wikivoyage, Wikisource, and in Japan.
We also compared the daily histograms of return time on mobile web and desktop over the same set of last access dates for Bangladesh and Indonesia. In Bangladesh, the daily histograms of return time on mobile web showed a more right-skewed distribution with a higher percentage of readers on mobile web returning 1 or 2 days after their last access date compared to the distribution of return time on desktop.
- Adapt the dataset underlying the investigation above (average return times by project or by country) for ingestion into Druid? (as a test/demo dataset, to allow explorations like the above in Superset/Turnilo)
- Decide on the best choice of metric to be used going forward. Apart from the mean return time, which is easy to calculate and was thus used for most of this project, other options include the geometric mean, the median, or other percentiles, which might have the advantage of being more robust against outliers.
- To do this, we plan to mimic the approach used in the recent Reading time project (which focused on another new reader metric, measuring engagement instead of retention) and fit a small set of candidate distributions to the daily histograms of return time.
- If we find that an existing parametric distribution (e.g. log-normal) is a good fit, that would also yield recommendations on what kind of statistical test to use in future A/B tests for product purposes (like in the outcomes of the Reading time research mentioned above).
- Revisit the idea of maintaining a separate, “faster” version of this retention metric based on 7 days instead of 31 days. While the 31-day average return metric provides more information, the 7-day version may be valuable in identifying changes in user return time more quickly (we already did some exploration of this in 2016/17).
- Productionize the last-access query into a regular Oozie job (currently it is still being run manually once a month by Tilman), and make the chosen metric accessible in Superset/Turnilo.
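To make the candidate metrics mentioned above easier to compare, the sketch below computes the mean, geometric mean, and median from the same histogram of return days. The function name, the input layout, and the sample counts are hypothetical. One relevant fact for the distribution-fitting step: for a log-normal distribution, the maximum-likelihood estimate of the location parameter is the mean of the logged values, so its exponential equals the geometric mean computed here.

```python
import math
from typing import Dict


def candidate_metrics(counts: Dict[int, int]) -> Dict[str, float]:
    """Mean, geometric mean and median of return days (days must be >= 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty histogram")
    mean = sum(d * c for d, c in counts.items()) / total
    log_mean = sum(math.log(d) * c for d, c in counts.items()) / total
    # Lower median: first day at which the cumulative count reaches half the total.
    cumulative = 0
    median = float(max(counts))
    for d in sorted(counts):
        cumulative += counts[d]
        if 2 * cumulative >= total:
            median = float(d)
            break
    return {"mean": mean, "geometric_mean": math.exp(log_mean), "median": median}


# Made-up histogram with a small group of late returns.
print(candidate_metrics({1: 50, 2: 20, 7: 20, 30: 10}))
```

With these made-up counts, the 10% of returns at day 30 pull the mean up to 5.3 days, while the geometric mean stays near 2.4 days and the median at 1 day, which illustrates the robustness argument made in the list above.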