User:MPopov (WMF)/Notes/Android app analytics

Screenshot of the Wikipedia Android app showing the setting where the user can opt out of analytics.

(These notes are an ongoing work-in-progress.)

Metrics

Google Play Store reports

The following monthly reports are available from the Play Console:

  • Acquisition channels in acquisition reports include Play Store (users find the app by browsing or searching on the Play Store app), Google Search, third-party referrers (users find the app via an untagged deep link to the Play Store), and AdWords (Google’s advertising service).
  • App ratings over time are calculated from users’ 1-5 star ratings.

App stickiness (DAU/MAU)

We currently have overall (global) counts of daily and monthly active users (DAU and MAU); T186828 is about calculating them at a per-country level. These queries rely on the wmfuuid field (which contains the appInstallId) in the X-Analytics column of webrequests. Because web requests from the app do not contain that field if the user has turned off "Send usage reports" in the app settings, the raw DAU and MAU counts are lower than the true values.[1]
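As a rough illustration of the stickiness ratio, the following Python sketch computes mean DAU divided by MAU from (day, app install ID) pairs for a single month. The function name and the sample rows are hypothetical, and averaging DAU over the days that have any activity is an assumption made here, not necessarily the definition used by the actual webrequest queries.

```python
from collections import defaultdict
from datetime import date


def stickiness(rows):
    """Return mean DAU / MAU for one month of (day, install_id) rows."""
    daily = defaultdict(set)   # day -> distinct install IDs seen that day
    monthly = set()            # distinct install IDs seen in the month
    for day, install_id in rows:
        daily[day].add(install_id)
        monthly.add(install_id)
    if not monthly:
        return 0.0
    # Assumption: average over days that have at least one request.
    mean_dau = sum(len(ids) for ids in daily.values()) / len(daily)
    return mean_dau / len(monthly)


# Hypothetical sample: three days, three distinct installs.
rows = [
    (date(2018, 2, 1), "a"), (date(2018, 2, 1), "b"),
    (date(2018, 2, 2), "a"),
    (date(2018, 2, 3), "a"), (date(2018, 2, 3), "c"), (date(2018, 2, 3), "a"),
]
print(stickiness(rows))
```

Users who have opted out send no install ID, so they would be missing from both the numerator and the denominator of such a calculation.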

Build Variants

There are four main build variants with the following differences:

  • Dev: sends events to the beta EL cluster (deployment-eventlog05.eqiad.wmflabs) and sampling is disabled, same as Alpha. Developer Settings in the UI are enabled by default.
  • Alpha: sends events to the beta EL cluster (deployment-eventlog05.eqiad.wmflabs) and sampling is disabled, same as Dev.
  • Beta: sends events to production EL and sampling is enabled, same as Prod.
  • Prod: sends events to production EL and sampling is enabled, same as Beta.

Developer Settings

The Developer Settings button is visible by default in the Dev build and hidden otherwise, but it can be revealed by going to the About screen and tapping the circular W icon 7 times. The app install ID can then be found under readingAppInstallID.

EventLogging

Sampling Rates

Per T187239#4025260, when a user has opted in (or has not opted out, as the case may be) to sending us usage reports, the amount of data we receive from that user varies by funnel: each feature or metric has its own funnel, and different funnels have different activation rates. If N is the number of users who are opted in, then if a funnel is configured to have:

  • SAMPLE_LOG_ALL (default), that funnel activates for 100% of those N users
  • SAMPLE_LOG_10, that funnel activates for ~10% of those N users
  • SAMPLE_LOG_100, that funnel activates for ~1% of those N users
  • SAMPLE_LOG_1K, that funnel activates for ~0.1% of those N users

Also, because of how modulo works (any number divisible by 100 is also divisible by 10), if a user's appInstallId (unique, randomly generated on first launch) activates the funnels with the SAMPLE_LOG_100 rate, then all the funnels with the SAMPLE_LOG_10 rate are activated for that user as well.
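The nesting of sampling rates can be sketched in Python. This is a minimal illustration, not the app's actual code: how the app derives a number from the appInstallId is not described here, so the sketch assumes (hypothetically) that the last four hex digits of the UUID are used, that the constants map to the divisors 1, 10, 100 and 1000, and that a funnel is active when the value modulo the divisor is zero.

```python
import uuid

# Assumed divisors for each sampling constant (for illustration only).
SAMPLE_LOG_ALL = 1
SAMPLE_LOG_10 = 10
SAMPLE_LOG_100 = 100
SAMPLE_LOG_1K = 1000


def sampling_value(app_install_id: str) -> int:
    # Assumption: the integer value of the UUID's last 4 hex digits.
    return int(app_install_id[-4:], 16)


def funnel_active(app_install_id: str, rate: int) -> bool:
    # A funnel is active when the value is a multiple of its rate.
    return sampling_value(app_install_id) % rate == 0


# Synthetic install IDs covering every possible 4-hex-digit suffix.
install_ids = [str(uuid.UUID(int=i)) for i in range(65536)]

# Every ID sampled at 1-in-100 is also sampled at 1-in-10.
active_100 = [i for i in install_ids if funnel_active(i, SAMPLE_LOG_100)]
assert all(funnel_active(i, SAMPLE_LOG_10) for i in active_100)

share_10 = sum(funnel_active(i, SAMPLE_LOG_10) for i in install_ids) / len(install_ids)
print(f"share of IDs active at SAMPLE_LOG_10: {share_10:.3f}")
```

One practical consequence is that the sampled populations are nested rather than independent: the users in a 1% funnel are a subset of the users in a 10% funnel.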

Filtering

Older versions of the app (2.4.184 and below) had a bug wherein the app would send its version instead of the wiki ID such as "enwiki" (see T188557 for more details). Therefore, when calculating statistics on feature usage by wiki, some filtering conditions are necessary:

SELECT *
FROM log.MobileWikiAppLinkPreview_15730939
WHERE timestamp >= '20180201' AND timestamp < '20180301'
  AND RIGHT(LEFT(userAgent, 28), 7) > '2.4.184' -- non-bugged version
  AND INSTR(userAgent, '-r-') > 0 -- release version
LIMIT 10;

Here's the equivalent HiveQL version if working with the event logs in Hadoop:

SELECT *
FROM event.mobilewikiapplinkpreview
WHERE revision = 15730939
  AND year = 2018
  AND month = 2
  AND useragent.wmf_app_version > '2.4.184' -- non-bugged version
  AND INSTR(useragent.wmf_app_version, '-r-') > 0 -- release version
LIMIT 10;
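Note that both queries compare versions as strings, which is lexicographic: a hypothetical future version such as "2.10.1" would sort below "2.4.184". When post-processing query results in Python, a numeric comparison avoids this. The sketch below assumes version strings have a dotted numeric prefix followed by a hyphenated suffix (for example "2.7.222-r-2018-02-08", a format assumed here for illustration); the function names are hypothetical.

```python
def parse_version(app_version: str) -> tuple:
    """Turn '2.7.222-r-2018-02-08' into (2, 7, 222).

    Assumes a dotted numeric prefix before the first '-'.
    """
    numeric = app_version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def is_usable(app_version: str) -> bool:
    """Keep release builds newer than the last bugged version (2.4.184)."""
    newer = parse_version(app_version) > (2, 4, 184)
    release = "-r-" in app_version
    return newer and release


# Hypothetical version strings for demonstration.
for version in ["2.7.222-r-2018-02-08", "2.4.184-r-2017-01-01", "2.10.1-r-2019-01-01"]:
    print(version, is_usable(version))
```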

Debugging

Setting build variant to devDebug will:

For example:

02-26 11:16:04.838 10715-10715/org.wikipedia.dev D/org.wikipedia.analytics.Funnel: log():137: SearchFunnel: Sending event, event_action = start
02-26 11:16:04.843 10715-10787/org.wikipedia.dev D/OkHttp: --> POST https://deployment.wikimedia.beta.wmflabs.org/beacon/event?%7B%22schema%22%3A%22MobileWikiAppSearch%22%2C%22revision%22%3A15729321%2C%22wiki%22%3A%22enwiki%22%2C%22event%22%3A%7B%22action%22%3A%22start%22%2C%22source%22%3A0%2C%22appInstallID%22%3A%226449295c-34c3-4a9f-8a8d-4750479bf808%22%2C%22searchSessionToken%22%3A%221c74d24d-1684-4ce5-bcdb-183f07b6357b%22%7D%7D (0-byte body)

In that case, the following event data is POSTed:

{
    "schema": "MobileWikiAppSearch",
    "revision": 15729321,
    "wiki": "enwiki",
    "event": {
        "action":"start",
        "source":0,
        "appInstallID":"6449295c-34c3-4a9f-8a8d-4750479bf808",
        "searchSessionToken":"1c74d24d-1684-4ce5-bcdb-183f07b6357b"
    }
}
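The mapping from the logged URL to the JSON above can be reproduced with a short Python sketch: the event is the percent-encoded query string of the beacon URL, so decoding it and parsing the result as JSON recovers the event data. The function name decode_beacon_url is hypothetical; the URL is the one from the debug log above.

```python
import json
from urllib.parse import unquote, urlsplit


def decode_beacon_url(url: str) -> dict:
    """Extract and parse the percent-encoded JSON event from a beacon URL."""
    query = urlsplit(url).query      # everything after the '?'
    return json.loads(unquote(query))


url = (
    "https://deployment.wikimedia.beta.wmflabs.org/beacon/event?"
    "%7B%22schema%22%3A%22MobileWikiAppSearch%22%2C%22revision%22%3A15729321"
    "%2C%22wiki%22%3A%22enwiki%22%2C%22event%22%3A%7B%22action%22%3A%22start%22"
    "%2C%22source%22%3A0%2C%22appInstallID%22%3A%226449295c-34c3-4a9f-8a8d-4750479bf808%22"
    "%2C%22searchSessionToken%22%3A%221c74d24d-1684-4ce5-bcdb-183f07b6357b%22%7D%7D"
)

event = decode_beacon_url(url)
print(json.dumps(event, indent=4))
```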

Note: source is included in the event data but not in the debug log because source is added by the second call to preprocessData, which happens after the first set of calls to preprocessData.

Verifying

Thoroughly verifying events from the Alpha and Dev builds requires having multiple SSH connections open to deployment-eventlog05.eqiad.wmflabs for monitoring 4 logs simultaneously:

tail -f /srv/log/eventlogging/client-side-events.log | grep "<app install id>"
tail -f /srv/log/eventlogging/all-events.log | grep "<app install id>"
tail -f /srv/log/eventlogging/systemd/eventlogging-processor@client-side-00.log | grep "<app install id>"
tail -f /srv/log/eventlogging/systemd/eventlogging-processor@client-side-01.log | grep "<app install id>"

client-side-events.log has all incoming events (as raw, encoded URI query strings) regardless of their validity, whereas all-events.log only has events that have been validated against the appropriate schemas. If there are any issues with the incoming events or their validation, there will be detailed messages in the two eventlogging-processor@client-side-XX logs.

Refer to AE's documentation for more information on EL testing & verification, and see Developer Settings above about obtaining the app install ID.

Miscellaneous

  • See T189756#4054802 regarding the values of source in feed customization events.
  • Article language switching (if an article is available in multiple languages):
    • In the MobileWikiAppLinkPreview schema, source maps to History Entry enumeration, where language link has an ID of 6.
    • So Link Previews events with source = 6 are interpreted as the user switching between languages the article is available in.

See also

References