Research:WikiGrok/Test2

Test 2: All logged-in users, en.wiki

Pre-test (internal QA-only):

1.25wmf11 train deployed to en.wiki
Wed, 10 Dec 2014 19:00:00 UTC / Wed, 10 Dec 2014 11:00:00 PST

Test began:

Config change SWAT deployment ($wgMFEnableWikiGrok set to true)
Fri, 12 Dec 2014 00:41:00 UTC / Thurs, 11 Dec 2014 16:41:00 PST

Test ends:

Config change SWAT deployment ($wgMFEnableWikiGrok set to false)
Thurs, 18 Dec 2014 00:00:00 UTC / Wed, 17 Dec 2014 16:00:00 PST

Sampling

The test targets all logged-in users on English Wikipedia on mobile devices with screen width less than 768 pixels. At the start of the test, users are randomly assigned to one of two buckets via a token that persists across sessions (clearing the token resets the bucket assignment). The test lasts for 1 week.
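The token-based bucketing described above can be illustrated with a minimal sketch. This is not the MobileFrontend implementation: the function names `new_token` and `bucket_for`, the use of SHA-256, and the token length are all assumptions made for illustration. The point is only that a bucket derived deterministically from a persisted token stays stable across sessions, and that a fresh token (for example, after the old one is cleared) triggers a fresh assignment.

```python
import hashlib
import secrets

# The two experimental conditions described in this test.
BUCKETS = ("a", "b")


def new_token() -> str:
    """Create a random token (hypothetical helper); a client would persist it across sessions."""
    return secrets.token_hex(16)


def bucket_for(token: str) -> str:
    """Derive the bucket deterministically from the token (assumed scheme)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return BUCKETS[digest[0] % len(BUCKETS)]


token = new_token()
# The same token always maps to the same bucket, so the assignment persists.
assert bucket_for(token) == bucket_for(token)
```

Because the bucket is a pure function of the token, nothing else needs to be stored to keep a user in the same condition for the duration of the test.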

Treatments

Users in the pool of eligible participants see one of two versions of the WikiGrok widget when landing on articles where WikiGrok is activated. The start and end of the workflow are identical in the two conditions:

  1. a landing screen with a call to action that the user needs to accept in order to proceed to the next step
  2. a form with a WikiGrok question, the design of which depends on the experimental group the user is assigned to
  3. a confirmation screen, displayed after clicking on the submit button and successfully storing a response (including a "Not sure" or NULL response).

The WikiGrok question is the only element in the workflow that varies across conditions and it consists of two updated types of questions:

Once a user has completed the WikiGrok workflow for a particular article, they will no longer see WikiGrok on that article in the future. The list of articles that they have completed WikiGrok on is stored in LocalStorage.

Claim selection

The total number of eligible pages is between 260,000 and 300,000.

  • Writer (36,444 items)
    • Item eligibility: instance of human, occupation writer, not occupation author
    • Potential claims: occupation author
  • Actor (107,047 items)
    • Item eligibility: instance of human, occupation actor
    • Potential claims: occupation television actor, occupation film actor
  • Album (155,231 items)
    • Item eligibility: instance of album
    • Potential claims: instance of live album, instance of studio album
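The eligibility rules above can be read as a small predicate over an item's existing claims. The sketch below is only an illustration of that reading: the `Item` class and the `potential_claims` function are hypothetical, and property values are represented by their plain labels rather than by Wikidata identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified view of an item's existing claims."""
    instance_of: set = field(default_factory=set)
    occupation: set = field(default_factory=set)


def potential_claims(item: Item) -> list:
    """Return the (property, value) claims that could be asked about for this item."""
    claims = []
    if "human" in item.instance_of:
        # Writer: occupation writer, but not already occupation author.
        if "writer" in item.occupation and "author" not in item.occupation:
            claims.append(("occupation", "author"))
        # Actor: occupation actor.
        if "actor" in item.occupation:
            claims.append(("occupation", "television actor"))
            claims.append(("occupation", "film actor"))
    # Album: instance of album.
    if "album" in item.instance_of:
        claims.append(("instance of", "live album"))
        claims.append(("instance of", "studio album"))
    return claims


writer = Item(instance_of={"human"}, occupation={"writer"})
print(potential_claims(writer))
```

An item that matches none of the three rules yields an empty list, which corresponds to a page on which WikiGrok is not activated.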

Data QA

  • Data quality issues for test 2 are tracked here.

Results

Top Level Statistics

Of the 166,888 pages on which the WikiGrok widget could be enabled, 9,173 unique pages had version (a) tested on them at least once. The corresponding number for version (b) is 9,013. By the end of the test, 6% of these pages had at least one non-null response submitted through them.

The top-level session statistics are as follows:

sessions with ...     version (a)   version (b)
page impression       22,693        21,622
widget impression     11,239        11,145
response              573           570
non-null responses    573
no-thanks             1,679         1,418
click-accept          732           697
success impression    646           598
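The session counts above can be turned into step-to-step rates with a few divisions. The sketch below is a rough illustration rather than the analysis behind this page: the `funnel_rates` function and the rate names are hypothetical, and the "no interaction" figure assumes that no session contains both a no-thanks and a click-accept. Session-level rates computed this way need not match event-level figures exactly.

```python
# Session counts copied from the table above.
SESSIONS = {
    "a": {
        "page impression": 22693,
        "widget impression": 11239,
        "no-thanks": 1679,
        "click-accept": 732,
        "success impression": 646,
    },
    "b": {
        "page impression": 21622,
        "widget impression": 11145,
        "no-thanks": 1418,
        "click-accept": 697,
        "success impression": 598,
    },
}


def funnel_rates(counts: dict) -> dict:
    """Compute session-level rates between funnel steps (hypothetical helper)."""
    shown = counts["widget impression"]
    # Assumption: no-thanks and click-accept sessions do not overlap.
    no_interaction = shown - counts["no-thanks"] - counts["click-accept"]
    return {
        "widget impression per page impression": shown / counts["page impression"],
        "no interaction per widget impression": no_interaction / shown,
        "no-thanks per click-accept": counts["no-thanks"] / counts["click-accept"],
        "success impression per click-accept": counts["success impression"] / counts["click-accept"],
    }


for version, counts in SESSIONS.items():
    for name, rate in funnel_rates(counts).items():
        print(f"version ({version}) {name}: {rate:.2f}")
```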

The Funnel

The following two graphs show the funnel for the version (a) and version (b) tests. Each node of the graph is labeled with the corresponding widget name in the MobileWebWikiGrok schema. The numbers in parentheses show the number of times the widget is used, and the numbers on the connecting arcs show the probability of transitioning from one widget to the next.

Figure: Version (a) funnel
Figure: Version (b) funnel

Observations

  1. In both versions (a) and (b), a page impression does not result in a widget impression ~50% of the time. These are page views in which the user does not scroll far enough down the page to see the widget. Further experiments with the placement of the WikiGrok widget on the page can help identify its optimal location.
  2. In both versions (a) and (b), the user does not interact with the WikiGrok widget ~80% of the time it is shown. This number is very high, and we need to understand why. Is it because users do not notice the widget (which would call for UX improvements)? Is it because editors do not find WikiGrok questions interesting? Is there some other reason? We will carefully monitor this number in the reader experiment.
  3. Users are 2.5 times more likely to choose no-thanks than to accept the WikiGrok widget. Whether this number is high or low depends on the traffic each widget receives as well as the desired accuracy of responses. We will keep an eye on this number as we release the feature to readers.
  4. A response is submitted ~90% of the time when the WikiGrok widget is accepted. Ideally, we want to push for a 100% response submission rate, given that the questions are short.

Quality and Predictability of Responses

The ground truth for the questions asked by WikiGrok is not known unless the answers are hand-coded. However, we can use entropy to measure the predictability of responses.
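As a minimal sketch of that idea, the snippet below computes the Shannon entropy (in bits) of the responses collected for a single question. The page does not specify which entropy measure was used, so the choice of Shannon entropy, the function name `response_entropy`, and the example responses are all assumptions. An entropy of 0 means every respondent gave the same answer (fully predictable), while higher values mean the answers are more evenly split.

```python
import math
from collections import Counter


def response_entropy(responses) -> float:
    """Shannon entropy, in bits, of the empirical distribution of responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    # Sum of -p * log2(p) over every distinct response value.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical responses to one question.
unanimous = ["yes", "yes", "yes", "yes"]
split = ["yes", "no", "yes", "no"]
print(response_entropy(unanimous))
print(response_entropy(split))
```

With two possible answers the entropy ranges from 0 bits (full agreement) to 1 bit (an even split), so questions can be compared on a common scale.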

Is version (a) statistically different than version (b)?