Research:Article feedback/Data and metrics
This page describes the data and metrics plan for the project Article Feedback Tool Version 5 (AFT V5).
Analysis of data collected via the Article Feedback Tool (v.5) will be conducted in parallel with the testing, deployment and continuous improvement of different AFT designs. Research will be organized in 4 different stages:
we will measure the immediate effects of different AFT designs on:
- the volume of feedback readers submit
- immediate reader engagement
- we will study the quality of feedback generated by different AFT designs, both quantitatively and qualitatively, by relying on community assessment
- we will study the effects of different placements of the widget on the volume and quality of feedback collected, by applying the same metrics and qualitative analysis used in stage 1.
we will study the effects of AFT on editing activity.
- we will study to what extent AFT affects a reader's propensity to edit a page
- we will measure how effective calls to action are at engaging users into completing an edit after providing feedback
- we will also analyze the quality of edits produced via AFT, measured in terms of short-term survival and revert probability of these edits.
- we will measure the satisfaction of editors and raters with the tool and the data it produces. Data will be collected via a short post-feedback survey as well as a more detailed survey to be run at a later time.
- we will study the effects on recruitment of notifications sent when a comment is promoted to an article's talk page. We will also study the retention of users effectively recruited via AFT and measure the long-term survival of edits generated via AFT.
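The short-term survival and revert probability mentioned above could be estimated along the following lines. This is only a minimal sketch: the record layout, the sample values and the 48-hour window are assumptions made for illustration, not part of the AFT data specification.

```python
from datetime import datetime, timedelta

# Hypothetical edit records: (time saved, time reverted or None if never reverted).
edits = [
    (datetime(2012, 1, 10, 9, 0), datetime(2012, 1, 10, 9, 20)),
    (datetime(2012, 1, 10, 12, 0), None),
    (datetime(2012, 1, 11, 8, 0), datetime(2012, 1, 15, 8, 0)),
    (datetime(2012, 1, 11, 18, 0), None),
]


def revert_probability(edits, window=timedelta(hours=48)):
    """Fraction of edits reverted within `window` of being saved.

    Returns None for an empty sample. The 48-hour default is an assumption.
    """
    if not edits:
        return None
    reverted = sum(
        1
        for saved, reverted_at in edits
        if reverted_at is not None and reverted_at - saved <= window
    )
    return reverted / len(edits)


def short_term_survival(edits, window=timedelta(hours=48)):
    """Fraction of edits still standing `window` after being saved."""
    probability = revert_probability(edits, window)
    return None if probability is None else 1 - probability
```

An edit reverted after the window has elapsed counts as having survived in the short term; it would only show up in a long-term survival analysis.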
The following are the main research questions we will address across the 4 stages.
- U1 How many unique users posted a rating?
- U2 How many unique users posted a comment?
- U3 How many users posted feedback on more than one article?
- E1 How many people shared their email?
- E2 How many users attempted to edit via the widget?
- E3 How many users completed an edit via the widget?
- E4 How many edits produced via AFT were immediately reverted?
- E5 Does displaying the widget result in an overall increase in anonymous edits?
- E6 How effective are calls to action (CTAs) at engaging readers to edit?
- E7 How many edits produced via AFT were later removed by other editors?
- E8 How many users responded to a notification of feedback featured/promoted to the talk page?
- E9 How many users were effectively recruited as community members via AFT?
- S1 Did readers find the tool useful?
- S2 Did editors find the tool useful?
- Q1 How rich is feedback provided by AFT? (measured via quantitative metrics)
- Q2 How did reader ratings compare to editor ratings?
- Q3 How many feedback posts were voted up or down by the community?
- Q4 How many feedback posts were featured or hidden by editors?
- Q5 How useful is feedback collected via AFT as a function of its design? (based on community assessment of the quality of feedback collected)
- Q6 How useful is feedback collected via AFT as a function of its placement? (based on community assessment of the quality of feedback collected)
We will store data collected via AFT along with metadata about the article being commented on, the user and the A/B testing bucket. (See this page for a description of data collected up to v.4.)
Clicktracking of UI events
We will build on the existing clicktracking functionality to measure hourly aggregates of clicks on different UI elements that are likely to drive users away, and compare them to the number of submissions to quantify incomplete transactions. Depending on the design displayed to the user, these elements will include:
- Clicks on Yes/No buttons (option 1)
- Clicks on labels and tabs (option 2)
- Clicks on rating stars (option 3)
- Clicks on submit button (all options)
- Clicks on help button (all options)
See clicktracking overview page for a detailed list of all events tracked as of the launch of AFT5.
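The comparison between hourly click aggregates and submissions could be sketched as follows. We assume here that click events are available as (timestamp, element) pairs; the element names and sample events are hypothetical and do not correspond to the actual AFT5 clicktracking event identifiers.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event records: (ISO timestamp, UI element name).
events = [
    ("2011-12-20T10:05:00", "rating_star"),
    ("2011-12-20T10:17:00", "help_button"),
    ("2011-12-20T10:40:00", "submit_button"),
    ("2011-12-20T11:02:00", "rating_star"),
    ("2011-12-20T11:30:00", "rating_star"),
    ("2011-12-20T11:45:00", "submit_button"),
]


def hourly_counts(events):
    """Count clicks per (hour, element) pair."""
    counts = Counter()
    for timestamp, element in events:
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
        counts[(hour, element)] += 1
    return counts


def incomplete_ratio(counts, hour, element, submit="submit_button"):
    """Share of clicks on `element` in `hour` not matched by a submit click.

    Returns None when there were no clicks on `element` in that hour.
    """
    clicks = counts[(hour, element)]
    if clicks == 0:
        return None
    submits = counts[(hour, submit)]
    return max(clicks - submits, 0) / clicks
```

This ratio is a coarse aggregate: it compares hourly totals and does not follow individual users from a first click to a submission.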
Building on data collected in v.4, we will extract aggregate counts of clicks on UI elements related to the call to edit. We will track the number of attempts to edit and completed edits in option 4, as well as the number of attempts to edit and successfully completed attempts via the edit CTA in all other options. We will also mark edits completed in each option with a specific flag so that we can perform a survival/revert analysis.
We will extract edit counts by different classes of users (anonymous and registered) from the revision log of articles being tested.
The full specs of clicktracking data can be found here.
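Extracting edit counts by user class from the revision log could look like the sketch below. The row layout (article title, user name, anonymity flag) and the sample rows are assumptions made for illustration; the actual revision log schema is not described here.

```python
from collections import Counter

# Hypothetical revision-log rows: (article title, user name, is_anonymous).
revisions = [
    ("Article A", "192.0.2.10", True),
    ("Article A", "ExampleEditor", False),
    ("Article B", "192.0.2.44", True),
    ("Article B", "192.0.2.10", True),
]


def edit_counts_by_class(revisions):
    """Count edits made by anonymous and by registered users."""
    counts = Counter()
    for _title, _user, is_anonymous in revisions:
        counts["anonymous" if is_anonymous else "registered"] += 1
    return counts
```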
- Number of posts submitted
- Number of posts submitted with a comment
- Percentage of users submitting feedback on multiple articles
- Breakdown of feedback posted by user class (readers vs registered editors)
- Number of daily impressions
- Daily conversion rate (percentage of daily impressions)
- Number of attempts to edit via the edit tab
- Number of attempts to edit via a section edit link
- Number of edits completed via the edit tab
- Number of edits completed via a section edit link
- Number of edits completed via a CTA (option 1-2-3)
- Number of edits completed via direct call to edit (option 4)
- Number of anonymous readers who submitted feedback
- Number of registered users who submitted feedback
- Satisfaction metrics from post-feedback survey
- Satisfaction metrics from follow-up survey
- Overall length of comments submitted
- Proportion of comments voted up or down on the feedback page
- Proportion of comments hidden on the feedback page
- Proportion of comments promoted to the talk page
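As an illustration of the daily conversion rate listed above, the sketch below expresses submissions as a percentage of daily impressions for each bucket. Using the number of posts submitted as the numerator is our assumption, and the bucket names and totals are invented sample values.

```python
def daily_conversion_rate(submissions, impressions):
    """Return feedback submissions as a percentage of widget impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100.0 * submissions / impressions


# Hypothetical daily totals per A/B bucket: (posts submitted, impressions).
daily_totals = {
    "option_1": (120, 4000),
    "option_2": (90, 4000),
    "option_3": (150, 5000),
}

rates = {
    bucket: daily_conversion_rate(posts, views)
    for bucket, (posts, views) in daily_totals.items()
}
```

Normalizing by impressions keeps buckets comparable even when they receive different amounts of traffic.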
AFT test plan
We will test the effects of AFT design, placement and impact on edits separately:
- we start by A/B testing design option 1 (share feedback) vs option 2 (make suggestion) vs option 3 (review)
we select a winning design based on volume and quality of feedback posted
- we test the effects of placement of the winning design option from the previous test
we select a winning placement based on quality and volume of feedback posted
- we test the effects of AFT on the volume of edits it generates by comparing (a) the winning design from the previous test combined with an edit CTA vs. (b) a direct call to edit (option 4) vs. (c) a control condition with no widget displayed (option 0)
we make a final decision informed by the observed impact of AFT on edits
- each of the above tests requires exactly 3 user groups
- preliminary placement options to be tested have been specified in our feature requirements page
- we will run the three tests sequentially, not in parallel. This should not affect the schedule of the developers, as each of the above designs will be built regardless of when we decide to activate them
- for simplicity, we dropped the AFT v.4 design from the first test
- edit survival analysis has been included in the final test as we can measure reverts for vandalism/spam within a few days (or sometimes hours) of an edit being completed
- the post-feedback Edit Call-to-Action will be displayed to all buckets from the very beginning, so as to collect early data on engagement/conversions
- Stage 1 as described above corresponds to Phase 1.0 deliverables in our feature requirements page
- Stages 2 and 3 as described above correspond to Phase 1.5 deliverables in our feature requirements page
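Each test above splits users into 3 groups. One common way to obtain a stable split is to hash a user token together with the test name, as in the sketch below. This is an illustrative assumption and not a description of how AFT assigns buckets; the token format, test name and bucket labels are hypothetical.

```python
import hashlib

# Bucket labels for the first (design) test; the labels are illustrative.
BUCKETS = ["option_1", "option_2", "option_3"]


def assign_bucket(user_token, test_name, buckets=BUCKETS):
    """Deterministically map a user token to one bucket for a given test.

    The same token always lands in the same bucket within a test, while
    including the test name in the hash reshuffles users between tests.
    """
    key = f"{test_name}:{user_token}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]
```

Because the tests run sequentially, including the test name in the hash means that a user's group in one test does not determine their group in the next.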