Research talk:Autoconfirmed article creation trial/Work log/2018-02-12

From Meta, a Wikimedia project coordination wiki

Tuesday, February 13, 2018

Today I'll wrap up the remaining related measure for H5 so that the hypothesis is completed.

H5: Further segmentation

We wish to investigate how ACTRIAL affects newly registered users who create articles, in comparison to those who start out by editing existing content. In our December 18 work log, we started diving into this data by looking at survival for newly registered accounts that created articles and/or drafts. Because a "surviving editor" is defined as someone who edits in both weeks 1 and 5 after registering, we limited our dataset to accounts that created an article or draft in the first week. In our December 11 work log, we looked at how soon after registration article and draft creations occur for accounts that create one during their first 30 days, and found that creation generally happens within the first 24 hours. Focusing on accounts that create an article or draft in the first week therefore captures most of that creation, in addition to fitting the definition of "surviving editor" noted above.
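The "surviving editor" definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the analysis: the function names are hypothetical, and it assumes that week 1 covers days 0 to 6 after registration and week 5 covers days 28 to 34.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)

def edited_in_week(registration, edit_times, week_number):
    """True if any edit falls in the given 1-indexed week after registration.

    Assumption: week N spans [registration + 7*(N-1) days, registration + 7*N days).
    """
    start = registration + (week_number - 1) * WEEK
    end = start + WEEK
    return any(start <= t < end for t in edit_times)

def is_surviving_editor(registration, edit_times):
    """Surviving editor: at least one edit in week 1 and at least one in week 5."""
    return (edited_in_week(registration, edit_times, 1)
            and edited_in_week(registration, edit_times, 5))

# Hypothetical accounts, registered on the same (made-up) date.
reg = datetime(2017, 9, 15)
survivor_edits = [reg + timedelta(days=2), reg + timedelta(days=30)]
dropout_edits = [reg + timedelta(hours=3), reg + timedelta(days=10)]

print(is_surviving_editor(reg, survivor_edits))
print(is_surviving_editor(reg, dropout_edits))
```

The second account edits in week 1 and again in week 2, but not in week 5, so it does not count as surviving under this definition.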

Because ACTRIAL limits article creations, we focused our analysis on the Draft namespace. In our initial analysis, we found a significant decrease in survival rate during the first 1.5 months of ACTRIAL. We updated that result on January 17 based on the hypothesis that accounts creating pages in the Draft namespace during ACTRIAL could be seen as a combination of accounts that created drafts prior to ACTRIAL plus a random sample of accounts that created articles prior to ACTRIAL. With that adjustment, we found that the survival rate of those who create articles/drafts during their first week is unchanged during ACTRIAL.

Now we are interested in understanding how ACTRIAL affects those who do not create an article or draft. We examine both accounts that do not create an article/draft in their first week and accounts that do not create an article/draft in their first five weeks, the latter spanning both weeks found in the definition of a "surviving editor". We first look at a historical plot of those who do not create an article/draft during their first week:

Prop surviving noncreators week1 2009-2017.png

Generally, the plot above looks fairly similar to the survival plot from our analysis of overall survival on February 7. The survival rate of autocreated accounts appears to be fairly stable across time, with no clear pattern emerging. For non-autocreated accounts, the pattern of increased survival during the fall months is worth noting, as it is likely to affect our results. There also appears to be an increase in the survival rate shortly after the new year, although it is perhaps less pronounced.

Focusing in on the two most recent years and adding a line that shows the start of ACTRIAL can provide some further insight:

Prop surviving noncreators week1 2016-2017.png

We can again see a large amount of variation and no apparent pattern in the data for autocreated accounts. For non-autocreated accounts, the patterns of increased survival rate in the fall and right after the new year appear again. We can also see a strong increase in survival for accounts registered shortly before ACTRIAL starts. The pattern during ACTRIAL, however, appears to largely echo that of 2016.

Historical plot for accounts not creating articles/drafts during their first five weeks.
2016–2017 plot for accounts not creating articles/drafts during their first five weeks.

Limiting the data to accounts that do not create an article/draft during the first five weeks after registration does not noticeably change the picture. The patterns found in the graph for non-autocreated accounts are the same, and the graph for autocreated accounts still shows no pattern.

We next create four 2x2 contingency matrices, one for each combination of time span without article/draft creations (first week and first five weeks) and type of account creation (autocreated and non-autocreated). The matrices are as follows:

Autocreated accounts, no draft/article creations during the first week:

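| Period      | Non-survivor | %     | Survivor | %    | Row total | %      |
|-------------|--------------|-------|----------|------|-----------|--------|
| Pre-ACTRIAL | 18,990       | 97.1% | 563      | 2.9% | 19,553    | 100.0% |
| ACTRIAL     | 3,866        | 96.5% | 139      | 3.5% | 4,005     | 100.0% |
| Total       | 22,856       | 97.0% | 702      | 3.0% | 23,558    | 100.0% |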

Non-autocreated accounts, no draft/article creations during the first week:

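| Period      | Non-survivor | %     | Survivor | %    | Row total | %      |
|-------------|--------------|-------|----------|------|-----------|--------|
| Pre-ACTRIAL | 424,059      | 97.7% | 10,009   | 2.3% | 434,068   | 100.0% |
| ACTRIAL     | 84,376       | 97.0% | 2,645    | 3.0% | 87,021    | 100.0% |
| Total       | 508,435      | 97.6% | 12,654   | 2.4% | 521,089   | 100.0% |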

Autocreated accounts, no draft/article creations during the first five weeks:

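| Period      | Non-survivor | %     | Survivor | %    | Row total | %      |
|-------------|--------------|-------|----------|------|-----------|--------|
| Pre-ACTRIAL | 20,281       | 97.0% | 628      | 3.0% | 20,909    | 100.0% |
| ACTRIAL     | 4,021        | 96.4% | 149      | 3.6% | 4,170     | 100.0% |
| Total       | 24,302       | 96.9% | 777      | 3.1% | 25,079    | 100.0% |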

Non-autocreated accounts, no draft/article creations during the first five weeks:

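| Period      | Non-survivor | %     | Survivor | %    | Row total | %      |
|-------------|--------------|-------|----------|------|-----------|--------|
| Pre-ACTRIAL | 444,137      | 97.6% | 10,783   | 2.4% | 454,920   | 100.0% |
| ACTRIAL     | 86,333       | 96.9% | 2,772    | 3.1% | 89,105    | 100.0% |
| Total       | 530,470      | 97.5% | 13,555   | 2.5% | 544,025   | 100.0% |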
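The totals and percentages in these matrices follow directly from the four cell counts. As a minimal sketch (the helper names are hypothetical, and this is not the code used to produce the matrices), the first matrix can be rebuilt from its counts like this:

```python
def with_totals(counts):
    """Add a 'Total' row to a {period: (non_survivors, survivors)} mapping."""
    table = dict(counts)
    table["Total"] = tuple(sum(column) for column in zip(*counts.values()))
    return table

def survival_pct(non_survivors, survivors):
    """Share of survivors in a row, as a percentage rounded to one decimal."""
    return round(100 * survivors / (non_survivors + survivors), 1)

# Counts copied from the matrix for autocreated accounts, first week.
autocreated_week1 = {
    "Pre-ACTRIAL": (18990, 563),
    "ACTRIAL": (3866, 139),
}

table = with_totals(autocreated_week1)
for period, (non, surv) in table.items():
    # Columns: period, non-survivors, survivors, row total, survival percentage.
    print(f"{period:12s} {non:>7,} {surv:>5,} {non + surv:>7,} "
          f"{survival_pct(non, surv):.1f}%")
```

Swapping in the counts from the other three matrices gives their totals and survival percentages in the same way.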

As we did for draft creators, we use a Chi-square goodness-of-fit test to compare the ACTRIAL survival rate against the pre-ACTRIAL survival rate, using the latter as the "expected rate". The results are as follows:

| User group                                 | χ²     | P-value  |
|--------------------------------------------|--------|----------|
| Autocreated accounts, first week           | 5.01   | 0.025    |
| Non-autocreated accounts, first week       | 207.91 | << 0.001 |
| Autocreated accounts, first five weeks     | 4.64   | 0.031    |
| Non-autocreated accounts, first five weeks | 211.21 | << 0.001 |
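To make the test concrete, here is a minimal standard-library sketch of a goodness-of-fit test of this kind. It is an illustration under stated assumptions, not the code used for the analysis: the function name is hypothetical, the expected counts are assumed to be the ACTRIAL row total split according to the pre-ACTRIAL survival rate, and the P-value uses the identity that for one degree of freedom the Chi-square survival function equals erfc(sqrt(x/2)).

```python
import math

def chisq_gof(observed_non, observed_surv, expected_rate):
    """Chi-square goodness-of-fit test with two categories (1 degree of freedom).

    expected_rate is the survival proportion used to build the expected counts.
    Returns (statistic, p_value).
    """
    n = observed_non + observed_surv
    expected_surv = n * expected_rate
    expected_non = n - expected_surv
    statistic = ((observed_surv - expected_surv) ** 2 / expected_surv
                 + (observed_non - expected_non) ** 2 / expected_non)
    # Survival function of the Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# (pre-ACTRIAL non-survivors, pre-ACTRIAL survivors,
#  ACTRIAL non-survivors, ACTRIAL survivors), copied from the matrices.
groups = {
    "Autocreated, first week": (18990, 563, 3866, 139),
    "Non-autocreated, first week": (424059, 10009, 84376, 2645),
    "Autocreated, first five weeks": (20281, 628, 4021, 149),
    "Non-autocreated, first five weeks": (444137, 10783, 86333, 2772),
}

results = {}
for name, (pre_non, pre_surv, act_non, act_surv) in groups.items():
    pre_rate = pre_surv / (pre_non + pre_surv)  # the "expected rate"
    results[name] = chisq_gof(act_non, act_surv, pre_rate)
    statistic, p_value = results[name]
    print(f"{name}: X2 = {statistic:.2f}, p = {p_value:.3g}")
```

Under these assumptions the statistics should agree with the reported values up to rounding. The very small P-values for non-autocreated accounts reflect the much larger sample as well as the somewhat larger difference in rates.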

For autocreated accounts, the survival rate is about 0.6 percentage points higher than in similar periods of the five years prior to ACTRIAL, and this is a statistically significant increase. For non-autocreated accounts, we find an increase of about 0.7 percentage points in both cases, which is also statistically significant. This suggests that survival among accounts that do not create articles or drafts is higher during ACTRIAL. As we discussed in our February 7 work log, further analysis is needed to understand to what extent ACTRIAL is causing this change, or whether other factors are responsible, given that we have seen increased retention in the fall of previous years.