
Research talk:Autoconfirmed article creation trial/Work log/2017-12-18

From Meta, a Wikimedia project coordination wiki

Monday, December 18, 2017

Today I'll analyze whether newly registered accounts are more likely to survive if they create articles or drafts during their first week, and whether draft creator survival has changed significantly during ACTRIAL.

Survival over time

The first question I want to answer is how creator survival has changed over time. To do that, I reused the dataset I have of non-autopatrolled article creations, which covers the Jan 1, 2009 to July 1, 2017 range. I created a similar dataset of draft creations using the Data Lake. For the period from July 21 to Nov 1, 2017, I grabbed article and draft creations from the page creations table in the log database. Lastly, I grabbed our user activity dataset from the Tools database on Toolforge, which contains information on contributor survival.

After combining and cleaning these datasets, I limited the result to creations made during the first week after registration. I then calculated, for each registration day, the proportion of accounts that survived, and made a faceted plot, split vertically by namespace (0 is Main, and 118 is Draft), and horizontally by whether the account is autocreated (top) or not (bottom).
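The filtering and per-day aggregation described above could be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the record layout and every field name (`user_id`, `registered`, `created`, `namespace`, `autocreated`, `survived`) are assumptions, and the sample rows are made up.

```python
from datetime import datetime, timedelta

# Hypothetical records, one per page creation. The field names are
# illustrative assumptions, not the columns of the actual datasets.
creations = [
    {"user_id": 1, "registered": datetime(2017, 1, 1, 8), "created": datetime(2017, 1, 2, 9),
     "namespace": 0, "autocreated": False, "survived": True},
    {"user_id": 1, "registered": datetime(2017, 1, 1, 8), "created": datetime(2017, 1, 3, 9),
     "namespace": 0, "autocreated": False, "survived": True},
    {"user_id": 2, "registered": datetime(2017, 1, 1, 10), "created": datetime(2017, 1, 3, 12),
     "namespace": 0, "autocreated": False, "survived": False},
    {"user_id": 3, "registered": datetime(2017, 1, 1, 11), "created": datetime(2017, 1, 20, 12),
     "namespace": 118, "autocreated": False, "survived": True},
    {"user_id": 4, "registered": datetime(2017, 1, 1, 12), "created": datetime(2017, 1, 4, 12),
     "namespace": 118, "autocreated": False, "survived": False},
]


def first_week_only(rows):
    """Keep creations made within 7 days of account registration."""
    week = timedelta(days=7)
    return [r for r in rows if timedelta(0) <= r["created"] - r["registered"] < week]


def daily_survival(rows):
    """Proportion of surviving creators per (registration day, namespace, autocreated)."""
    groups = {}
    for r in rows:
        key = (r["registered"].date(), r["namespace"], r["autocreated"])
        # Count each account once per group, even if it created several pages.
        groups.setdefault(key, {})[r["user_id"]] = r["survived"]
    return {key: sum(users.values()) / len(users) for key, users in groups.items()}


proportions = daily_survival(first_week_only(creations))
for key, share in sorted(proportions.items()):
    print(key, f"{share:.2f}")
```

Each key in `proportions` corresponds to one point in one facet of the plot: the registration day gives the x position, and the namespace and autocreated flag select the facet.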

There are several general trends to note here:

  1. The survival rate of article creators is roughly stable over time, except during ACTRIAL (which we will discuss in more detail later).
  2. Some page creations appear to be in the Draft namespace prior to its Request for Comment and implementation in November/December 2013. I suspect this is due to issues related to historic data, deleted pages, and so on. Once the Draft namespace is introduced and starts to be used, the survival proportions stabilize.
  3. The large variations in the survival proportion of autocreated accounts that create drafts suggest that very few such accounts do so.

While the graph above shows the overall historic trend, limiting the graph to the last few years would enable us to see more clearly what has been going on prior to ACTRIAL, as well as what has happened during ACTRIAL. We limit the dataset to accounts registered on or after Jan 1, 2015, add a vertical line that shows when ACTRIAL started, and get the graph shown below:

Once again, there are several trends to note here:

  1. With the introduction of ACTRIAL, the survival rate of article creators in Main (ns=0) increases drastically. This is to be expected, since accounts have to be autoconfirmed to create articles there, and those who get past that milestone are more likely to stick around.
  2. The survival rate of autocreated accounts that create drafts appears to drop during ACTRIAL. As we saw in the previous plot, the large variation in survival rates suggests that few autocreated accounts created drafts. Looking at the data, we find that prior to the start of ACTRIAL there were no more than half a dozen accounts registered on a given day that created drafts, and only 23 days in total with any such accounts. Once ACTRIAL starts, 14 out of about 45 days have autocreated accounts creating drafts, and there are also more accounts creating them. This is not unexpected: we did anticipate that autocreated accounts would end up not being able to create articles.
  3. The survival rate of non-autocreated accounts creating drafts appears to be more or less unaffected by ACTRIAL.

Differences in survival rate

Are there significant differences in the survival rate between users who create drafts and those who create articles? We limit the dataset to Jan 1, 2015 through July 1, 2017, giving us 2.5 years of data prior to ACTRIAL. Then we create a 2x2 contingency matrix:

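| Namespace | Non-survivors | % | Survivors | % | Row total | % |
|---|---|---|---|---|---|---|
| Main | 247,706 | 96.1% | 10,029 | 3.9% | 257,735 | 100.0% |
| Draft | 41,073 | 95.0% | 2,153 | 5.0% | 43,226 | 100.0% |
| Total | 288,779 | 96.0% | 12,182 | 4.0% | 300,961 | 100.0% |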

The survival rate in Main is 3.9%, while in Draft it is 5.0%. We use a chi-square goodness-of-fit test to compare the survival rate in Draft with that in Main, and find that the difference is statistically significant: X² = 667.1, df = 1, p << 0.001.
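The survival rates quoted above follow directly from the counts in the contingency matrix. As a quick check, here is a minimal sketch that recomputes them; the variable and function names are my own and only the counts come from the table.

```python
# Counts copied from the contingency matrix: (non-survivors, survivors).
counts = {
    "Main": (247_706, 10_029),
    "Draft": (41_073, 2_153),
}


def survival_rate(non_survivors, survivors):
    """Share of accounts in a row that survived."""
    return survivors / (non_survivors + survivors)


rates = {ns: survival_rate(*row) for ns, row in counts.items()}

# Pooled rate across both namespaces (the "Total" row).
total_non = sum(row[0] for row in counts.values())
total_surv = sum(row[1] for row in counts.values())
rates["Total"] = survival_rate(total_non, total_surv)

for ns, rate in rates.items():
    print(f"{ns}: {rate:.1%}")
```

Rounded to one decimal place, this gives 3.9% for Main, 5.0% for Draft, and 4.0% overall.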

Next we ask if the survival rate has changed during ACTRIAL. We disregard the Main namespace because the answer there is clearly "yes", given that the underlying population of article creators has changed so substantially.

When it comes to the Draft namespace, we approach this by looking at data from Sept 15 to Nov 1. In order to account for seasonal differences, we grab data from the same time period in 2016 and 2015 to use as our "Pre-ACTRIAL" comparison. This gives us the following 2x2 contingency matrix:

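| Period | Non-survivors | % | Survivors | % | Row total | % |
|---|---|---|---|---|---|---|
| Pre-ACTRIAL | 4,082 | 94.8% | 226 | 5.2% | 4,308 | 100.0% |
| ACTRIAL | 9,816 | 95.4% | 470 | 4.6% | 10,286 | 100.0% |
| Total | 13,898 | 95.2% | 696 | 4.8% | 14,594 | 100.0% |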
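The seasonal matching behind this table could be sketched as below. This is only an illustration under stated assumptions: I assume each account is assigned by a single date (for example its registration date), that both endpoints of the Sept 15 to Nov 1 window are inclusive, and the names `period_for`, `ACTRIAL_YEAR`, and `COMPARISON_YEARS` are hypothetical.

```python
from datetime import date

ACTRIAL_YEAR = 2017
COMPARISON_YEARS = (2015, 2016)


def period_for(day):
    """Return 'ACTRIAL', 'Pre-ACTRIAL', or None for dates outside the window."""
    start = date(day.year, 9, 15)
    end = date(day.year, 11, 1)
    # Assumption: both endpoints of the window are inclusive.
    if not (start <= day <= end):
        return None
    if day.year == ACTRIAL_YEAR:
        return "ACTRIAL"
    if day.year in COMPARISON_YEARS:
        return "Pre-ACTRIAL"
    return None


print(period_for(date(2017, 10, 1)))
print(period_for(date(2016, 9, 15)))
print(period_for(date(2016, 8, 1)))
```

Using the same calendar window in each year is what keeps seasonal effects, such as the start of the school year, roughly comparable between the two groups.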

Here we find that the survival rate during ACTRIAL is 4.57%, whereas during the same time period in the two years prior it was 5.24%. It is also worth noting that there are almost 2.5 times as many non-surviving accounts during ACTRIAL as there were in total over the two years prior, which shows the shift in creations from Main to Draft during ACTRIAL. The drop in survival rate is also significant. We use the same approach as before, comparing the ACTRIAL survival rate against the pre-ACTRIAL survival rate, and find that the difference is statistically significant (X² = 9.48, df = 1, p = 0.002).
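For readers who want to retrace this last test, here is a minimal sketch of a chi-square goodness-of-fit test with two categories (survivor, non-survivor), using the pre-ACTRIAL survival rate as the expected proportion. It uses only the Python standard library; the function name `gof_chi2` is my own, and the p-value relies on the fact that for one degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math


def gof_chi2(survivors, total, expected_rate):
    """Chi-square goodness-of-fit statistic and p-value (df = 1) for a two-category outcome."""
    expected_surv = total * expected_rate
    expected_non = total - expected_surv
    observed_non = total - survivors
    stat = ((survivors - expected_surv) ** 2 / expected_surv
            + (observed_non - expected_non) ** 2 / expected_non)
    # Tail probability of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from the Pre-ACTRIAL / ACTRIAL contingency matrix.
pre_actrial_rate = 226 / 4_308
stat, p_value = gof_chi2(survivors=470, total=10_286, expected_rate=pre_actrial_rate)
print(f"X2 = {stat:.2f}, df = 1, p = {p_value:.3f}")
```

With the counts from the table, this evaluates to a statistic of about 9.48 and a p-value of about 0.002, in line with the figures reported above.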