Research talk:Autoconfirmed article creation trial/Work log/2017-08-17


Thursday, August 17, 2017

Today I'll continue some of the data analysis based on the historical data that we have.

Median time to reach autoconfirmed status

H4 hypothesizes that the median time to reach autoconfirmed status will not change. We've gathered data on when users who got autoconfirmed reached the threshold (at least ten edits and an account that's at least four days old). Calculating the median time to autoconfirmed by day of account registration results in the following plot:

As we can see, users who eventually reach autoconfirmed status will most likely reach the ten-edit threshold in less than four days. This is mainly driven by non-autocreated accounts: when we split the plot by type of account creation, we can easily see that autocreated accounts have a higher median time to reach autoconfirmed status:

The plot above suggests that users who come to English Wikipedia from other wikis, and who make at least ten edits in their first 30 days, reach that threshold by making scattered edits. Those who create an account in the regular way and go on to make ten edits are more likely to do so quickly. Might this be driven by autocreated accounts being able to use the preview button?
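The calculation described in this section can be sketched roughly as follows. This is a minimal illustration, not the code used for the plots: the function names and the input layout (a registration timestamp plus a list of edit timestamps per account) are hypothetical, and it assumes that the time to autoconfirmed is the later of the tenth edit and the four-day account age.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

MIN_EDITS = 10                 # edit threshold stated in the text
MIN_AGE = timedelta(days=4)    # account age threshold stated in the text


def time_to_autoconfirmed(registration, edit_times):
    """Return the time from registration until both thresholds are met.

    Returns None if the account never makes ten edits.
    Assumption: the status is reached at the later of the tenth edit
    and the moment the account turns four days old.
    """
    if len(edit_times) < MIN_EDITS:
        return None
    tenth_edit = sorted(edit_times)[MIN_EDITS - 1]
    reached = max(tenth_edit, registration + MIN_AGE)
    return reached - registration


def median_days_by_registration_day(accounts):
    """accounts: iterable of (registration, edit_times) pairs (hypothetical layout)."""
    by_day = defaultdict(list)
    for registration, edit_times in accounts:
        delta = time_to_autoconfirmed(registration, edit_times)
        if delta is not None:
            # Store the duration in fractional days.
            by_day[registration.date()].append(delta.total_seconds() / 86400)
    return {day: median(values) for day, values in sorted(by_day.items())}
```

Under this assumption, an account that makes its ten edits within hours of registering still gets a time of exactly four days. That would be consistent with the observation that most users reach the edit threshold before the age threshold.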

Proportion of surviving new editors

H5 is interested in the proportion of surviving new editors, where a surviving editor is defined as one who makes at least one edit in their first week and at least one edit in their fifth week after registration. Historically, this proportion looks like this:

In yesterday's work log, we plotted the proportion of accounts that made at least one edit, and it looks very different from the plot above. Based on the survival plot, it appears that the survival rate might be somewhat lower recently than it was back in 2009.
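The definition of a surviving editor used for H5 could be expressed as a small predicate. This is a sketch under an assumption that the text does not spell out: weeks are taken as consecutive seven-day windows starting at registration, so the fifth week covers days 28 up to (but not including) 35. The function name and input layout are hypothetical.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def is_surviving(registration, edit_times):
    """True if the account edits in both its first and its fifth week.

    Assumption: week n spans [registration + (n-1)*WEEK, registration + n*WEEK).
    """
    first_week = any(
        registration <= t < registration + WEEK for t in edit_times
    )
    fifth_week = any(
        registration + 4 * WEEK <= t < registration + 5 * WEEK for t in edit_times
    )
    return first_week and fifth_week
```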

H5 is also interested in whether a user starts out by creating an article or not. That data is forthcoming, and we'll analyze it when we have it. In the meantime, we split the plot by type of account creation. The first plot below calculates the proportion of surviving editors based on total account registrations, which we know leads to a downwards slope for autocreated accounts because we saw in yesterday's analysis that they are less likely to make any edits. In the second plot, we calculate the surviving proportion based on those who make at least one edit in the first week.

Once we separate the two types of accounts, we see that the survival rate of non-autocreated accounts is fairly stable at around 2.5%. The survival rate of autocreated accounts fluctuates a lot more, but appears to be higher, around 4–5%. As with the proportion of non-autocreated accounts reaching autoconfirmed status, which we looked at towards the end of yesterday's work log, we appear to have a stable measurement in the proportion of non-autocreated accounts that can be labelled "surviving editor" based on our definition.
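The difference between the two denominators used in this section (all registrations versus accounts that edited in their first week) can be illustrated with a short sketch. The function name and the input layout, a pair of booleans per account, are hypothetical.

```python
def survival_proportions(accounts):
    """Compute the surviving proportion under both denominators.

    accounts: list of (edited_first_week, survived) boolean pairs
    (a hypothetical layout chosen for illustration).
    Returns (proportion of all registrations,
             proportion of accounts with a first-week edit).
    """
    total = len(accounts)
    active = sum(1 for edited, _ in accounts if edited)
    # A survivor must, by definition, have edited in the first week.
    survivors = sum(1 for edited, survived in accounts if edited and survived)
    of_all = survivors / total if total else None
    of_active = survivors / active if active else None
    return of_all, of_active
```

With the first denominator, a group in which fewer accounts make any edits at all will show a lower proportion even if its active members survive at the same rate, which is why the second calculation is more informative for autocreated accounts.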

Diversity of participation

H6 hypothesizes that the diversity of participation will be unchanged. We measure diversity of participation through the mean number of namespaces edited, as well as the mean number of pages edited. Overall, the plots of both measurements are very similar in shape to others we have seen before, such as the proportion of accounts with non-zero edits. Let's plot the average number of namespaces first:

The first plot above is the average number of namespaces for all registered accounts, and this plot is very similar in shape to the plot showing proportion of accounts with non-zero edits (in the Aug 16 work log). The second plot shows the same measurement, but separated by how the account was created. This plot is also very similar to the one showing proportion of accounts with non-zero edits (also in the Aug 16 work log).
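The namespace measurement could be computed along the following lines. This is only a sketch: the representation of an account's edits as (namespace ID, page title) pairs and the function names are assumptions made for illustration.

```python
from statistics import mean


def namespaces_edited(edits):
    """Number of distinct namespaces in one account's edits.

    edits: iterable of (namespace_id, page_title) pairs (hypothetical layout).
    """
    return len({namespace for namespace, _ in edits})


def pages_edited(edits):
    """Number of distinct pages, where a page is a (namespace, title) pair."""
    return len(set(edits))


def mean_namespaces(accounts, only_editors=False):
    """Arithmetic mean of namespaces edited across accounts.

    With only_editors=True, accounts without any edits are excluded,
    mirroring the split between all accounts and accounts with at least one edit.
    """
    counts = [namespaces_edited(edits) for edits in accounts]
    if only_editors:
        counts = [c for c in counts if c > 0]
    return mean(counts) if counts else None
```

Because accounts with no edits contribute a zero, the average over all registered accounts is pulled towards the proportion of accounts that edit at all, which may explain why the two plots have such similar shapes.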

If we instead restrict it to accounts that made at least one edit, the shapes change but again look similar to plots we have seen before:

These plots are quite similar to the ones showing the proportion of accounts making at least one edit that reach autoconfirmed status, although some of the variations appear to be a bit larger.

When measuring the average number of pages edited, we use a geometric mean because it is less sensitive to the skewed distribution of the number of pages edited. Again, we get plots that are similar in shape to what we have seen before. First, plots of overall averages:

As we discussed already, these look very similar to other plots. We see the same effect if we calculate the averages for users who make at least one edit:

The Y-axis is different, but the overall trend is similar to what we've seen earlier, both overall and when we split the graph into autocreated accounts and the others.
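To illustrate why a geometric mean suits skewed counts such as pages edited, here is a minimal sketch. The work log does not say how accounts with zero pages are handled, so the shift by one used below (averaging log(1 + x) and subtracting one afterwards) is an assumption, as is the function name.

```python
import math


def shifted_geometric_mean(counts):
    """Geometric mean of (count + 1), minus 1.

    Assumption: the +1 shift is used so that zero counts are defined;
    the original analysis may have handled zeros differently.
    """
    if not counts:
        return None
    mean_log = sum(math.log1p(c) for c in counts) / len(counts)
    return math.expm1(mean_log)


# One very active account dominates the arithmetic mean
# but has far less influence on the geometric mean.
sample = [1, 1, 1, 1000]
arithmetic = sum(sample) / len(sample)
geometric = shifted_geometric_mean(sample)
```

In this sample the arithmetic mean is about 250, while the shifted geometric mean stays below 10, much closer to what a typical account in the sample did.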

Number of edits in the first 30 days

H7 concerns the average number of edits in the first 30 days. Because this number is skewed, we calculate the average using a geometric mean, as we did for the number of pages edited. What we find is that this measure exhibits behaviour similar to the ones we've already seen. If we look at the average number of edits for all accounts, the plots are as follows:

These plots look more or less the same as the ones we've seen previously. If we instead calculate the mean for accounts that made at least one edit, we also find patterns similar to those we have already seen:

In summary: we have several measures of user activity that appear to be fairly stable for users that make at least one edit: proportion of accounts reaching autoconfirmed status within 30 days, proportion of surviving new editors, average number of pages edited, and average number of edits. The remaining trends are largely determined by whether an account makes an edit or not.