Research:Usage of talk pages/2020-01-03

From Meta, a Wikimedia project coordination wiki

We refine the previous analysis from 2019-11-11 to come closer to a cohort of junior contributors. For this, we consider editors who:

  • registered between 2018-01-01 and 2019-01-01
  • had fewer than 100 edits in the first 45 days after registration.

We then predict the number of edits made to the main namespace in the 2nd 45 days (target variable) based on the number of edits made to each namespace in the 1st 45 days (features).
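The cohort selection and the feature/target construction described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the input layout (a registration dictionary and a list of `(user, timestamp, namespace)` tuples), the function name `build_rows`, the restriction of the features to namespaces 0 to 3, the treatment of window boundaries, and the toy data are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(days=45)          # length of each observation window
REG_START = datetime(2018, 1, 1)     # registration range (end exclusive: assumption)
REG_END = datetime(2019, 1, 1)
NAMESPACES = (0, 1, 2, 3)            # namespaces used as features (assumption)
MAX_EDITS = 100                      # cohort: fewer than 100 edits in window 1


def build_rows(registrations, edits):
    """Return {user: (features, target)} for the junior-contributor cohort.

    registrations: {user: registration datetime}      (hypothetical layout)
    edits: iterable of (user, timestamp, namespace)   (hypothetical layout)
    """
    first = {user: Counter() for user in registrations}  # per-ns edits, 1st 45 days
    second = Counter()                                    # ns-0 edits, 2nd 45 days
    for user, ts, ns in edits:
        reg = registrations.get(user)
        if reg is None:
            continue
        age = ts - reg
        if timedelta(0) <= age < WINDOW:
            first[user][ns] += 1
        elif WINDOW <= age < 2 * WINDOW and ns == 0:
            second[user] += 1

    rows = {}
    for user, reg in registrations.items():
        if not (REG_START <= reg < REG_END):
            continue  # registered outside the cohort period
        if sum(first[user].values()) >= MAX_EDITS:
            continue  # too active to count as a junior contributor
        features = {f"N1_ns_{ns}": first[user][ns] for ns in NAMESPACES}
        rows[user] = (features, second[user])  # target corresponds to N2_ns_0
    return rows


# Toy data, made up for illustration.
registrations = {
    "alice": datetime(2018, 3, 1),
    "bob": datetime(2017, 12, 31),        # registered before the cohort period
}
edits = [
    ("alice", datetime(2018, 3, 2), 0),   # 1st window, main namespace
    ("alice", datetime(2018, 3, 5), 1),   # 1st window, article talk
    ("alice", datetime(2018, 4, 20), 0),  # day 50, 2nd window, main namespace
    ("bob", datetime(2018, 1, 2), 0),
]
rows = build_rows(registrations, edits)
```

Each entry of `rows` pairs the first-window feature counts with the second-window main-namespace count, which is the shape a regression model would be fitted on.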

In the following, we show how the target variable depends on the feature variables for namespace (ns) = 0, 1, 2, 3 using partial dependence plots. A partial dependence plot captures the response of the target variable to changes in one feature variable, averaged over all other feature variables (for a linear model we would see a straight line; here we use gradient boosting regression).
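To make the definition concrete, here is a minimal pure-Python sketch of how a one-dimensional partial dependence curve can be computed. It is not the code behind the plots: the helper `partial_dependence`, the toy rows, and the stand-in linear `predict` function with made-up coefficients are assumptions, chosen so that the straight-line behaviour of a linear model is visible. The analysis itself uses a gradient boosting regressor in place of `predict`.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`.

    predict: function mapping a feature dict to a number (any fitted model)
    rows:    list of feature dicts (the observed data)
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)        # copy so other features stay as observed
            modified[feature] = value   # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Stand-in linear model with made-up coefficients (illustration only).
def predict(row):
    return 0.5 * row["N1_ns_0"] + 2.0 * row["N1_ns_1"]


# Toy feature rows, made up for illustration.
rows = [
    {"N1_ns_0": 3, "N1_ns_1": 0},
    {"N1_ns_0": 10, "N1_ns_1": 1},
    {"N1_ns_0": 1, "N1_ns_1": 2},
]
curve = partial_dependence(predict, rows, "N1_ns_1", grid=[0, 1, 2, 3])
```

For this linear stand-in, consecutive points of `curve` differ by the same amount (the coefficient of `N1_ns_1`), which is the straight line mentioned above. A non-linear model would generally produce a curve whose slope changes along the grid.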

Note that by only looking at junior contributors we ignore users with a large number of edits, so the overall ability of the model to predict the number of edits to the main namespace in the 2nd half gets worse (the Pearson correlation is ~0.3 instead of 0.4...0.5).
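For reference, a Pearson correlation of this kind can be computed as in the sketch below, assuming it is measured between the predicted and the observed edit counts. The function name `pearson` and the numbers are made up for illustration and are not results from this analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up observed and predicted edit counts (illustration only).
observed = [0, 2, 5, 1, 12]
predicted = [1.0, 1.5, 3.0, 2.0, 6.0]
r = pearson(predicted, observed)
```

A value near 1 means predictions track the observed counts closely, while a value near 0 means they carry little linear information about them.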

Results for different wikis

Unsurprisingly, the strongest predictor for the number of edits to the main namespace in the 2nd half (N2_ns_0) is the number of edits to the main namespace in the first half (N1_ns_0). This holds across all languages.

Unlike in the previous analysis, for enwiki we do not observe an inhibiting effect from edits to user talk pages (N1_ns_3). In fact, we find a slightly positive effect.

In addition, we generally find that edits to article talk pages (N1_ns_1) have a larger positive impact. This feature has a larger positive effect in smaller wikis (arwiki, cswiki).

Figures: partial dependence plots for enwiki, dewiki, frwiki, arwiki, cswiki, and kowiki.