Research:Usage of talk pages/2019-11-04


Work log 2019-11-04

Main project


Here I want to compare the number of edits users make to article pages with the number they make to article-talk pages.

Do more edits to article-talk pages also mean more edits to the article page itself?


We use Mediawiki_history to count the number of edits made by each user to articles (namespace 0) and article-talk pages (namespace 1). We further restrict the query in the following way:

  • snapshot 2019-09 of dewiki, frwiki, and enwiki
  • filter bots (via event_user_is_bot_by, see [1])
  • edits made between 2018-01-01 and 2019-01-01
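
The counting step can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the query that was actually run: the rows are made-up examples, the field names (wiki_db, event_user_id, page_namespace, event_timestamp, event_user_is_bot_by) follow my reading of the Mediawiki_history schema, and treating the date range as half-open (start included, end excluded) is an assumption.

```python
from collections import Counter
from datetime import datetime

# Assumed time window: 2018-01-01 inclusive to 2019-01-01 exclusive.
START = datetime(2018, 1, 1)
END = datetime(2019, 1, 1)

# Made-up example rows; field names are assumed to mirror Mediawiki_history.
rows = [
    {"wiki_db": "enwiki", "event_user_id": 1, "page_namespace": 0,
     "event_timestamp": "2018-03-01 12:00:00", "event_user_is_bot_by": []},
    {"wiki_db": "enwiki", "event_user_id": 1, "page_namespace": 1,
     "event_timestamp": "2018-03-02 09:30:00", "event_user_is_bot_by": []},
    {"wiki_db": "enwiki", "event_user_id": 1, "page_namespace": 0,
     "event_timestamp": "2018-06-01 08:00:00", "event_user_is_bot_by": []},
    {"wiki_db": "enwiki", "event_user_id": 2, "page_namespace": 0,
     "event_timestamp": "2018-05-01 10:00:00", "event_user_is_bot_by": ["name"]},
    {"wiki_db": "enwiki", "event_user_id": 3, "page_namespace": 0,
     "event_timestamp": "2019-02-01 10:00:00", "event_user_is_bot_by": []},
    {"wiki_db": "dewiki", "event_user_id": 4, "page_namespace": 1,
     "event_timestamp": "2018-07-01 10:00:00", "event_user_is_bot_by": []},
]


def count_edits(rows, wiki):
    """Return {user_id: (talk_edits, article_edits)} for one wiki."""
    article, talk = Counter(), Counter()
    for r in rows:
        if r["wiki_db"] != wiki:
            continue
        if r["event_user_is_bot_by"]:
            continue  # a non-empty list is taken to mean "flagged as bot"
        ts = datetime.strptime(r["event_timestamp"], "%Y-%m-%d %H:%M:%S")
        if not (START <= ts < END):
            continue
        if r["page_namespace"] == 0:    # article pages
            article[r["event_user_id"]] += 1
        elif r["page_namespace"] == 1:  # article-talk pages
            talk[r["event_user_id"]] += 1
    users = set(article) | set(talk)
    return {u: (talk[u], article[u]) for u in users}


print(count_edits(rows, "enwiki"))
```

Each user ends up as one (x, y) pair: edits to article-talk pages and edits to article pages.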


For each user, we show the number of edits made to article-talk pages (x-axis) and the number of edits made to article pages (y-axis). Naturally, users vary in their level of activity (the total number of edits they make). Thus, we expect that the more active a user is, the more edits she makes to both article-talk pages and article pages.

Qualitatively, this is exactly what we see in the data: each dot is a user, and more edits to talk pages (x-axis) generally go along with more edits to article pages (y-axis). Since the dots are widely scattered, we would like to trace the general trend: we bin the data according to the value on the x-axis and calculate the average and the 95%-confidence interval (the range in which 95% of the data is located) of the values on the y-axis. This reveals an intriguing pattern: the average grows very quickly for small values of x and much more slowly for larger values of x (almost saturating). In fact, in the former case, the growth is faster than linear (note that I plotted things on a logarithmic scale):

  • Linear growth (y ≈ x) would mean that as we double x (say from 2 to 4), the value of y also doubles (2 to 4). In general this is what we would expect: as a user becomes more active, her edits to talk pages and to article pages increase in the same proportion.
  • Faster than linear (here y ≈ x^1.3) means that when we double x (say from 2 to 4), y grows by a factor of about 2.5 instead of just doubling (2 to 5). Loosely speaking, the increase in interaction on the talk page leads to an extra edit to article pages (on top of the increase we would get from assuming proportionality).
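
The arithmetic behind the two cases can be checked with a few lines of Python. This is a minimal sketch; the helper names doubling_factor and loglog_slope are hypothetical and only serve the illustration.

```python
import math


def doubling_factor(exponent):
    """Factor by which y = x**exponent grows when x doubles."""
    return 2 ** exponent


def loglog_slope(x1, y1, x2, y2):
    """Exponent of the power law y = c * x**a passing through two points."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))


print(round(doubling_factor(1.0), 2))      # 2.0  (linear: y doubles)
print(round(doubling_factor(1.3), 2))      # 2.46 (roughly 2.5 times)
print(round(loglog_slope(2, 2, 4, 5), 2))  # 1.32 (the "2 to 5" example)
```

On a log-log plot a power law appears as a straight line whose slope is the exponent, which is why a slope above 1 indicates faster-than-linear growth.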

Interestingly, for users with a large number of edits, we see the reverse effect:

  • More edits to talk pages do not translate into equally more edits to article pages, i.e. the curve is much flatter.

The transition between the two cases happens at around 20-30 edits to talk pages. While the numbers vary slightly, the general picture is the same across all three languages.

Note that the dotted curves are drawn by hand and are only intended to guide the eye toward the difference between users with small and large numbers of edits.
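
The binned trend against which the hand-drawn curves are compared can be sketched as follows. This is an illustration under assumptions rather than the code behind the figures: the logarithmically spaced bins, the bins_per_decade parameter, and the nearest-rank choice for the 2.5% and 97.5% bounds are my own choices, and the function name binned_trend is hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def binned_trend(points, bins_per_decade=4):
    """Bin (x, y) pairs by log10(x); return (bin_lower_edge, mean_y, low, high) per bin.

    low and high bound the central 95% of the y values in the bin.
    """
    bins = defaultdict(list)
    for x, y in points:
        if x < 1:
            continue  # users without talk-page edits cannot be placed on a log axis
        idx = math.floor(math.log10(x) * bins_per_decade)
        bins[idx].append(y)

    trend = []
    for idx in sorted(bins):
        ys = sorted(bins[idx])
        low = ys[int(0.025 * (len(ys) - 1))]         # nearest rank, rounded down
        high = ys[math.ceil(0.975 * (len(ys) - 1))]  # nearest rank, rounded up
        trend.append((10 ** (idx / bins_per_decade), mean(ys), low, high))
    return trend


# Made-up (talk_edits, article_edits) pairs, one per user.
points = [(1, 1), (2, 3), (5, 5), (10, 20), (50, 40)]
for edge, avg, low, high in binned_trend(points, bins_per_decade=1):
    print(edge, avg, low, high)
```

Plotting mean_y against the bin edges on logarithmic axes gives the trend curve; its local slope can then be compared with the exponents discussed above.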

Number of edits to article pages and article-talk pages for enwiki
Number of edits to article pages and article-talk pages for dewiki
Number of edits to article pages and article-talk pages for frwiki


  • For users with few edits, every additional interaction on a talk page (as measured by the number of edits) goes along with a larger and larger gain in the number of edits to article pages, i.e. "one edit on a talk page leads to more than one edit on an article page". One hypothesis could be that talk-page interactions facilitate productivity (or mere activity) on article pages.
  • For users with many edits, additional edits to talk pages do not come with a substantial increase in the number of edits to article pages. Perhaps, due to time constraints, more edits on talk pages actually prevent edits to article pages.
  • While the analysis is purely correlational and makes no claims about causation, it suggests that interactions on talk pages are particularly important for users with a small number of edits, facilitating continued contributions to article pages.