Days since first edit

From Meta, a Wikimedia project coordination wiki

If you randomly sample revisions from the English Wikipedia and find out how long each editor has been with the project, what is the trend? Here is the average of 222773 such samples.

In detail: samples were chosen by taking one edit per 10-minute period, namely the earliest edit after the start of each period. Duplicate samples chosen in this manner were removed. For each edit, the user_registration field was consulted. This is the date of the first edit for accounts created before December 25, 2005, and the actual date of registration for accounts created after it. A monthly average was then calculated.
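
The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the script that produced the data: the function names, the in-memory list of (edit time, registration time) pairs, and the choice to count an edit falling exactly on a period boundary as "after the start" are all assumptions.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def sample_edits(edits, start, end, step=timedelta(minutes=10)):
    """Pick the earliest edit at or after the start of each period.

    `edits` is a list of (edit_time, registration_time) pairs sorted by
    edit_time. This input format is hypothetical.
    """
    times = [edit_time for edit_time, _ in edits]
    chosen = []
    last_index = -1
    t = start
    while t < end:
        i = bisect_left(times, t)
        # An empty period points at the same edit as the next period;
        # indices never decrease, so comparing with the last pick
        # removes such duplicates.
        if i < len(edits) and i != last_index:
            chosen.append(edits[i])
            last_index = i
        t += step
    return chosen


def monthly_average_days(samples):
    """Return {(year, month): (average days since registration, sample size)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for edit_time, registered in samples:
        days = (edit_time - registered).total_seconds() / 86400
        key = (edit_time.year, edit_time.month)
        totals[key][0] += days
        totals[key][1] += 1
    return {key: (total / n, n) for key, (total, n) in totals.items()}
```

Keeping the sample size next to each average mirrors the three columns of the source data table.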

Here is the source data for the graph:

Period  Days            Sample size
Jan-02  96.66948564     3117
Feb-02  109.1223501     2973
Mar-02  119.9344065     3797
Apr-02  143.5780189     3569
May-02  162.827694      3182
Jun-02  155.5267995     3532
Jul-02  193.36359       3589
Aug-02  211.2267589     4380
Sep-02  208.2957376     4256
Oct-02  181.4799633     4410
Nov-02  203.0105795     4222
Dec-02  190.9854214     4355
Jan-03  226.6011664     4405
Feb-03  229.0190045     3867
Mar-03  232.1456002     4367
Apr-03  228.5744291     4160
May-03  246.7534909     4390
Jun-03  284.5185389     4311
Jul-03  272.2861387     4388
Aug-03  270.176214      4450
Sep-03  286.1736725     4221
Oct-03  274.4471518     4207
Nov-03  274.7829859     4311
Dec-03  257.9103377     3855
Jan-04  248.3024366     4410
Feb-04  265.1058562     4156
Mar-04  254.8133977     4450
Apr-04  248.8855159     4299
May-04  255.5882108     4382
Jun-04  264.8154059     4052
Jul-04  241.102758      4460
Aug-04  248.9399052     4460
Sep-04  277.7529204     4319
Oct-04  278.9730357     4444
Nov-04  277.671123      4320
Dec-04  311.1636362     4434
Jan-05  294.0872911     4444
Feb-05  314.9714574     3878
Mar-05  300.7735358     4369
Apr-05  294.9082006     4318
May-05  311.2628278     4459
Jun-05  294.7128944     4118
Jul-05  298.4991302     4462
Aug-05  290.3381185     4460
Sep-05  299.6622528     4301
Oct-05  311.4702595     4464
Nov-05  301.4848687     4320
Dec-05  310.3590844     4464
Jan-06  321.7717266     4464
Feb-06  301.4229457     4002
Mar-06  283.9253029     4455
Apr-06  301.9083951     4255

Model

In the English Wikipedia, the number of contributors is still growing exponentially (in contrast to the German Wikipedia, see [1]), so the number of contributors at time t can be modeled with:

If you assume that each contributor makes one edit per time step, then the sum of the contributor ages over all edits in a time step is the integral of the number of users:

So the average number of days since the contributor's first edit, taken over all edits, is:
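
One way to write out these three steps is sketched here. The notation is an assumption and does not come from the text above: N_0 is the number of contributors at t = 0, k > 0 is the growth rate, A(t) is the summed age of all contributors at time t, and every contributor present at t = 0 is taken to have age 0 then.

```latex
\begin{align}
  N(t) &= N_0 \, e^{k t}
    && \text{(exponential growth of contributors)} \\
  A(t) &= \int_0^t N(s)\,\mathrm{d}s = \frac{N_0}{k}\left(e^{k t} - 1\right)
    && \text{(since } A'(t) = N(t),\ A(0) = 0\text{)} \\
  \bar{a}(t) &= \frac{A(t)}{N(t)} = \frac{1 - e^{-k t}}{k}
    && \text{(average age per edit)}
\end{align}
```

The second line follows because every existing contributor ages by one unit per unit of time while newcomers enter with age 0. Under these assumptions the average age does not grow without bound but levels off at 1/k.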

But the number of edits per user is distributed very unevenly, so you get a more logarithmic growth (but how?)

Histogram

The number of days is grouped into bins of width 10, so the 100-day bin covers 95 to 105 days. The distribution can be modeled with

with and . Conversely, you can calculate the frequency of edits whose contributor has a given age (the number of days since the contributor's first edit) with

with .

Around 25% of all edits are made by users whose first edit was less than 100 days earlier.
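
The binning rule and a share such as the one quoted above can be computed as in the following sketch. The function names are hypothetical, and treating each bin as half-open (so that the 100-day bin covers 95 up to, but not including, 105) is an assumption, since the text does not say which bin a boundary value belongs to.

```python
import math
from collections import Counter


def bin_center(days, width=10):
    """Map an age in days to the center of its bin.

    Assumption: bins are half-open, so the 100-day bin is [95, 105).
    """
    return int(math.floor((days + width / 2) / width) * width)


def histogram(ages_in_days, width=10):
    """Count edits per bin, keyed by bin center."""
    return Counter(bin_center(d, width) for d in ages_in_days)


def share_younger_than(ages_in_days, limit):
    """Fraction of edits whose contributor age is below `limit` days."""
    ages = list(ages_in_days)
    return sum(1 for d in ages if d < limit) / len(ages)
```

Applied to the sampled contributor ages, `share_younger_than(ages, 100)` would give the kind of figure stated above.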

This statistic does not hold any information about the quality of edits.