Research:Lag between registration and first edit

From Meta, a Wikimedia project coordination wiki
This page documents a completed research project.

This sprint investigates the research question: How long does it take for new users to make an edit once they register an account?

Process

For each registered user, we generated the timestamp of their first edit ever, counting both live and deleted edits, and compared it to the user's registration date. Note: because of legacy installations of MediaWiki, user registration data may be inaccurate prior to 2005. At that time, the software would sometimes record the date of a user's first edit as their registration date. However, these accounts make up a small percentage of users, given the massive growth in registrations and editors in 2006-2007.
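The lag computation described above can be sketched as follows. This is a minimal illustration, not the query used in the study: the record layout and the `lag_in_days` helper are hypothetical, and the sketch assumes timestamps in MediaWiki's 14-digit `YYYYMMDDHHMMSS` form. Treating a first edit that predates the recorded registration as unusable is likewise an assumption made here, not something the study states.

```python
from datetime import datetime

# Assumed timestamp layout: 14-digit strings (YYYYMMDDHHMMSS).
TS_FORMAT = "%Y%m%d%H%M%S"

def lag_in_days(registration, first_edit):
    """Return the days between registration and first edit, or None if unusable."""
    if registration is None or first_edit is None:
        return None  # user never edited, or the registration date is missing
    delta = (datetime.strptime(first_edit, TS_FORMAT)
             - datetime.strptime(registration, TS_FORMAT))
    seconds = delta.total_seconds()
    if seconds < 0:
        return None  # first edit predates registration: treated as unreliable here
    return seconds / 86400.0

# Hypothetical records: (user, registration timestamp, first-edit timestamp).
sample = [
    ("A", "20080101120000", "20080101120600"),  # edited 6 minutes after registering
    ("B", "20080101120000", "20080119120000"),  # edited 18 days after registering
    ("C", "20080101120000", None),              # never edited
]
lags = {user: lag_in_days(reg, edit) for user, reg, edit in sample}
```

Keeping the lag as a fractional number of days lets the same value feed both the day, hour, and minute breakdowns and the log-scale model fit.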

A Gaussian mixture model, a clustering technique that can separate lag observations into several classes (or components), was then fitted to the data for all users. We tried fitting mixtures of N = 2, 3, and 4 components. The parameters of the model are estimated via the Expectation-Maximization (EM) algorithm. The data are first transformed to a logarithmic scale (base 10): if the lags are log-normally distributed, their logarithms should follow a normal distribution.
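To make the fitting step concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture applied to log10-transformed lags. It is not the code used in the study: `fit_gmm_1d` is a hypothetical helper, the quantile-based initialisation is one simple choice among many, and the data are synthetic values drawn from two made-up components.

```python
import math
import random

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_gmm_1d(xs, n_components=2, n_iter=100):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Initialise means at evenly spaced quantiles, with a shared spread.
    means = [xs_sorted[int((k + 0.5) * n / n_components)]
             for k in range(n_components)]
    overall_mean = sum(xs) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n)
    stds = [overall_std] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(n_components):
            rk = sum(r[k] for r in resp)
            weights[k] = rk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / rk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / rk
            stds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, stds

# Synthetic log10(lag in days): a "fast" and a "slow" group (made-up parameters).
rng = random.Random(0)
log_lags = [rng.gauss(-2.4, 0.5) if rng.random() < 0.7 else rng.gauss(1.3, 1.2)
            for _ in range(2000)]

weights, means, stds = fit_gmm_1d(log_lags, n_components=2)
fast, slow = sorted(range(len(means)), key=lambda k: means[k])
```

On real data the raw lags would be passed through `math.log10` first, and zero lags would need special handling because the logarithm of zero is undefined.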

Results

What percentage of registered users edit?

Pie Charts

Histogram with model fit

Gaussian Mixture Model fit

Mean (days)   Median (days)   Std. Dev. (days)   Prob.
741.5         18.36           2.993e+04          0.2926
0.008591      0.004197        0.01534            0.7074
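The figures in the table above are consistent with each component being log-normal in the original units: for a log-normal variable, the median is exp(mu), the mean is exp(mu + sigma^2 / 2), and the standard deviation is the mean times sqrt(exp(sigma^2) - 1). The sketch below recovers the implied standard deviation from each reported mean and median. The labels "slow" and "fast" and both helper functions are introduced here for illustration only.

```python
import math

LN10 = math.log(10.0)

def lognormal_summary(mu_log10, sigma_log10):
    """Mean, median and std (original units) of a variable whose log10 is normal."""
    mu = mu_log10 * LN10        # convert to natural-log parameters
    sigma = sigma_log10 * LN10
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2.0)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, median, std

def params_from_mean_median(mean, median):
    """Recover (mu_log10, sigma_log10) from a log-normal mean and median."""
    mu = math.log(median)
    sigma = math.sqrt(2.0 * math.log(mean / median))
    return mu / LN10, sigma / LN10

# Mean and median (days) of the two components as reported in the table above.
components = {"slow": (741.5, 18.36), "fast": (0.008591, 0.004197)}

implied_std = {}
for name, (mean, median) in components.items():
    mu10, sigma10 = params_from_mean_median(mean, median)
    implied_std[name] = lognormal_summary(mu10, sigma10)[2]
```

Because the components are heavily right-skewed, the medians are the more intuitive summary: 0.004197 days is about 6 minutes, against roughly 18 days for the other component.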

Data

Days between reg and first edit   Number of users   Percent of all users
0                                 3477450           80.867%
1                                 146917            3.417%
2                                 48885             1.137%
3                                 33918             0.789%
4                                 28088             0.653%
5 to 10                           111996            2.604%
11 to 20                          94112             2.189%
21 to 31                          59312             1.379%
31 to 60                          73512             1.710%
61 to 180                         130443            3.033%
180 to 365                        95563             2.222%
Total < 1 year                    4300196           100.000%

Hours between reg and first edit   Number of users   Percent of all users   Percent of < 1 day users
0                                  3257914           75.762%                93.687%
1                                  111753            2.599%                 3.214%
2                                  35798             0.832%                 1.029%
3                                  18451             0.429%                 0.531%
4                                  11214             0.261%                 0.322%
5                                  7382              0.172%                 0.212%
6                                  4881              0.114%                 0.140%
7                                  3518              0.082%                 0.101%
8                                  2631              0.061%                 0.076%
9                                  2451              0.057%                 0.070%
10                                 2278              0.053%                 0.066%
11                                 2255              0.052%                 0.065%
12                                 2200              0.051%                 0.063%
13                                 2068              0.048%                 0.059%
14                                 1972              0.046%                 0.057%
15                                 1864              0.043%                 0.054%
16                                 1777              0.041%                 0.051%
17                                 1561              0.036%                 0.045%
18                                 1503              0.035%                 0.043%
19                                 1307              0.030%                 0.038%
20                                 1061              0.025%                 0.031%
21                                 854               0.020%                 0.025%
22                                 562               0.013%                 0.016%
23                                 195               0.005%                 0.006%
Total < 1 day                      3477450           80.867%                100.000%

Minutes between reg and first edit   Number of users   Percent of all users   Percent of < 1 hour users
0                                    293625            6.828%                 9.013%
1                                    387565            9.013%                 11.896%
2                                    360452            8.382%                 11.064%
3                                    290431            6.754%                 8.915%
4                                    232709            5.412%                 7.143%
5                                    190312            4.426%                 5.842%
6 to 10                              588981            13.697%                18.078%
11 to 20                             484058            11.257%                14.858%
21 to 30                             207025            4.814%                 6.355%
31 to 40                             111793            2.600%                 3.431%
41 to 50                             68942             1.603%                 2.116%
51 to 60                             42021             0.977%                 1.290%
Total < 1 hour                       3257914           75.762%                100.000%
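The percentage columns can be reproduced directly from the counts. The sketch below does this for the days table and adds a cumulative share, which is not part of the original tables. It shows that the "Percent of all users" column uses the "Total < 1 year" count (4300196) as its denominator. Bin labels are copied verbatim from the table.

```python
# Counts copied from the days table above: (bin label, number of users).
day_bins = [
    ("0", 3477450), ("1", 146917), ("2", 48885), ("3", 33918), ("4", 28088),
    ("5 to 10", 111996), ("11 to 20", 94112), ("21 to 31", 59312),
    ("31 to 60", 73512), ("61 to 180", 130443), ("180 to 365", 95563),
]

total = sum(count for _, count in day_bins)

rows = []
running = 0
for label, count in day_bins:
    running += count
    # (label, count, percent of total, cumulative percent of total)
    rows.append((label, count, 100.0 * count / total, 100.0 * running / total))

for label, count, pct, cum_pct in rows:
    print(f"{label:>10}  {count:>8}  {pct:7.3f}%  {cum_pct:7.3f}%")
```

Read cumulatively, roughly 84% of the users who edited within a year of registering had made their first edit by the end of bin 1.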

Future work

Break this data down by registration cohort: has the lag between registration and first edit changed over time?