Research talk:Reading time/Work log/2018-10-07

From Meta, a Wikimedia project coordination wiki

Sunday, October 7, 2018

Model Selection

The table below illustrates the model selection process. We fit each of five distributions to a sample of views from each wiki and compute goodness-of-fit criteria. Each model is fit on a 75% subsample (training set), and the goodness-of-fit criteria are computed on the remaining 25% (test set).
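The procedure above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the table: the helper names `split_sample` and `fit_and_score` are hypothetical, the data are synthetic, and fixing the location parameter at zero (`floc=0`) is an assumption. The distribution names are the `scipy.stats` names that appear in the table's model column.

```python
import numpy as np
from scipy import stats

# Candidate distributions, using the scipy.stats names from the table.
CANDIDATES = ["weibull_min", "exponweib", "lognorm", "gamma", "expon"]


def split_sample(x, train_frac=0.75, seed=0):
    """Shuffle the sample and split it into training and test parts."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(x)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def fit_and_score(train, test, name):
    """Fit one distribution by maximum likelihood on the training set
    and return (params, log-likelihood on the test set)."""
    dist = getattr(stats, name)
    # Assumption: reading times are positive, so the location is pinned at 0.
    params = dist.fit(train, floc=0)
    loglik = float(np.sum(dist.logpdf(test, *params)))
    return params, loglik


# Synthetic stand-in for a sample of reading times (not real data).
rng = np.random.default_rng(42)
reading_times = rng.lognormal(mean=3.0, sigma=1.0, size=2000)

train, test = split_sample(reading_times)
results = {name: fit_and_score(train, test, name) for name in CANDIDATES}

for name, (params, loglik) in results.items():
    print(f"{name:12s} test loglik = {loglik:.1f}")
```

Scoring on the held-out 25% means a model with more parameters is not rewarded simply for fitting noise in the training sample.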

AIC is the Akaike information criterion, and BIC is the Bayesian information criterion. These two criteria are similar (lower is better) and attempt to quantify the amount of information lost by the model. The main difference is that BIC takes sample size into account: its penalty for each parameter grows with the log of the sample size, while AIC's penalty per parameter is constant. KS is the Kolmogorov-Smirnov statistic, which may not be so useful for large sample sizes, because with enough data the test rejects the null hypothesis that the model is the "true model" even when the deviations from it are small.
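For reference, the textbook definitions of these criteria can be written as a short sketch. The function names `aic` and `bic` are hypothetical, and `k` (the number of free parameters) and `n` (the number of observations scored) are inputs the caller must supply. As a sanity check, the AIC column in the table is consistent with this formula when the location parameter is held fixed: for dewiki weibull_min, a log-likelihood of -5754.643 with `k = 2` gives about 11513.29. The BIC column does not appear to follow from the textbook formula in the same way, so `bic` below is the standard definition rather than a reconstruction of how that column was produced.

```python
import numpy as np
from scipy import stats


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * np.log(n) - 2 * loglik


# Check against one table row (dewiki, weibull_min), assuming k = 2.
print(aic(-5754.643, 2))

# Kolmogorov-Smirnov test of a sample against a fitted distribution.
# Synthetic data; floc=0 is an assumption, as in the fitting step.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=1.0, size=500)
params = stats.lognorm.fit(sample, floc=0)
ks = stats.kstest(sample, "lognorm", args=params)
print(ks.statistic, ks.pvalue)
```

Because the penalty in `bic` scales with `ln(n)`, it favors simpler models more strongly than `aic` once `n` exceeds about 7 (where `ln(n)` passes 2).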

The table shows results for seven selected wikis. The lognormal model (lognorm) appears to be a good fit: it ranks first or second by BIC on every wiki shown and outperforms the Weibull model (weibull_min) on each. The exponentiated Weibull model (exponweib) performs about as well, ranking first on three of the seven wikis, but may be difficult to interpret. Next we will fit the models on all the wikis to better evaluate which models are a good fit.

row wiki model AIC BIC KS loglik rank (BIC)
0 dewiki weibull_min 1.151329e+04 4.004192e+04 1.000000 -5.754643e+03 3.0
1 dewiki exponweib 1.109909e+04 3.859606e+04 1.000000 -5.546544e+03 1.0
2 dewiki lognorm 1.110096e+04 3.860755e+04 1.000000 -5.548482e+03 2.0
3 dewiki gamma 1.232521e+04 4.286638e+04 1.000000 -6.160603e+03 4.0
4 dewiki expon 1.496725e+04 5.206236e+04 1.000000 -7.482627e+03 5.0
5 enwiki weibull_min 1.712008e+04 5.923140e+04 1.000000 -8.558042e+03 3.0
6 enwiki exponweib 1.052603e+04 3.640883e+04 1.000000 -5.260014e+03 1.0
7 enwiki lognorm 1.054560e+04 3.648146e+04 1.000000 -5.270798e+03 2.0
8 enwiki gamma 2.454977e+06 8.495036e+06 1.000000 -1.227487e+06 4.0
9 enwiki expon 9.267797e+06 3.206969e+07 1.000000 -4.633898e+06 5.0
10 pawiki weibull_min 7.399301e+03 2.424868e+04 0.988747 -3.697651e+03 3.0
11 pawiki exponweib 6.876295e+03 2.252950e+04 0.999188 -3.435147e+03 2.0
12 pawiki lognorm 6.801278e+03 2.228812e+04 0.999755 -3.398639e+03 1.0
13 pawiki gamma 9.181350e+03 3.009093e+04 0.730970 -4.588675e+03 4.0
14 pawiki expon 1.895372e+04 6.213312e+04 0.999999 -9.475861e+03 5.0
15 nlwiki weibull_min 1.103810e+04 3.835726e+04 1.000000 -5.517048e+03 3.0
16 nlwiki exponweib 1.064092e+04 3.697176e+04 0.999966 -5.317459e+03 1.0
17 nlwiki lognorm 1.066475e+04 3.705955e+04 0.999993 -5.330375e+03 2.0
18 nlwiki gamma 1.166566e+04 4.053862e+04 1.000000 -5.830832e+03 4.0
19 nlwiki expon 1.341853e+04 4.663632e+04 1.000000 -6.708263e+03 5.0
20 eswiki weibull_min 1.245377e+04 4.339014e+04 0.999097 -6.224885e+03 3.0
21 eswiki exponweib 1.176727e+04 4.099280e+04 0.999977 -5.880635e+03 2.0
22 eswiki lognorm 1.176492e+04 4.098958e+04 0.999897 -5.880461e+03 1.0
23 eswiki gamma 1.498942e+04 5.222664e+04 0.749022 -7.492712e+03 4.0
24 eswiki expon 2.794001e+04 9.736306e+04 0.999999 -1.396901e+04 5.0
25 hiwiki weibull_min 1.018607e+04 3.496911e+04 1.000000 -5.091036e+03 3.0
26 hiwiki exponweib 1.001290e+04 3.436956e+04 0.999519 -5.003449e+03 2.0
27 hiwiki lognorm 1.000967e+04 3.436335e+04 0.999625 -5.002835e+03 1.0
28 hiwiki gamma 1.032541e+04 3.544760e+04 1.000000 -5.160707e+03 4.0
29 hiwiki expon 1.049624e+04 3.603908e+04 1.000000 -5.247119e+03 5.0
30 arwiki weibull_min 2.439485e+04 8.414907e+04 1.000000 -1.219543e+04 3.0
31 arwiki exponweib 1.083212e+04 3.735461e+04 1.000000 -5.413059e+03 2.0
32 arwiki lognorm 1.075584e+04 3.709637e+04 1.000000 -5.375922e+03 1.0
33 arwiki gamma 2.492751e+06 8.599635e+06 1.000000 -1.246373e+06 4.0
34 arwiki expon 8.934334e+06 3.082221e+07 1.000000 -4.467166e+06 5.0
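The rank (BIC) column can be reproduced from the BIC values with a grouped rank. The sketch below uses pandas and copies the BIC values for two of the wikis from the table; the choice of pandas and the `mean_rank` summary are illustrative assumptions, not necessarily how the table was built.

```python
import pandas as pd

# BIC values copied from the dewiki and pawiki rows of the table.
rows = [
    ("dewiki", "weibull_min", 4.004192e+04),
    ("dewiki", "exponweib", 3.859606e+04),
    ("dewiki", "lognorm", 3.860755e+04),
    ("dewiki", "gamma", 4.286638e+04),
    ("dewiki", "expon", 5.206236e+04),
    ("pawiki", "weibull_min", 2.424868e+04),
    ("pawiki", "exponweib", 2.252950e+04),
    ("pawiki", "lognorm", 2.228812e+04),
    ("pawiki", "gamma", 3.009093e+04),
    ("pawiki", "expon", 6.213312e+04),
]
df = pd.DataFrame(rows, columns=["wiki", "model", "BIC"])

# Rank models within each wiki; the lowest BIC gets rank 1.
df["rank (BIC)"] = df.groupby("wiki")["BIC"].rank(method="min")

# Average rank per model across wikis, a simple way to compare models
# once many wikis have been fit.
mean_rank = df.groupby("model")["rank (BIC)"].mean().sort_values()

print(df)
print(mean_rank)
```

Averaging ranks across wikis gives a single summary per model, which may be a convenient starting point when the comparison is extended to all wikis.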