User:Leaderboard/StewardMark

From Meta, a Wikimedia project coordination wiki
This is an essay. It expresses the opinions and ideas of some Wikimedians but may not have wide support. This is not policy on Meta, but it may be a policy or guideline on other Wikimedia projects. Feel free to update this page as needed, or use the discussion page to propose major changes.

If it isn't obvious, StewardMark is not an official Meta-Wiki policy (or indeed that of any wiki, as far as I am aware).

StewardMark is an experimental scoring system that ranks the performance of each steward candidate using a model that is nearly the same as the support percentage currently used to determine whether a candidate passes, and that scales across multiple steward election years. The model only considers steward elections from 2009 onwards, as the voting population of prior years is harder to compare.

Calculation

  • Let x be the number of supports received by a candidate.
  • Let y be the number of opposes received.
  • Let z be the number of neutral votes received.

Then the StewardMark Sm of a candidate is defined by

The key difference is that some weight is given to neutral votes, because I believe that their opinions should also count. For most candidates this means that Sm < support %; for the rest it is the other way round.

StewardMark only applies to users that have not withdrawn or been disqualified.

Standardisation

The standardised scale can be used to compare StewardMark with scores from other contexts (say, RfA scores from Wikipedia). A conversion table should be defined in any case. The "standardised" scale is a real number from 0 to 20, rounded to two decimal places.

The US grade system equivalent is meant to answer this question: if stewardship were a course and the election determined your grade, what would it be? Just like in a real college course, C is a bad grade and such students often have to retake. This is shown in the tables below, where passed candidates usually have a reasonably high grade.

Conversion scale
Standardised scale (0 - 20) | StewardMark cutoff (/100) | US grade system equivalent
20 | 99.5 | A+
19 | 96.5 | A+
18 | 93 | A
17 | 90 | A
16 | 86 | A-
15 | 81 | B+
14 | 77 | B
13 | 73 | B-
12 | 67 | C+
11 | 60 | C
10 | 54 | C-
9 | 45 | D+
8 | 40 | D
7 | 35 | D-
6 | 29 | F
5 | 22 | F
4 | 16 | F
3 | 11 | F
2 | 7 | F
1 | 2 | F
0 | 0 | F
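The conversion scale lists only whole-number cutoffs, while the standardised score is a real number. The sketch below assumes linear interpolation between adjacent cutoffs. That is an assumption rather than something stated on this page, although it does reproduce the reported maximum (99.39 gives 19.96) and minimum (2.76 gives 1.15). The function names `standardise` and `us_grade` are illustrative. The grade lookup further assumes that a grade applies to the whole band starting at its cutoff, and that a row with no grade shares the grade of the row above it.

```python
from bisect import bisect_right

# StewardMark cutoffs from the conversion scale; the list index is the
# standardised score (index 0 -> 0, index 20 -> 20).
CUTOFFS = [0, 2, 7, 11, 16, 22, 29, 35, 40, 45, 54,
           60, 67, 73, 77, 81, 86, 90, 93, 96.5, 99.5]

# US grade for each whole standardised score 0..20 (assumed band mapping).
GRADES = ["F"] * 7 + ["D-", "D", "D+", "C-", "C", "C+",
                      "B-", "B", "B+", "A-", "A", "A", "A+", "A+"]


def standardise(steward_mark: float) -> float:
    """Map a StewardMark (/100) to the 0-20 scale, assuming linear
    interpolation between neighbouring cutoffs."""
    if not 0 <= steward_mark <= 100:
        raise ValueError("StewardMark must lie between 0 and 100")
    if steward_mark >= CUTOFFS[-1]:
        return 20.0  # anything at or above the top cutoff is a perfect 20
    k = bisect_right(CUTOFFS, steward_mark) - 1  # band the mark falls into
    lo, hi = CUTOFFS[k], CUTOFFS[k + 1]
    return round(k + (steward_mark - lo) / (hi - lo), 2)


def us_grade(standardised: float) -> str:
    """Look up the US grade for a standardised score (assumed band mapping)."""
    return GRADES[int(standardised)]


print(standardise(99.39), us_grade(standardise(99.39)))  # 19.96 A+
print(standardise(2.76), us_grade(standardise(2.76)))    # 1.15 F
```

If the actual conversion uses a different rule between cutoffs, only the last line of `standardise` would need to change.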

Statistics

The dataset includes all steward candidates from 2009 and later. Data correct as of the 2021 steward elections.

Comparison of statistical parameters
Statistical parameter | StewardMark (/100) | StandardScale (/20)
Mean | 67.16 | 12.83
Median | 77.42 | 14.11
Maximum | 99.39 | 19.96
Minimum | 2.76 | 1.15
Standard deviation | 28.11 | 5.19

Steward election stats over the years
Year | StewardMark mean | StandardMark mean | Number of candidates | StandardMark Stdev
2009 | 67.27 | 12.69 | 22 | 4.86
2010 | 48.32 | 9.49 | 25 | 6.72
2011 | 71.10 | 13.75 | 20 | 5.39
2012 | 83.24 | 15.52 | 9 | 1.80
2013 | 68.81 | 13.51 | 10 | 6.38
2014 | 82.29 | 15.67 | 10 | 3.70
2015 | 73.87 | 13.94 | 14 | 4.07
2016 | 62.74 | 11.61 | 10 | 2.43
2017 | 69.76 | 13.09 | 7 | 4.07
2018 | 70.04 | 13.15 | 10 | 4.27
2019 | 74.87 | 14.19 | 7 | 4.64
2020 | 66.16 | 12.81 | 14 | 5.36
2021 | 61.23 | 11.92 | 10 | 5.89
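As a consistency check, the overall StewardMark mean should equal the yearly means weighted by the number of candidates in each year. The sketch below uses only the figures from the yearly steward table; the name `pooled_mean` is illustrative. Because the yearly means are themselves rounded to two decimal places, agreement is only expected to about that precision.

```python
# (year, StewardMark mean, number of candidates) from the yearly steward table.
YEARLY = [
    (2009, 67.27, 22), (2010, 48.32, 25), (2011, 71.10, 20),
    (2012, 83.24, 9),  (2013, 68.81, 10), (2014, 82.29, 10),
    (2015, 73.87, 14), (2016, 62.74, 10), (2017, 69.76, 7),
    (2018, 70.04, 10), (2019, 74.87, 7),  (2020, 66.16, 14),
    (2021, 61.23, 10),
]


def pooled_mean(rows):
    """Candidate-weighted mean of the per-year means."""
    total_candidates = sum(n for _, _, n in rows)
    weighted_sum = sum(mean * n for _, mean, n in rows)
    return weighted_sum / total_candidates


print(sum(n for _, _, n in YEARLY))    # 168 candidates in total
print(round(pooled_mean(YEARLY), 2))   # 67.16, matching the overall mean
```

The same check cannot be applied to the median or the standard deviation, since those cannot be rebuilt from per-year means alone.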

Raw data

See Raw data.

Takeaways

  • There are some steward candidates that have done really well, especially MF-Warburg (the top scorer). When setting the conversion scale, one objective was to design it in such a way that it would be extremely, but not impossibly, difficult to get a perfect standardised score of 20. MF-Warburg came incredibly close to that with a StewardMark of 99.39/100.
  • The negative skew (the median of 77.42 is well above the mean of 67.16) implies that most steward candidates do pretty well: about 50% of the candidates in the dataset passed.
  • There are a couple of cases where a candidate who failed (for example, 2009's Putnik with 77.73/100) has a higher StewardMark than another candidate who passed. The reason is that the former had fewer neutrals: the latter might have just crossed the 80% support ratio but garnered more neutrals, which drag down the score. Such cases are rare though.

StewardMark from an en.wp perspective

A natural question is how suitable StewardMark is for analysing en.wp adminship, given the large number of candidates that have stood for adminship. There are some important differences however:

  • We must include withdrawn and SNOW cases, as they comprise a significant number of candidates.
  • The results are different. For instance, about 3% of all candidates score a 100/100 StewardMark, and hence get a 20. On the other hand, mainly as a result of SNOW, one-eighth of all candidates get a zero. These extremes should be taken into account, and even then, en.wp adminship proposals score very well on the high end as compared to stewards.

The raw data for en.wp is available at User:Leaderboard/StewardMark/en.wp RFA raw data. Data last updated: Jan 2021.

en.wp StewardMark statistics
Statistical parameter | StewardMark (/100) | StandardScale (/20)
Mean | 52.01 | 10.15
Median | 53.02 | 9.89
Maximum | 100 | 20
Minimum | 0 | 0
Standard deviation | 36.04 | 6.85

en.wp stats over the years
Year | StewardMark mean | StandardMark mean | Number of candidates | StandardMark Stdev
2008 | 50.66 | 9.90 | 591 | 6.92
2009 | 50.32 | 9.80 | 354 | 6.75
2010 | 47.76 | 9.42 | 231 | 6.69
2011 | 51.94 | 10.13 | 139 | 6.88
2012 | 46.64 | 9.05 | 95 | 6.73
2013 | 59.90 | 11.62 | 74 | 6.28
2014 | 51.35 | 10.02 | 62 | 7.65
2015 | 52.63 | 10.18 | 58 | 6.48
2016 | 57.22 | 11.08 | 36 | 7.39
2017 | 65.48 | 12.82 | 41 | 6.72
2018 | 66.80 | 12.92 | 18 | 6.84
2019 | 76.89 | 14.82 | 31 | 5.02
2020 | 74.28 | 14.28 | 24 | 5.77
2021 | 54.39 | 11.05 | 2 | 10.44