Wikimedia Foundation elections/Board elections/2008/Planning/Schulze method

From Meta, a Wikimedia project coordination wiki
This is an archived committee draft for the June 2008 board elections. Please do not significantly change the content.
See the planning pages index for links to the current pages.
2008 board elections:
Planning pages

I was trying to get my head around how votes care tallied up using the Schulze method, which is presumably going to need to be explained to any developer who is set the task of coding an extension to take care of it. So I set about making an example.

To start with, we need to sort out the rule for a valid ballot, which are, as I see them:

  • Voters can put in any positive number, zero or leave blank, beside any candidate
    • The purpose behind that rule is that as far as I see it, with the Schulze method it is the rank that is important, not the actual numbers allocated.
    • Also, that takes into account the generally agreed-upon rule that voters are able to leave some candidates unranked, and can allocate more than one candidate the same preference.
    • A vote of zero given to a candidate would be interpreted as being the same as not ranking them at all.
  • Voters must provide a rank for at least one candidate

So, for example, the following would not be a valid ballot:

1 Candidate 1
2 Candidate 2
-1 Candidate 3
3 Candidate 4
5 Candidate 5

The reason the above ballot would not be valid is because of the inclusion of a negative number, which has an unclear meaning in a system where a lower positive integer wins.

The following ballot, on the other hand, would be valid:

17.5 Candidate 1
2 Candidate 2
  Candidate 3
7 Candidate 4
  Candidate 5

Although the voter has used strange numbers, it is the rank of the candidate relative to the other candidates on that ballot that is important, not the actual figures themselves. For now, let's assume a more common sort of completed ballot has been supplied:

2 Candidate 1
3 Candidate 2
1 Candidate 3
  Candidate 4
2 Candidate 5

We go through the ballot from the top to the bottom and compare each ranking provided with the rankings provided for the other candidates on that ballot.

We now compare the ranking given to Candidate 1 (2) to the other candidates:

  • Candidate 1 (2) is ranked higher than Candidate 2 (3)
  • Candidate 1 (2) is not ranked higher than Candidate 3 (1)
  • Candidate 1 (2) is ranked higher than Candidate 4 (unranked)
  • Candidate 1 (2) is not ranked higher than Candidate 5 (2)

Using these results, we can fill out the results table with the wins for Candidate 1. This table is designed so that a '1' in one of the cells represents a win for the candidate at the top of that column over the candidate at the left of the row:

Candidate C1 C2 C3 C4 C5
C1          
C2 1        
C3          
C4 1        
C5          

Having finished Candidate 1, we move on to look at the voter's relative ranking of Candidate 2:

We now compare the ranking given to Candidate 1 (2) to the other candidates:

  • Candidate 2 (3) is not ranked higher than Candidate 1 (2)
  • Candidate 2 (3) is not ranked higher than Candidate 3 (1)
  • Candidate 2 (3) is ranked higher than Candidate 4 (unranked)
  • Candidate 2 (3) is not ranked higher than Candidate 5 (2)

We add this result to the table:

Candidate C1 C2 C3 C4 C5
C1          
C2 1        
C3          
C4 1 1      
C5          

Candidate 3 (1) is ranked higher than all of the other candidates, so we add this to the table:

Candidate C1 C2 C3 C4 C5
C1     1    
C2 1   1    
C3          
C4 1 1 1    
C5     1    

Candidate 4 (unranked) has not been given a rank, so there is no change to make to the results table.

Finally, we compare the rank given to Candidate 5 (2) to the ranks given to the other candidates:

  • Candidate 5 (2) is not ranked higher than Candidate 1 (2)
  • Candidate 5 (2) is ranked higher than Candidate 2 (3)
  • Candidate 5 (2) is not ranked higher than Candidate 3 (1)
  • Candidate 5 (2) is ranked higher than Candidate 4 (unranked)

We add these results to the table:

Candidate C1 C2 C3 C4 C5
C1     1    
C2 1   1   1
C3          
C4 1 1 1   1
C5     1    

After repeating that for each ballot and generating such a table for each, we can overlay all of the results tables of all the voters over the top of one another, and work out the sum of each cell. Here is an example results table I made up, assuming about 3500 voters:

Candidate C1 C2 C3 C4 C5
C1   985 147 654 359
C2 457   752 209 1234
C3 1652 986   781 1109
C4 752 431 178   875
C5 213 79 561 312  

Armed with the results, we do pairwise comparisons of each possible combination of candidates. The percentages are the number of pairwise wins divided by the total number of ballots. The reason the percentages don't add up to 100% is because voters can be indifferent between two candidates, either by leaving them unranked, or by giving them the same preference.

  • Candidate 1 (13.1%) loses to Candidate 2 (28.1%)
  • Candidate 1 (47.2%) beats Candidate 3 (4.2%)
  • Candidate 1 (21.5%) beats Candidate 4 (18.7%)
  • Candidate 1 (6.1%) loses to Candidate 5 (10.3%)
  • Candidate 2 (28.2%) beats Candidate 3 (21.5%)
  • Candidate 2 (12.3%) beats Candidate 4 (6.0%)
  • Candidate 2 (2.3%) loses to Candidate 5 (35.3%)
  • Candidate 3 (5.1%) loses to Candidate 4 (22.3%)
  • Candidate 3 (16.0%) loses to Candidate 5 (31.7%)
  • Candidate 4 (8.9%) loses to Candidate 5 (25%)
Candidate C1 C2 C3 C4 C5
Pairwise results (wins-ties-losses) 2-0-2 3-0-1 0-0-4 1-0-3 4-0-0
Lowest percentage out of all pairwise defeats 6.1% 35.3% 4.2% 6.0% N/A

I'm not sure about what that last row there should be, because the example given in the Wikipedia article seems to assume that all voters must rank all candidates. There may be situations where the lowest percentage given to a candidate is a pairwise combination which they win, like 7% vs 6%. Two quite unpopular candidates. The reason for the lowest percentage information is because it is used to drop people out of the race.

So what do we do now to arrive at a ranked list of candidates? Do we rank them according to the number of pairwise wins they have received?

Ranking of candidates:

  1. Candidate 5 (4-0-0)
  2. Candidate 2 (3-0-1)
  3. Candidate 1 (2-0-2)
  4. Candidate 4 (1-0-3)
  5. Candidate 3 (0-0-4)

That seems fairly simple with this small number of candidates and no ties between them. But is it really accurate? Looking at that ranking without regard to the counts above, what if the 2 candidates which Candidate 1 had beaten were in fact the two candidates ranked above him/her?

With these results, when selecting a single winner, Candidate 5 would be the Schwartz Set, according to the Wikipedia article. But selecting multiple winners is clearly more murky.

Let's pretend for a moment that the pairwise results of an election were as follows:

Candidate C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
Pairwise results (wins-ties-losses) 7-0-2 8-0-1 1-0-8 2-0-7 9-0-0 5-0-4 4-0-5 2-1-6 1-1-7 5-0-4
Lowest percentage out of all pairwise defeats 6.1% 35.3% 4.2% 6.0% N/A 9.1% 4.3% 2.2% 1.0% 12.5%

I'm assuming a pairwise tie means that when you are ranking a candidate, you rank it higher than other candidates with the same number of pairwise wins, and that where there is still ambiguity after that, you go with the candidate with the highest percentage out of all of their pairwise defeats. So going off those results, here is the ranking:

Ranking of candidates:

  1. Candidate 5 (9-0-0)
  2. Candidate 2 (8-0-1)
  3. Candidate 1 (7-0-2)
  4. Candidate 10 (5-0-4, 12.5%)
  5. Candidate 6 (5-0-4, 9.1%)
  6. Candidate 7 (4-0-5)
  7. Candidate 8 (2-1-6)
  8. Candidate 4 (2-0-7)
  9. Candidate 9 (1-1-7)
  10. Candidate 3 (1-0-8)

Is this the ranking method people are proposing? If not, please elaborate about how we are to get a rank. - Mark 02:53, 7 April 2008 (UTC)[reply]

Ranked pairs example[edit]

To use the Ranked Pairs system, instead of the Schulze method, all of the above is followed identically until you reach the following:

Armed with the results, we do pairwise comparisons of each possible combination of candidates. The percentages are the number of pairwise wins divided by the total number of ballots. The reason the percentages don't add up to 100% is because voters can be indifferent between two candidates, either by leaving them unranked, or by giving them the same preference.

  • Candidate 1 (13.1%) loses to Candidate 2 (28.1%)
  • Candidate 1 (47.2%) beats Candidate 3 (4.2%)
  • Candidate 1 (21.5%) beats Candidate 4 (18.7%)
  • Candidate 1 (6.1%) loses to Candidate 5 (10.3%)
  • Candidate 2 (28.2%) beats Candidate 3 (21.5%)
  • Candidate 2 (12.3%) beats Candidate 4 (6.0%)
  • Candidate 2 (2.3%) loses to Candidate 5 (35.3%)
  • Candidate 3 (5.1%) loses to Candidate 4 (22.3%)
  • Candidate 3 (16.0%) loses to Candidate 5 (31.7%)
  • Candidate 4 (8.9%) loses to Candidate 5 (25%)

You need to sort them by the size of the lead of the winning candidate over the losing candidate:

Pair Winner (difference in percentages)
Candidate 1 (47.2%) vs. Candidate 3 (4.2%) Candidate 1 (43%)
Candidate 2 (2.3%) vs. Candidate 5 (35.3%) Candidate 5 (33%)
Candidate 3 (5.1%) vs. Candidate 4 (22.3%) Candidate 4 (17.2%)
Candidate 4 (8.9%) vs. Candidate 5 (25%) Candidate 5 (16.1%)
Candidate 3 (16.0%) vs. Candidate 5 (31.7%) Candidate 5 (15.7%)
Candidate 1 (13.1%) vs. Candidate 2 (28.1%) Candidate 2 (15%)
Candidate 2 (28.2%) vs. Candidate 3 (21.5%) Candidate 2 (6.7%)
Candidate 2 (12.3%) vs. Candidate 4 (6.0%) Candidate 2 (6.3%)
Candidate 1 (6.1%) vs. Candidate 5 (10.3%) Candidate 5 (4.2%)
Candidate 1 (21.5%) vs. Candidate 4 (18.7%) Candidate 1 (2.8%)

We then lock in the candidates in order, from the highest difference in percentage to the lowest, skipping any pairs that contradict the order that has already been locked in.

  • Lock in Candidate 1 over Candidate 3.
  • Lock in Candidate 5 over Candidate 2.
  • Lock in Candidate 4 over Candidate 3.
  • Lock in Candidate 5 over Candidate 4.
  • Lock in Candidate 5 over Candidate 3.
  • Lock in Candidate 2 over Candidate 1.
  • Lock in Candidate 2 over Candidate 3.
  • Lock in Candidate 2 over Candidate 4.
  • Lock in Candidate 5 over Candidate 1.
  • Lock in Candidate 1 over Candidate 4.

So the ranking produced by the Ranked Pairs method in this example would be:

  1. Candidate 5
  2. Candidate 2
  3. Candidate 1
  4. Candidate 4
  5. Candidate 3

Which is the same as what I produced above. Maybe I've just done the same thing twice and don't realise it.