The Spatial Borda Count

Colin Champion, 7 Mar 2023

the euclidean borda count : the spatial borda count : software : index

The Borda count is roughly optimal under a jury model but much weaker under a spatial model, even given favourable assumptions of sincere voting and unbiased fields of candidates. However there is no reason to suppose that the uniform scale of weights should be optimal, so there is some interest in seeing whether it can be improved.

An obvious line of argument is this: we can compute the average distance from voters to the candidates coming – say – third in their rankings (voters being assumed female, candidates male). Likewise for the other positions. And we may then estimate the average distance of a candidate from the electorate by averaging the average distances corresponding to his positions in their ballots. This gives a score under which the candidate with lowest score should be best;– or we can renormalise it to obtain a statistic which we expect to be maximised by the best candidate. This gives what I shall call the “Euclidean Borda Count”.

The program euclideanborda below finds Euclidean weights under a univariate or bivariate Gaussian model of voters and candidates. It gives the following values for 10 candidates (running from top to bottom of a ballot):

9 8.22 7.49 6.76 6.01 5.23 4.37 3.36 2.08 0 // dim=1
9 7.85 6.97 6.18 5.44 4.68 3.88 2.97 1.83 0 // dim=2

The coefficients for the univariate and bivariate Euclidean Borda counts are illustrated in the graph (as ebc₁ and ebc₂) against the standard Borda count, the Nauru alternative, and a couple of additional formulae. The coefficients are scaled for comparability, drawn with the weight for the bottom-ranked candidate at the left and for the top-ranked candidate at the right.

We can add the Euclidean Borda Count to our standard evaluation framework (see Simple Voting Methods). The extra subroutine needed is provided below. The percentages correct for each method are as follows:
Nauru Borda ebc₁ ebc₂
Univariate model 34·9 74·0 64·9 74·0
Bivariate model 56·4 85·4 83·2 87·7

These results are perplexing. The bivariate weights outperform the standard Borda count in two dimensions as we’d expect, but the univariate weights give worse results than the standard Borda count in a single dimension; though even here, mysteriously, the bivariate weights give an improvement, albeit not one which is statistically significant. (The Borda accuracy is 74·012% to full precision while ebc₂ is 74·013%.)

We can try to do better by explicitly optimising weights for performance. For any postulated set of weights, we can generate a set of simulated elections and measure – say – how often a candidate A who is really better than B outscores B for the given weights; and we may hillclimb on this statistic. The voting method using the implied weights will be termed the “Spatial Borda Count”. (Alternatively we could measure how often the best candidate outscores each of his rivals, or how often he outscores them all together. As we modify the objective function in this way we bring it closer to its intended use, but reduce the amount of information afforded by each simulated election.)

If we want to express our objective function as a mathematical formula, we can write it as

∑ sgn(d_ki–d_kj) × sgn(scr_ki–scr_kj)

where d_ki is the distance from the origin of the i^th candidate in election k and scr_ki is the candidate’s score (the voter distribution being centred on the origin). The signum function takes the value 1 for postive arguments, –1 for negative arguments, and 0 if its argument is 0.

This simplifies as

∑ sgn((d_ki–d_kj) × (scr_ki–scr_kj)).

If we now adopt the approximation sgn(x) ≈ x, then our task is to maximise

∑ (d_ki–d_kj) × (scr_ki–scr_kj),

in other words, to maximise the correlation between the differences in scores and the differences in distances; and obviously this is achieved by making the scores proportional to the distances. Hence the brute-force weights have the EBC as a natural approximation.

Coefficients for the spatial Borda count for 1 and 2 dimensions are show in the graph above.

We can now extend the previous table to include results for the new sets of coefficients:
Nauru Borda ebc₁ ebc₂ sbc₁ sbc₂
Univariate model 34·9 74·0 64·9 74·0 80·3 80·1
Bivariate model 56·4 85·4 83·2 87·7 87·9 91·2

These results are much as we would expect: sbc₁ is found to be the best positional method for univariate Gaussians and sbc₂ to be the best in 2 dimensions.

euclideanborda.c : spatialborda.c : sbc.c : download

The following program computes the average distance of a voter from the candidate she ranks 1st, 2nd, etc. in an election between m candidates. Voters and candidates are drawn from a standard univariate or bivariate Gaussian. We can normalise the average distances to use as weights for a Borda-like voting method.

If we run euclideanborda 10 (the argument being the number of candidates) then we get the output

dim=1, cond=1.0; neval=1026
by level: 2 4 8 10 10 20 34 48 80 98 112 36 16 10
 0.2151  0.4046  0.5839  0.7614  0.9427  1.1344  1.3447  1.5891  1.9015  2.4085 | avge=1.1286 (should be 2/sqrt(pi)=1.1284)
9 8.22 7.49 6.76 6.01 5.23 4.36 3.36 2.08 0 
dim=2, cond=1.0; neval=1121
by level: 2 4 8 10 10 20 38 58 84 122 160 18
 0.6528  0.9655  1.2056  1.4180  1.6220  1.8271  2.0445  2.2916  2.6022  3.0987 | avge=1.7728 (should be sqrt(pi)=1.7725)
9 7.85 6.97 6.18 5.43 4.68 3.88 2.97 1.83 0

This gives sets of weights for univariate then bivariate voter distributions. The first two lines tell us that 1026 function evaluations were performed for a univariate distribution, and show how the integrations over regions break down by recursive level. The overall average distance is computed as 1.1284 (equal to its theoretical value of 2/√π).

The next line says that a voter’s first choice candidate is on average at a distance of 0.2152 from her; her second choice is at 0.4045; and so forth. If we wanted to transform these weights for a Borda count in which the highest score was best, we could assign 9 points to a voter’s top candidate, 8.22 to her second, and so forth.

Then we get similar information for bivariate gaussians.

• gausscirc • f • main

#include "memory.h"
#include <math.h>
#include "quadrate.h"
double besseli(int n,double x),rtlnorm(double x),ltlnorm(double x) ;
double gaussdisc(double dd,double rr),gaussexdisc(double dd,double rr) ;
static double gausscirc(double d,double r) 
{ return r * exp(-(d*d+r*r)/2) * besseli(0,r*d) ; }

static int m,dim ;

// For a voter at distance d from the origin, and given a radius r from the 
// voter, multiply together
// i.   The probability of a voter V having that position;
// ii.  The probability of a candidate C then being at distance r from V;  
// iii. For each position t in V's ballot, the probability that V will place t  
//      candidates higher than C and m-t-1 lower; 
// iv.  And r and m themselves.

static void f(double *x,double *y)
{ double u=x[0],v=x[1],d,r,p,q,pq,bico,bicon,bicod,qk,qcirc,qd ; 
  static double q2pi = 1/sqrt(2*3.141592653589793) ;
  int t ; 
  if(v<=0||v>=1||u>=1) { for(t=0;t<m;t++) y[t] = 0 ; return ; }
  d = 10 * atanh(u) ; 
  r = 10 * atanh(v) ;
  if(dim==2)
  { qd = d * exp(-d*d/2) ;                                           // i
    qcirc = gausscirc(d,r) ;                                         // ii
    if(r>1+1/(2*d+1)) p = 1 - ( q = gaussexdisc(d,r) ) ;
    else q = 1 - ( p = gaussdisc(d,r) ) ;
  }
  else 
  { qd = 2 * q2pi * exp(-d*d/2) ;                                    // i
    qcirc = q2pi * ( exp(-(d-r)*(d-r)/2) + exp(-(d+r)*(d+r)/2) ) ;   // ii
    if(r>1+1/(2*d+1)) p = 1 - ( q = rtlnorm(d+r) + ltlnorm(d-r) ) ;
    else q = 1 - ( p = rtlnorm(d-r) - rtlnorm(d+r) ) ;
  }
  pq = p / q ;
  qk = (r*m) * qd * qcirc * 100 / ( (1-u*u) * (1-v*v) ) ;
  y[0] = qk * pow(q,m-1) ;
  for(t=1;t<m;t++) y[t] = y[t-1] * pq ; 
  for(bicon=m-1,bicod=1,bico=t=1;t<m;t++,bicon--,bicod++)
  { bico *= bicon/bicod ; y[t] *= bico ; }
}
int main(int argc,char **argv)
{ int t,i ;
  if(argc>=2) m = atoi(argv[1]) ; else m = 9 ; 
  double q,*y=vector(m),tol,pi=3.141592653589793 ;
  if(argc>=3) tol = pow(0.1,atof(argv[2])) ; else tol = 0.01 ; 

  for(dim=1;dim<=2;dim++)
  { quadratereport r = cubate(f,2,m,y,tol,100000) ;
    printf("dim=%d, cond=%.1f; neval=%d\nby level:",dim,r.condition,r.neval) ;
    for(i=0;i<r.nleveval;i++) printf(" %d",r.leveval[i]) ; 
    printf("\n") ; 

    for(q=t=0;t<m;t++) { q += y[t] ; printf("%7.4f ",y[t]) ; }
    if(dim==2) printf("| avge=%.4f (should be sqrt(pi)=%.4f)\n",q/m,sqrt(pi)) ; 
    else printf("| avge=%.4f (should be 2/sqrt(pi)=%.4f)\n",q/m,2/sqrt(pi)) ; 
    q = (m-1) / (y[m-1]-y[0]) ; 
    for(t=0;t<m;t++) y[t] = q * ( y[m-1] - y[t] ) ; 
    printf("%d ",m-1) ; 
    for(t=1;t<m-1;t++) printf("%.2f ",y[t]) ; 
    printf("0 \n") ; 
  }
}

spatialborda finds optimal weights for a spatial positional method. It simulates a lot of elections, storing a matrix of poscounts for each election. The poscounts record how often the best candidate occurred in each position of a ballot, how often the second best occurred in each position, and so forth. This is the only information needed from each simulated election, so having stored it we can access it repeatedly without needing to redo the simulations.

Then we hillclimb on the coefficients of the voting method, computing an objective function which is easily derived from the poscounts.

The call is

spatialborda dim m n ne

where dim is the dimension of the space, m is the number of candidates, n is the number of voters, and ne is the number of elections to simulate. My own weights for 10 candidates in 1 and 2 dimensions came from 30000 voters and 1000000 elections.

• effectiveness • main

#include "memory.h"
#include <math.h>
#include "random.h"

static int m,n,ne,***p,ncall=0 ;
static double qmax=0 ; 
double uoamax(double (*func)(double*),double *x,int n,
                 double xinc,double tol,int max_iter) ;
int uoacond() ; 

static double effectiveness(double *x)
{ int i,j,t,nwin,eno ; 
  static int prevprint=0 ; 
  double q,w[100],scr[100],qk=2/(m*(m-1.0)) ;
  for(t=0;t<m-2;t++) w[t+1] = x[t] ; 
  w[0] = m-1 ; 
  w[m-1] = 0 ; 
  ncall += 1 ; 

  for(q=eno=0;eno<ne;eno++,q+=qk*nwin)
  { // p[eno][i][j] is the number of ballots in election eno in which the
    // ith best candidate was ranked jth
    for(i=0;i<m;i++) for(scr[i]=0,j=0;j<m;j++) scr[i] += w[j] * p[eno][i][j] ;
    for(nwin=i=0;i<m-1;i++) for(j=i+1;j<m;j++) if(scr[i]>scr[j]) nwin += 1 ; 
  }
  if(q>qmax||ncall==prevprint+10)
  { prevprint = ncall ; 
    qmax = q ; 
    printf("%-4d: ",ncall) ; 
    for(i=0;i<m;i++) printf("%.3f ",w[i]) ; 
    printf("-> %.6f\n",100*q/ne) ; 
  }
  return q/ne ;
}

int main(int argc,char **argv)
{ int dim = (argc>=2?atoi(argv[1]):2) ; 
  m = (argc>=3?atoi(argv[2]):10) ; 
  n = (argc>=4?atoi(argv[3]):10000) ; 
  ne = (argc>=5?atoi(argv[4]):100000) ; 
  p = imatrix(ne,m,m) ;

  int t,eno,i,vno,cond,cno ; 
  double q,*x=vector(m),**c=matrix(m,dim),**cc=matrix(m,dim),*v=vector(m) ; 
  double **pp=matrix(m,m) ;
  xi *cdist=xivector(m) ; 

  // compute poscounts for simulated elections
  for(eno=0;eno<ne;eno++)
  { for(i=0;i<m;i++) for(cdist[i]=xi(0,i),t=0;t<dim;t++) 
    { cc[i][t] = q = gaussv() ; cdist[i].x += q*q ; }
    xisort(cdist,m) ; 
    for(i=0;i<m;i++) for(cno=cdist[i].i,t=0;t<dim;t++) c[i][t] = cc[cno][t] ;

    for(vno=0;vno<n;vno++)
    { for(t=0;t<dim;t++) v[t] = gaussv() ; 
      for(i=0;i<m;i++) for(cdist[i]=xi(0,i),t=0;t<dim;t++) 
      { q = c[i][t] - v[t] ; cdist[i].x += q*q ; }
      xisort(cdist,m) ; 
      for(i=0;i<m;i++) p[eno][cdist[i].i][i] += 1 ; 
      // p[eno][i][j] is the number of ballots in election eno in which the
      // ith best candidate was ranked jth
    }
    for(i=0;i<m;i++) for(t=0;t<m;t++) pp[i][t] += p[eno][i][t] ;
    if((eno+1)%100000==0||eno+1==ne) 
    { printf("%d elections:\n",eno+1) ; 
      for(q=100.0/(n*(1.0+eno)),i=0;i<m;i++) 
      { for(t=0;t<m;t++) printf("%8.3f",pp[i][t]*q) ; printf("\n") ; }
      printf("\n") ; 
    }
  }

  for(i=0;i<m-2;i++) x[i] = m-2-i ; 
  q = uoamax(effectiveness,x,m-2,0.05,0.0001,10000) ; 
  cond = uoacond() ; 
  printf("%.3f ",m-1.0) ; 
  for(i=0;i<m-2;i++) printf("%.3f ",x[i]) ; 
  printf("0.000 -> %.6f",100*q) ; 
  if(cond) printf("** cond=%d**\n",cond) ; else printf("\n") ; 
}

This subroutine can be compiled in with condorcet, my simulation program, in the normal way. It implements the univariate and bivariate Euclidean and Spatial Borda counts for 10 candidates.

• ebc • ebc1 • ebc2 • sbc1 • sbc2

#include "condorcet.h"
static int ebc(int m,double **p,double *w) 
{ int i,j,imax ; 
  if(m!=10) throw up("m=%d not permitted for euclidean Borda count",m) ; 
  double q,v ; 
  for(v=i=0;i<m;i++) 
  { for(q=j=0;j<m;j++) q += p[i][j] * w[j] ;
    if(i==0||q>v) { imax = i ; v = q ; } 
  }
  return imax ;   
}
int ebc1(int m,double **p)
{ static double w1[10]={9,8.22,7.49,6.76,6.01,5.23,4.37,3.36,2.08,0} ;
  return ebc(m,p,w1) ; 
} 
int ebc2(int m,double **p)
{ static double w2[10]={9,7.85,6.97,6.18,5.44,4.68,3.88,2.97,1.83,0} ;
  return ebc(m,p,w2) ; 
} 
int sbc1(int m,double **p)
{ static double w[10]={9,7.88,6.68,5.46,4.28,3.20,2.23,1.40,0.67,0} ;
  return ebc(m,p,w) ; 
} 
int sbc2(int m,double **p)
{ static double w[10]={9,7.65,6.38,5.27,4.28,3.39,2.58,1.81,0.96,0} ;
  return ebc(m,p,w) ; 
} 
static int unused = notify(ebc1,"ebc1","") + notify(ebc2,"ebc2","") + 
                    notify(sbc1,"sbc1","") + notify(sbc2,"sbc2","") ;

euclideanborda.c spatialborda.c sbc.c
utils: memory.h random.h infheap.h quadrate.h ranf.c besseli.c 
       quadrate.c cubate.c rtlnorm.c log1plus.c lu.c
gaussint: gaussint.c 
hillclimbing: newuoa.c mdplib.c linmax.c

The following call lines compile the programs using g++ as compiler:

g++ -o euclideanborda -g -O euclideanborda.c cubate.c quadrate.c gaussint.c rtlnorm.c log1plus.c besseli.c linmax.c lu.c
g++ -o spatialborda -g -O spatialborda.c newuoa.c mdplib.c ranf.c

euclideanborda.c : spatialborda.c : sbc.c : download

	Nauru	Borda	ebc₁	ebc₂
Univariate model	34·9	74·0	64·9	74·0
Bivariate model	56·4	85·4	83·2	87·7