Analysis of Venezuela's referendum counts

By Jonathan Taylor

I was asked by Dr. Jennifer McCoy of the Carter Center to look at the machine-by-machine counts of the recall election in Venezuela, specifically to see whether there was any evidence for the opposition's claim of election fraud. The main issue I was asked to examine was the number of ties, for both SI and NO, at each mesa, and to determine whether there was an excessive number of ties for SI.

In my analysis, I looked at several models for the count data using R, among them:

1. A model based on the empirical SI votes across all machines. (R code)
2. A Poisson model with one parameter shared by all machines. (R code)
3. A Poisson model with a parameter varying by mesa. (R code)
4. A multinomial model where the SI votes are redistributed across all machines within a mesa (p=(1/3,1/3,1/3) if there were three machines, p=(1/2,1/2) if there were two machines). (R code)
5. A parametric bootstrap model for the counts in each machine within each mesa, having conditionally independent Binomial counts within each machine with probability of success the observed proportion of SI votes within the mesa. (R code)
6. Another parametric bootstrap model, where the residuals were resampled assuming they were (conditionally) independent (given the counts). This was a mistake (see below). (R code)
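As a rough illustration of how the multinomial model can be simulated, the sketch below redistributes each mesa's SI total over its machines with equal probabilities and counts the resulting ties. It is not the original script: the function names, the toy data, and the definition of a tie (a mesa in which at least two machines report the same SI count) are all assumptions made for the example.

```r
# Count mesas in which at least two machines report the same SI count.
# (Assumed definition of a "tie"; the original scripts may count differently.)
count_ties <- function(mesas) {
  sum(vapply(mesas, function(x) anyDuplicated(x) > 0, logical(1)))
}

# One draw from the multinomial model: each mesa's SI total is
# redistributed over its machines with equal probabilities.
simulate_multinomial_ties <- function(mesas) {
  redistributed <- lapply(mesas, function(x) {
    k <- length(x)
    as.vector(rmultinom(1, size = sum(x), prob = rep(1 / k, k)))
  })
  count_ties(redistributed)
}

# Hypothetical toy data: SI counts per machine for four mesas.
toy_mesas <- list(c(120, 131, 120), c(98, 104), c(150, 149, 160), c(77, 77))

set.seed(1)
sims <- replicate(2000, simulate_multinomial_ties(toy_mesas))

# Observed ties in the toy data, with the simulated mean and SD.
c(observed = count_ties(toy_mesas), expected = mean(sims), sd = sd(sims))
```

On the real data, the list would hold one vector of machine counts per mesa, and the simulated mean and standard deviation would play the role of E(Ties) and SD(Ties).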

Of these, the first two models are clearly unrealistic but were used for comparison. The next three are more realistic and give roughly the same results (see below). They also agree fairly well with simulation results of Avi Rubin at Johns Hopkins University. The multinomial model (the model with p=(1/3,1/3,1/3)) has already been analyzed by a statistician, Ellio Valladares, at the University of Virginia. The final model has a mistake in it, and, unfortunately, was the one whose results were reported in The Economist.

I have provided R code for the analyses, though some are slow to run, particularly the multinomial model. No real attempt has been made to make them efficient; they were written with the aim of being easy to read. To rerun an analysis in R, simply use "source". For example, to rerun the parametric bootstrap model (R code), type the following at the R prompt:

source('http://www-stat.stanford.edu/~jtaylo/venezeula/normal-resid-model.R')

Correction to the results in The Economist

There was an error in the figures quoted by The Economist in an article written by Dr. McCoy. The figures were based on the above parametric bootstrap model, and the error was due to a mistake on my part.

Specifically, I fit a multivariate normal to the scaled residuals between the number of votes in a given machine and the total number of votes in the mesa (scaled by the square root of the total number of votes in each mesa). Unfortunately, in my first models, I made the significant error of ignoring the multivariate aspect of the residuals and generated uncorrelated residuals for the parametric bootstrap. Because these residuals should be negatively correlated, ignoring the correlation made the simulated totals in each machine less separated than they should have been. This led to an inflated number of expected ties; hence the figure of 380, as quoted in The Economist, is too high. Model 6 above is not exactly the same model that the 380 was based on, but it is very similar: the model whose results are reported here is a parametric bootstrap (assuming the residuals were normally distributed given the total), whereas the model reported in The Economist was based on a non-parametric bootstrap.
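The effect of ignoring the correlation can be seen in a toy simulation. The sketch below is not the model described above: it assumes three machines per mesa and standard normal noise, and it creates negatively correlated residuals simply by centering them so that they sum to zero within a mesa. It then compares the spread of the difference between two machines with what uncorrelated residuals of the same marginal variance would give.

```r
set.seed(2)
n_sims <- 20000
k <- 3  # machines per mesa (illustrative assumption)

# Residuals that sum to zero within each simulated mesa. Centering makes
# the columns negatively correlated, with correlation -1 / (k - 1).
z <- matrix(rnorm(n_sims * k), ncol = k)
centered <- z - rowMeans(z)

# Uncorrelated residuals with the same marginal variance, (k - 1) / k.
uncorrelated <- matrix(rnorm(n_sims * k, sd = sqrt((k - 1) / k)), ncol = k)

# Empirical correlation between two machines' centered residuals.
rho_hat <- cor(centered[, 1], centered[, 2])

# Variance of the difference between two machines under each scheme.
# In theory the ratio is k / (k - 1), i.e. the uncorrelated version
# places the two machines closer together.
var_ratio <- var(centered[, 1] - centered[, 2]) /
  var(uncorrelated[, 1] - uncorrelated[, 2])

c(rho_hat = rho_hat, var_ratio = var_ratio)
```

This follows from Var(X1 - X2) = Var(X1) + Var(X2) - 2 Cov(X1, X2): a negative covariance widens the gap between machines, so dropping it narrows the gap and makes ties more likely.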

Results for SI

Model	E(Ties)	SD(Ties)	Z
1.	58	8	43
2.	320	18	4.6
3.	344	19	3.1
4.	348	19	2.9
5.	346	19	3.0
6.	377	19	1.3

The standard errors above are based on a Poisson approximation, as the number of ties is (under all of these models) the sum of many independent, rare counts. They are likely slight underestimates of the true standard errors.
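As a check on the table, the sketch below recomputes SD(Ties) and Z from the rounded E(Ties) column, assuming that the standard deviation is the square root of the expected count (the Poisson approximation) and that Z compares the expectation with the 402 observed SI ties quoted in the summary. The variable names are illustrative, and because the inputs are rounded, the results need not match the table exactly (model 1 in particular is sensitive to rounding).

```r
observed_si_ties <- 402                            # observed SI ties
expected_ties <- c(58, 320, 344, 348, 346, 377)    # E(Ties), models 1 to 6

# Poisson approximation: the variance equals the mean.
sd_ties <- sqrt(expected_ties)

# Standardized excess of observed over expected ties.
z <- (observed_si_ties - expected_ties) / sd_ties

data.frame(model = 1:6,
           expected = expected_ties,
           sd = round(sd_ties),
           z = round(z, 1))
```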

Results for NO

Model	E(Ties)	SD(Ties)	Z
1.	55	7	-34.5
2.	273	17	-2.3
3.	290	17	-1.2
4.	294	17	-1.0
5.	290	17	-1.2
6.	334	18	1.2

Summary

An expected number of ties between 345 and 350 seems reasonable, as it came out of several different models. Using the Poisson assumption to estimate the standard error, the probability of observing 402 or more ties for SI is then between 1 and 3 in 1000. While this probability is small, I do not feel that it should be interpreted as overwhelming evidence of fraud.
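The tail probability can be approximated directly. The sketch below assumes the number of ties is Poisson with mean 345 or 350 and computes the chance of seeing at least 402 ties, with a normal approximation alongside for comparison. It is a back-of-the-envelope check under those assumptions, not the original calculation.

```r
observed_si_ties <- 402
lambda <- c(345, 350)   # plausible range for the expected number of ties

# P(X >= 402) for X ~ Poisson(lambda). With lower.tail = FALSE, ppois
# returns P(X > q), so q = 401 gives the probability of 402 or more.
p_poisson <- ppois(observed_si_ties - 1, lambda, lower.tail = FALSE)

# Normal approximation with the same mean and variance.
p_normal <- pnorm(observed_si_ties, mean = lambda, sd = sqrt(lambda),
                  lower.tail = FALSE)

round(rbind(p_poisson, p_normal), 4)
```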
