EVP's Track Record

How well did EVP do in the 2004 election? Let's take a look. But before looking at the numbers, it is worth mentioning that pollsters and statisticians look at polls in a different way from the general public. Suppose a poll predicts that Smith will win some state 51% to 49% with a margin of error of 3%, and Jones then wins that state 51% to 49%. The pollster would say he got it right, because Smith's score was in the range 48% to 54%, just as predicted, and Jones' was in the range 46% to 52%, just as predicted. On the other hand, if Smith won with 56%, the pollster would admit that he got it wrong. Many people do not understand this.
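The pollster's view of "getting it right" can be sketched in a few lines of Python. This is only an illustration of the Smith and Jones example above; the function name `within_margin` is made up for the sketch and is not part of EVP's software.

```python
def within_margin(predicted: float, actual: float, margin: float) -> bool:
    """True if the actual share lies within predicted +/- margin (percentage points)."""
    return abs(actual - predicted) <= margin


# The poll from the example: Smith 51, Jones 49, margin of error 3 points.
poll = {"Smith": 51.0, "Jones": 49.0}
margin = 3.0

# The outcome from the example: Jones wins 51 to 49.
result = {"Smith": 49.0, "Jones": 51.0}

# The pollster counts the poll as right if every candidate landed in range,
# even though the poll named the wrong winner.
poll_ok = all(within_margin(poll[c], result[c], margin) for c in poll)
print(poll_ok)  # True

# If Smith had won with 56%, that falls outside 48..54 and the poll was wrong.
print(within_margin(51.0, 56.0, margin))  # False
```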

Nevertheless, to keep things easy to understand (although technically incorrect), we will examine how EVP's predictions went in each of the 51 contests tracked (50 states + DC). Three different algorithms (formulas) were used, each resulting in a different map. Initially, only algorithm 1 was used, but I saw early on how unstable it was, so I switched to algorithm 2, which resulted in a huge amount of mail demanding that algorithm 1 be reinstated. Bowing to popular demand, I did so. Later, algorithm 3 was invented as a more sophisticated model. Maps for all three of them were produced daily toward the end of the campaign. Thus EVP made 3 x 51 = 153 predictions. The electoral college score was really just a byproduct of the 51 state results. It is important to note that all the results were produced by software doing computations on the polls. There was no human judgement involved. Anybody running the same algorithms on the same data would have come to exactly the same conclusions.

Here are the three algorithms.

1. Just use the most recent poll (original algorithm)
2. Average the past 3 days' worth of nonpartisan polls
3. A mathematical model of how undecided voters break

The first algorithm is the simplest. Just use the most recent poll in every state, regardless of who took it. The trouble with this one is that when there were many polls, as in Ohio and Florida, there were wild, but meaningless, fluctuations from day to day. In retrospect, this was not a good choice.
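A minimal sketch of the most-recent-poll rule, assuming each poll is a record with a state, an end date, and the two candidates' shares. The `Poll` type, the pollster names, and the numbers are all made up for illustration; how ties between polls ending on the same day were broken is not stated, so this sketch simply keeps the first one seen.

```python
from datetime import date
from typing import NamedTuple


class Poll(NamedTuple):
    state: str
    end_date: date
    pollster: str
    kerry: float
    bush: float


def most_recent_per_state(polls):
    """Return {state: newest poll}; on equal dates the first poll seen is kept."""
    latest = {}
    for p in polls:
        if p.state not in latest or p.end_date > latest[p.state].end_date:
            latest[p.state] = p
    return latest


# Made-up polls for a hypothetical state, for illustration only.
polls = [
    Poll("State X", date(2004, 10, 28), "Pollster A", 49.0, 45.0),
    Poll("State X", date(2004, 10, 30), "Pollster B", 46.0, 48.0),
    Poll("State X", date(2004, 10, 29), "Pollster C", 50.0, 44.0),
]

pick = most_recent_per_state(polls)["State X"]
if pick.kerry > pick.bush:
    leader = "Kerry"
elif pick.bush > pick.kerry:
    leader = "Bush"
else:
    leader = "tie"
print(pick.pollster, leader)  # Pollster B Bush
```

Note how the call depends entirely on whichever poll happens to be newest: the two earlier polls both had Kerry ahead, which is the kind of day-to-day flip described above.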

The second algorithm made two changes. First, it took the most recent poll and any others within three days of it and averaged them. This damped the oscillations appreciably. Second, partisan pollsters were excluded. The need for this may not be obvious to everyone initially. Basically, there are two kinds of pollsters: those who sell their polls to newspapers and TV stations and those who work for candidates (usually of only one party). The former try to tell the truth and measure success by coming close to the final result. The latter want their horse to win and don't give a hoot about the truth. Most of the famous pollsters, like Gallup, Mason-Dixon, SurveyUSA, Zogby, Rasmussen, universities, etc., are in the first category. Algorithm 2 omitted the category 2 pollsters, such as Strategic Vision (R) and Garin-Hart-Yang (D).
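The averaging rule might look roughly like the following for a single state. This is a sketch under assumptions, not EVP's actual code: the `Poll` record and its `partisan` flag are hypothetical, the numbers are invented, and the sketch assumes partisan polls are dropped first and the three-day window is then anchored on the newest remaining poll.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Poll(NamedTuple):
    end_date: date
    pollster: str
    partisan: bool  # hypothetical flag: True for pollsters working for a party
    kerry: float
    bush: float


def averaged_prediction(polls, window_days=3):
    """Average nonpartisan polls ending within window_days of the newest one."""
    nonpartisan = [p for p in polls if not p.partisan]
    if not nonpartisan:
        return None  # nothing usable for this state
    newest = max(p.end_date for p in nonpartisan)
    cutoff = newest - timedelta(days=window_days)
    recent = [p for p in nonpartisan if p.end_date >= cutoff]
    kerry = sum(p.kerry for p in recent) / len(recent)
    bush = sum(p.bush for p in recent) / len(recent)
    return kerry, bush


# Made-up polls for one hypothetical state.
polls = [
    Poll(date(2004, 10, 30), "Pollster A", False, 48.0, 46.0),
    Poll(date(2004, 10, 28), "Pollster B", False, 46.0, 48.0),
    Poll(date(2004, 10, 29), "Pollster C", True, 40.0, 55.0),   # partisan: excluded
    Poll(date(2004, 10, 20), "Pollster D", False, 50.0, 42.0),  # too old: excluded
]

print(averaged_prediction(polls))  # (47.0, 47.0)
```

The two polls that survive disagree about the leader, and averaging turns them into a tie rather than a flip from one day to the next.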

Algorithm 3 got into mathematical modeling and tried to predict how the minor candidates would do and how the undecideds would vote. Historically, many people are willing to tell pollsters that they will vote for some ideologically driven minor candidate like Ralph Nader or Pat Buchanan but don't actually do so, out of fear of having the major party candidate they hate win. Furthermore, historical data shows clearly that the undecideds tend to break for the challenger, usually about 2:1. This algorithm assumed Nader would get 1%, Badnarik would get 1%, and the undecideds would break 2:1 for Kerry.
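The exact formula is not given here, but one plausible reading can be sketched as follows. The assumption, which is mine and not confirmed by the text, is that everything not claimed by Kerry, Bush, or the fixed 1% each for Nader and Badnarik counts as undecided, and that this remainder is split two parts to Kerry for every one part to Bush. The function name and the poll numbers are invented.

```python
def allocate_undecideds(kerry, bush, nader=1.0, badnarik=1.0,
                        challenger_parts=2, incumbent_parts=1):
    """Split the undecided remainder between challenger (Kerry) and incumbent (Bush).

    Assumption: undecided = 100 - kerry - bush - nader - badnarik, floored at 0.
    """
    undecided = max(0.0, 100.0 - kerry - bush - nader - badnarik)
    total_parts = challenger_parts + incumbent_parts
    kerry_final = kerry + undecided * challenger_parts / total_parts
    bush_final = bush + undecided * incumbent_parts / total_parts
    return kerry_final, bush_final


# Made-up poll: a 46-46 tie leaves 6 points undecided after the minor candidates.
print(allocate_undecideds(46.0, 46.0))  # (50.0, 48.0)
```

Under these assumptions a tied poll becomes a 2-point Kerry lead, which shows how strongly the 2:1 rule leans toward the challenger in close states.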

Here are the state-by-state results, using the layman's definition of correct: the algorithm picked the winner, without regard to the margin of error. States an algorithm rated too close to call are left out of its percentage correct.

Algorithm	Correct states	Incorrect states	Too close to call	% Correct	EVs (Kerry - Bush)
Final results	51	0	0	100%	251 - 286
Algorithm 1	46	4: FL IA NM WI	1	92%	262 - 261
Algorithm 2	48	1: IA	2: NM WI	98%	245 - 278
Algorithm 3	48	3: FL IA NM	0	94%	281 - 257

Thus the predictive accuracy ranged from 92% to 98%, with the best algorithm being the one that averaged polls over the last three days and omitted the partisan pollsters. This year we are using a modification of this one: average polls over 7 days and omit the partisan pollsters.
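As a quick check on the percentages in the table, the figures of 92%, 98%, and 94% come out if the percentage is taken over the states each algorithm actually called, with the too-close-to-call states left out of the denominator. That reading is inferred from the arithmetic rather than stated outright, so treat this sketch as a sanity check only.

```python
# (correct, incorrect, too close to call) for each algorithm, from the table.
rows = {
    "Algorithm 1": (46, 4, 1),
    "Algorithm 2": (48, 1, 2),
    "Algorithm 3": (48, 3, 0),
}


def pct_correct(correct, incorrect):
    """Percentage correct among states that were called, rounded to a whole percent."""
    return round(100 * correct / (correct + incorrect))


for name, (correct, incorrect, too_close) in rows.items():
    # Every row should account for all 51 contests.
    assert correct + incorrect + too_close == 51
    print(name, pct_correct(correct, incorrect))
# Algorithm 1 92
# Algorithm 2 98
# Algorithm 3 94
```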

Where did the polls go wrong? The problem states were Florida, Iowa, New Mexico, and Wisconsin, fairly consistently. In Iowa, the two candidates differed by 0.9%, in New Mexico by 1.1%, and in Wisconsin by 0.4%. Given a standard margin of error of ±3% or sometimes ±4%, these were all well inside the margin of error. The only one that was at the outer edge was Florida. The final result there was Kerry 47.1%, Bush 52.1%. The three algorithms predicted Kerry-Bush splits of 49-44, 47-48, and 52-46. The first and third were way off. The middle one (poll averaging) picked the right winner and got Kerry right on the dot, but underestimated Bush by about 4 points.
