January 17, 2001
Eric Rasmusen, [email protected]

Was the 2000 Florida Presidential Election a Statistical Dead Heat?

A number of clever people have been saying that the 2000 Presidential Election was a "statistical dead heat"; that the vote difference between Bush and Gore, the two leading candidates, is statistically insignificant. After all, Bush had a lead in Florida of only some 1,000 votes out of 6,000,000 cast. So wasn't it really a tie?

No. This is a good example of how a difference may be statistically significant-- and morally significant-- even if it is very small.

In economics, we make a distinction between two kinds of significance. If two numbers differ by an amount large enough to make a difference in how we view the situation, we say the difference is "economically significant". A less parochial term is "substantively significant". For example, if we knew that exactly 1,000 more Floridians prefer Coke than prefer Pepsi, we would say the difference is not substantively significant. This kind of significance is subjective, but in extreme cases like this everyone would agree that the difference is not significant.

The second kind of significance is "statistical significance". Suppose we had estimated the preferences for Coke and Pepsi using a sample of just 500 Floridians, and our estimate for the entire population based on that sample was that 1,000 more Floridians preferred Coke. Suppose, moreover, that if the actual preference split were exactly even and we drew 100 such 500-person samples, at least 10 of those samples would deceive us into thinking that at least 1,000 more Floridians preferred Coke. The difference we actually found in the one sample we took is then "not statistically significant at the 10 percent level". More generally, a statistically significant difference is one that would be unlikely to arise by accident if we repeated our estimation procedure.
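The sampling argument can be checked with a short calculation. The sketch below is only an illustration: it assumes each sampled person independently prefers Coke with probability one half, and it uses a round population of 16,000,000, which is an assumed figure and not one taken from the text. The function name is hypothetical.

```python
from math import comb

def chance_of_apparent_lead(sample_size, population, lead):
    """Probability that a sample from an evenly split population
    extrapolates to a Coke lead of at least `lead` people."""
    favorable = 0
    for coke in range(sample_size + 1):
        pepsi = sample_size - coke
        # Scale the difference seen in the sample up to the whole population.
        estimated_lead = (coke - pepsi) * population / sample_size
        if estimated_lead >= lead:
            favorable += comb(sample_size, coke)
    # Under a 50-50 split, all 2**sample_size outcomes are equally likely.
    return favorable / 2 ** sample_size

# ASSUMPTION: 16,000,000 is a round illustrative population, not a figure from the text.
p = chance_of_apparent_lead(sample_size=500, population=16_000_000, lead=1_000)
print(f"chance of an apparent lead of 1,000 or more: {p:.3f}")
```

Under these assumptions the chance comes out far above 10 percent, because any imbalance at all in a 500-person sample extrapolates to a lead much larger than 1,000. That is why such an estimate would not be statistically significant.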

It follows that if our estimation procedure is very accurate, even a substantively insignificant result may be statistically significant. If we really did know the preference of every person in Florida, and 1,000 more of them prefer Coke, then the difference is statistically significant, even though it is substantively insignificant.
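To see how accuracy drives statistical significance, one can compute the standard error of the extrapolated lead for samples of increasing size. The sketch below uses the textbook finite-population correction for sampling without replacement and assumes a true 50-50 split. The population of 16,000,000 is again an assumed round number, and the function name is hypothetical.

```python
from math import sqrt

def standard_error_of_lead(sample_size, population):
    """Standard error of the extrapolated Coke-minus-Pepsi lead when the
    true split is 50-50 and people are sampled without replacement."""
    n, N = sample_size, population
    # Finite-population correction: falls to zero when everyone is sampled.
    fpc = (N - n) / (N - 1)
    se_share = sqrt(0.25 / n * fpc)  # standard error of Coke's share
    return 2 * N * se_share          # because lead = N * (2 * share - 1)

POPULATION = 16_000_000  # ASSUMPTION: illustrative, not from the text
for n in (500, 50_000, 5_000_000, POPULATION):
    se = standard_error_of_lead(n, POPULATION)
    print(f"sample of {n:>10,}: standard error about {se:,.0f} people")
```

With a 500-person sample the standard error dwarfs a 1,000-person lead. When the sample is the whole population the sampling error vanishes, so a lead of any size is statistically significant.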

Substantive significance, however, depends on the context. Suppose 218 U.S. Representatives vote "Aye" on a bill and 217 vote "Nay", a tiny difference of less than half of one percent. The difference is nonetheless significant in both senses. It is substantively significant because our rules say that a bill needs only a majority to pass. It is statistically significant because the congressional voting machine rarely counts votes wrong, so if the "Nays" really had the majority, it is highly unlikely that the machine would report the "Ayes" as having won.

There is, indeed, one ambiguity in defining substantive significance. Is a difference significant if it is large enough to matter, but our estimation procedure is so inaccurate that we have no confidence in it-- that is, the difference is big but is statistically insignificant? Suppose, for example, that we estimate from a sample of 3 Floridians that *all* Floridians prefer Coke, because all 3 people in our sample do. This is a big difference between Coke and Pepsi, but it is statistically insignificant. I don't know whether our definition should call that result substantively significant or not, but that will not matter for the present discussion.

Let us now return to the Florida election. Suppose we had a perfectly accurate count of all the votes. Every single qualified person was registered to vote, had been certain of their preference for two years before the election, and did vote, without human or machine error, so statistical significance is not in doubt. In that case, if Bush were ahead by even 1 vote that would be substantively significant. By our rules and norms, a bare majority is enough to make a winner in an election.

Also, there may be inaccuracy in the transition from the Will of the People to the vote count. It might be that most people in Florida actually favored Gore, but some were confused and voted for two candidates, or some were tired and didn't vote at all, or there was fraud that helped Bush but no corresponding fraud favoring Gore. In that case, our electoral system is not just inaccurate, but biased in the statistical sense of yielding a wrong estimate regardless of how many times it might be repeated and how little variation there would be between repetitions. But that is not a problem of the winning margin being too small to be significant; even if Bush's margin were large and accurately counted, Gore would have cause for complaint. This, however, is not what the serious disputes in Florida were about.
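The distinction between an inaccurate procedure and a biased one can be illustrated with a small simulation. Every number below is hypothetical and chosen only to make the contrast visible; nothing here is an estimate of what happened in Florida. The margin is measured as Bush's votes minus Gore's, and the function name is an assumption.

```python
import random

def average_reported_margin(true_margin, bias, noise_sd, repetitions, seed=0):
    """Average reported margin over many repeated counts, where each count
    equals the true margin plus a fixed bias plus random noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        total += true_margin + bias + rng.gauss(0.0, noise_sd)
    return total / repetitions

# ALL NUMBERS HYPOTHETICAL: a true margin of -200 means Gore is truly ahead by 200.
noisy = average_reported_margin(true_margin=-200, bias=0, noise_sd=400, repetitions=10_000)
biased = average_reported_margin(true_margin=-200, bias=500, noise_sd=10, repetitions=10_000)
print(f"noisy but unbiased counts average {noisy:.0f}")
print(f"precise but biased counts average {biased:.0f}")
```

Averaging many noisy counts recovers the true margin, while the biased procedure settles on the wrong winner however often it is repeated and however little its repetitions vary.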

The issue in Florida therefore comes down to statistical significance. We are not absolutely sure that Bush was the preference of a majority of qualified voters who went to the polls and voted properly-- where "qualified" and "properly" may be defined as you, the reader, wish. We do not have a perfect count; we have an estimate, even though that estimate is based on not just a sample of voters but on the entire population that went to the polls. Everyone must admit that the count is not perfectly accurate. Some felons voted without being caught even though they were not eligible, for example, and some ballots were likely not read by the machines even though nobody could take exception to how they were filled out.

Under one view, statistical significance is unimportant for an election result. Suppose that 40 percent of the time in elections where one candidate is ahead by half of one percent, the other candidate would win if we did a recount. The voters are about evenly split anyway, so why go to the trouble of a recount? Isn't it fair to both sides if everyone agrees in advance to accept the inaccurate procedure? The problem is perhaps that inaccurate procedures are also more manipulable. They may involve more subjectivity, or there may be enough noise in the process to cover up fraud. If a machine reads ballots differently every time, it is hard to catch someone who fiddles with the ballots on a particular count.

How, then, would we decide whether Bush being ahead by 1,000 out of 6,000,000 votes was statistically significant? The standard method applies: ask ourselves whether, if Gore were the true winner and we repeated the estimation procedure, we would accidentally find Bush ahead by at least 1,000 votes in more than 10 percent of the repetitions.
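As a rough sketch of that test, suppose the reported margin equals the true margin plus normally distributed counting noise. The normal model and the noise levels tried below are assumptions made for illustration, not estimates from the Florida count, and the function name is hypothetical. The true margin is set to zero, the case most favorable to the "dead heat" claim.

```python
from math import erfc, sqrt

def chance_of_observed_lead(observed_lead, true_margin, noise_sd):
    """P(reported margin >= observed_lead) when the reported margin is
    normally distributed around true_margin with standard deviation noise_sd."""
    z = (observed_lead - true_margin) / noise_sd
    return 0.5 * erfc(z / sqrt(2))

# ASSUMPTION: the noise standard deviations (in votes) are illustrative values only.
for noise_sd in (250, 500, 1_000, 2_000):
    p = chance_of_observed_lead(observed_lead=1_000, true_margin=0, noise_sd=noise_sd)
    verdict = "not significant" if p > 0.10 else "significant"
    print(f"noise sd {noise_sd:>5,} votes: p = {p:.4f} ({verdict} at the 10 percent level)")
```

Under this model the verdict turns entirely on how noisy the counting procedure is, which is why the rest of the argument concerns what the procedure is and how much its result would vary on repetition.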

What is the estimation procedure? That depends somewhat on your belief about what the law is.

One possibility is that the estimation procedure is to do a machine count of the ballots and then do a machine recount if the vote is close to 50-50. This was actually done in Florida, as a first step, and it did reduce Bush's lead by hundreds of votes. This gives us some idea of the inaccuracy of the first machine count. Quite rationally, it was accurate only to within a margin near 1 in 6,000. Rationally, because except on rare occasions such as this one, nobody wants to spend the time to get a more accurate machine count. Particularly with perforated punchcard ballots, the machine count depends on whether "hanging chads", little bits of paper that were not punched all the way out, are brushed off or not. By the second run through the counting machines, more hanging chads will have brushed off, so on third and fourth runs we may expect little change. This suggests that repeating the machine recounts-- as apparently was done in some counties-- would not result in a change of anywhere near 1,000 extra votes for Gore after the first recount. If that is true, then under this estimation procedure, Bush's 1,000-vote margin was statistically significant.

A second possibility is that the law requires all counties to do not only a machine recount but a manual recount under some particular standard. In this case, we cannot really say whether Bush's 1,000-vote margin was statistically significant, because it was not the result of the estimation procedure we care about. Rather, it is just an intermediate result on the way to the true count, which was never performed. The question of whether Bush's lead was significant is premature-- the real problem is that we never did a legal count. Once we did it-- say, by having a panel of judges, journalists, or randomly drawn citizens do the counting under well-defined standards-- then we would ask whether the resulting margin would change if the procedure were repeated.

A third possibility is that the law requires something like the procedure that was actually followed. After two machine counts, candidates could ask county governments of their choice to grant them manual recounts. The county governments could grant the recount if they wanted, and partisan officials could perform them using whatever standards they wished, subject to the possibility of later judicial review for undue bias or fraud. The Florida Secretary of State might or might not have authority to waive the statutory deadline, but in any case there would be some deadline at which the recounts and court challenges would have to be finalized.

We could correctly say the outcome of Bush being ahead by 1,000 votes was a statistical dead heat if repeating this procedure 100 times would put Bush ahead by that much in at least 10 of the repetitions even if Gore were truly the winner. What does "repeating this procedure" mean? There are a number of possibilities, but it is hard to see how any of them would result in that much statistical error.

The easiest repetition scenario is that the counties recount the ballots using the same general standards that they actually used in November 2000 for whether to have a recount, for hanging chads, and for dimples, with the same or similar people doing the counting. The resulting numbers would be different from what they were in November 2000 because of fatigue, personal idiosyncrasy, and general human frailty. But I doubt anybody thinks that Bush's lead would diminish by even as much as 500 no matter how many repetitions of this kind were done.

Another repetition scenario is that the counties recount the ballots, but with possibly different decisions by the campaigns and the county governments about what recounts to request and grant, what counting standards to use, and what legal actions to take. The variance between repetitions would be greater in this scenario. But would this result in Bush having fewer votes in more of the repetitions? What actually happened in November 2000 was not all that far from the best possible counting procedure for Gore. The only manual recounts were in counties in which he had many voters and his party controlled the county government. Aside from the quite general U.S. Supreme Court decision that halted the process, no legal challenge affected the final result, whether to the decisions of those county governments about which ballots to count, to which overseas absentee ballots to disqualify, or to which votes were fraudulent because cast by unregistered voters or felons. The one improvement Gore could wish for would be for Miami-Dade and Palm Beach counties to use the Broward County standards, and Broward's implementation of those standards, for counting ballots. But even that would help only if the Broward methods survived judicial scrutiny of individual decisions-- that is, if no judge struck down the Broward count as being too partisan-- and if Republican counties did not take the same approach.

Thus, I see little evidence that the procedure for counting votes in Florida-- in whatever way we might define that procedure-- was so inaccurate that Gore would be likely to win if it were repeated. And that is what is meant by statistical significance.


URL: Php.indiana.edu/~erasmuse/elections/deadheat.htm. Indiana University, Department of Business Economics and Public Policy, Kelley School of Business , BU 456, 1309 East Tenth Street, Bloomington, Indiana 47405-1701, (812)855-9219. 2000-2001: Olin Senior Research Fellow, Harvard Law School, (617) 496-4878. Comments: [email protected].