Not Again: Polling Error in the 2020 Presidential Election
Two years ago this month, I started working for European polling aggregator EuropeElects. After parsing hundreds of polls, countless elections, an unspeakable number of electoral surprises and millions of votes across Europe, I am left once again staring toward America in a dazed confusion, one shared by every other onlooker.
I may know my cross-tabs from my turnout weighting, but I am still left asking the same question as everyone else: how, after a pathetic performance in 2016, have American pollsters allowed themselves to do it again?
A note before I begin: I am not a) American, b) an expert on every pollster, their methodology and their respective partisan leanings, or c) able to do a full statistical analysis, as at the time of writing the election is ongoing (NV, PA, AZ, NC and GA are still to be called). Besides, American pollsters are pretty poor at providing tables and properly reporting how they weight different social groups, which makes writing this kind of piece challenging. Let me know via Twitter if I have made any mistakes in the data.
Before election night, we all presumed Trump had lost the election. Why? Well, that’s what the polling numbers and the various Electoral College projections claimed.
By electionmas eve, most projections drop economic weighting and lean more heavily on the polling consensus. Even those that retain an economic factor rely heavily on a polling average (weighted by pollster). What if that polling average turns out to be wrong?
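To make the idea concrete, here is a minimal sketch of a pollster-weighted polling average. The polls and weights are entirely hypothetical; real aggregators such as 538 also adjust for recency, sample size and house effects.

```python
def weighted_average(polls):
    """polls: list of (biden_share, trump_share, pollster_weight) tuples.
    Returns the weighted average share for each candidate."""
    total_w = sum(w for _, _, w in polls)
    biden = sum(b * w for b, _, w in polls) / total_w
    trump = sum(t * w for _, t, w in polls) / total_w
    return round(biden, 1), round(trump, 1)

# Illustrative numbers only: three polls with different quality weights.
polls = [(51.0, 47.0, 1.0), (49.0, 48.0, 0.5), (52.0, 46.0, 0.8)]
print(weighted_average(polls))  # → (50.9, 46.9)
```

The obvious weakness is visible in the code itself: if every input poll shares the same systematic error, no choice of weights can correct for it.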
To s̶p̶e̶c̶u̶l̶a̶t̶e̶ ̶w̶i̶l̶d̶l̶y̶ discuss the potential errors in the polling data, I’m going to quickly parse three states and the polling conducted in the 48 hours before election day. For a quick test of state polling, I’ll consider Florida, Texas and Michigan, and one reason why the polls may have failed in each. My aim is to show examples of shortcomings, not the whole picture of every problem in each state, or across America.
Florida had 26 polls in the final two days before polling day. Of those, just six called a Trump win in a state where the final tally (with 96% reporting) is 51.2–47.8 in his favour, and none of those six had the incumbent as high as 51%. What we have seen in Florida is a swing toward Trump: he increased his 2016 margin of victory over the Democrats by two percentage points, with swings in his favour across the state.
One clear shortcoming of the polling has been in predicting Latinx voting patterns. In Florida, Cuban Americans (of whom there are 1.5m in the south-eastern state, 70% of the national Cuban-American population) turned out in massive numbers for Trump, with as many as 55% of the group voting for the President. This reflects not only the group’s documented social conservatism but also Cubans’ particular antagonism toward socialism, something Trump’s campaign played on specifically (as FL Senator Rubio has also done in the past).
Furthermore, Latinx voter registration has massively increased in a group that has traditionally been stubborn when it comes to voting: registered Hispanic American voters are up by almost a third since 2008, something pollsters appear to have been poor at accounting for. A combination of the language barrier in reaching Spanish first-language speakers and ill-designed research generalisations can be blamed here (fundamentally, both can be put down to ignorance practised by a majority-white polling industry).

Briefly, the latter problem is that pollsters tend to weight by “Hispanic” as a homogenous bloc, something we are quickly learning to be problematic. Most pollsters estimated that 20% of the Florida electorate would be Hispanic, and it was 19%, but what they didn’t account for was that a third of that vote would come from Cuban Americans, a group who make up only a fifth of Florida’s total Hispanic population. As you can see, oversimplified racial pigeonholing can cost you dearly in accuracy, as some analysts predicted before the election (particular h/t to the New Statesman’s Ben Walker).
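The arithmetic of that pigeonholing error can be sketched quickly. The Cuban figures below follow the claims above (roughly 55% for Trump, a fifth of Florida’s Hispanic population but a third of its Hispanic vote, and a 19% Hispanic electorate); the non-Cuban Biden share is a hypothetical placeholder.

```python
def bloc_biden_share(cuban_vote_share, cuban_biden, other_biden):
    """Expected Biden share of the Hispanic vote, given what fraction
    of Hispanic turnout is Cuban-American."""
    return cuban_vote_share * cuban_biden + (1 - cuban_vote_share) * other_biden

CUBAN_BIDEN = 45.0   # ≈ 100 - 55, per the Trump figure cited above
OTHER_BIDEN = 70.0   # hypothetical non-Cuban Hispanic Biden share

# Weighting Cubans by population share (a fifth) vs actual turnout share (a third):
assumed = bloc_biden_share(0.20, CUBAN_BIDEN, OTHER_BIDEN)
actual = bloc_biden_share(1 / 3, CUBAN_BIDEN, OTHER_BIDEN)

# The gap among Hispanics, scaled by their 19% share of the electorate,
# doubled because each vote Biden loses is a vote Trump gains.
margin_error_pp = (assumed - actual) * 0.19 * 2
print(round(assumed, 1), round(actual, 1), round(margin_error_pp, 2))
# → 65.0 61.7 1.27
```

Under these invented shares, treating the bloc as homogenous overstates Biden among Hispanics by over three points, worth more than a full point of statewide margin on its own.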
The challenge for pollsters in Texas appears to have been more mixed. Nine polls were conducted in Texas in our sample: two backed Biden, one was a tie, and the other six gave Trump the lead. None, however, gave Trump the eight-point lead he actually achieved to win the state’s 38 electoral votes. 538’s polling average even closed on election day with just a one-point gap between the candidates, considering the race a toss-up.
Some of the numbers in Texas are really interesting, and it is incredible the degree to which Biden trailed Hillary in a number of counties. Particularly in rural border counties, Biden’s lead was just a fraction of what Hillary achieved (one county, Starr, has Biden leading by just 5 points where Hillary won by 60; that’s not a typo). So what’s behind Biden’s underperformance and why did the polls miss it? There does appear to be a large Hispanic factor here again (Starr County being 90% Hispanic), but let’s not rehash that.
Instead, is there a degree of polls following the narrative? The narrative around Texas was that it was turning blue: after brief chatter in 2016, the idea appeared more meaningful in 2020 following Democrat Beto O’Rourke’s unsuccessful but well-covered Senate race in the state’s 2018 midterm. The key to this switch was urban voters in more liberal cities such as Houston, Austin and El Paso. The problem is that urban counties did not uniformly swing toward Biden strongly enough for him to win, or for the polls to be right.

To complicate matters, many pollsters in Texas do not appear to have weighted for the urban-suburban-rural divide in their estimations, which could explain how the state was called wrongly: we saw strong swings toward Trump in rural areas while Biden basically stood still across the state’s urban districts. As states with significant rural and urban populations become more polarised, pollsters will need to account for this divide more directly, since rural Hispanics, Black voters and college-educated whites vote differently from the same demographic columns in suburbs or city centres.
Thirdly and finally, let’s briefly consider Michigan, a state only just called for a winner at the time of writing. The 538 poll tracker had Biden ahead by 8 points on election day, while the actual result is a lead of a fraction of a percentage point for Biden. Of the 19 polls in question, only one pollster came even close to the actual result: Wick called the race at 48–48, against the real 49–49. I have made no mention of margin of error in this piece so far, but errors across the board have been at the furthest stretch of the typical 1–5% MoE that pollsters give themselves. Better statisticians than I will do a full deviation analysis to see which pollsters succeeded and which failed. In Michigan in particular, some polls gave Biden a double-digit lead, which must therefore be a methodological problem, not simply random sampling error.
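For readers unfamiliar with where that 1–5% comes from, here is the standard 95% margin of error for a poll proportion, MoE = 1.96 × √(p(1−p)/n). The sample sizes are hypothetical but typical; the point is that a double-digit miss on the margin sits well outside any of them.

```python
import math

def moe_95(p, n):
    """95% margin of error, in percentage points, for an estimated
    share p from a simple random sample of n respondents."""
    return 100 * 1.96 * math.sqrt(p * (1 - p) / n)

# MoE is largest at p = 0.5, so this is the worst case per sample size.
for n in (400, 800, 1500):
    print(n, round(moe_95(0.5, n), 1))
# → 400 4.9
#   800 3.5
#   1500 2.5
```

A poll showing Biden +10 in a race that finishes roughly tied is off by about five points on each candidate’s share, beyond even the smallest sample’s MoE, which is why sampling noise alone can’t explain it.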
So what could be throwing up such a significant error? Well, Trump won the state in 2016 by a margin of 1.2%. This year looks just as close, but Biden is the man ahead, thanks to narrow swings in his favour across the state, in both rural and urban areas. Some pollsters have reported confusion and difficulty in dealing with astronomical numbers of early ballots, which create a methodological challenge for pollsters asking about voting intention when some respondents have already voted. As early voters tend to be the most engaged, they are also highly likely to respond to polling requests when contacted. These engaged voters lean heavily Democratic and voted early, when Biden was doing better in the polling, compounding their increased likelihood of voting for the former Vice President. Over-representing people who have already voted could produce a large skew in Biden’s favour.
It’s clear that the US polling industry has a lot to make up for. What’s more, the infamy of failing not once but twice in major elections where the credibility of mainstream media is a key issue casts a long, dark shadow over pollsters in Europe too. There are good pollsters in the US producing good polling, much as in every European country and beyond. However, the US’s 400+ polling firms form a crowded picture with so much poor polling that even the averages can’t smooth out the kinks.
My name is Euan and I am a sometime journalist and writer, focussing on politics and elections. I am on Twitter @euanspeaks.