Polling error in 2020, while within the historical range, has caused a lot of heartburn among Democratic strategists, pollsters, and data analysts. We're still working to understand the 2020 election's polling misses, but one theory that holds up is that the COVID-19 pandemic caused polls to overstate Democratic vote share. For a state like Wisconsin, with a heavy COVID outbreak near the election, COVID may have added as much as 7 points of error to the polls.
Simply put, people more worried about COVID were more likely to take our polls, and those people were also more likely to vote for Democrats regardless of their age, education, race, gender, or party identification.
Early in the pandemic, pollsters saw that people stuck at home were picking up the phone more, and this analysis shows that people who took the pandemic seriously kept that up through election day. Here we show that this had a direct impact on polling results. In particular, how bad a state's mid-October outbreak was pretty accurately predicted how much the polls erred in Democrats' direction in that state. (We used Biden support as the metric, since that race was the most polled, but we're confident this affected candidates up and down the ticket.)
What we did
To analyze the link between COVID-19 and polling misses, we:
- Calculated the polling error in each of the 50 states as the difference between FiveThirtyEight's final polling average for that state and its final election result. (Source: FiveThirtyEight, The New York Times)
- To estimate the severity of a state's outbreak, we took the daily increase in positive cases two weeks before the election (per 100,000 residents) and the daily increase in deaths two days before the election (also per 100,000). We used rolling seven-day averages to minimize noise in COVID-19 data collection. (Source: The COVID Tracking Project)
We then plotted a state's deaths and cases per person on one axis and its polling error on the other.
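The two calculations above can be sketched in a few lines of Python. The numbers below are made up for illustration; the real inputs are FiveThirtyEight's state polling averages and The COVID Tracking Project's daily time series:

```python
import numpy as np

# Polling error: FiveThirtyEight's final Biden-minus-Trump polling margin
# minus the certified Biden-minus-Trump result. Positive = polls were
# too pro-Democratic. (Hypothetical Wisconsin-like numbers.)
poll_margin = 8.4      # Biden - Trump, final polling average (points)
result_margin = 0.6    # Biden - Trump, final election result (points)
polling_error = poll_margin - result_margin   # ~7.8 points pro-Biden

# Outbreak severity: seven-day rolling average of daily new cases,
# per 100,000 residents, measured two weeks before Election Day.
daily_new_cases = np.array([3100, 3350, 2900, 3600, 3900, 3700, 3450])
population = 5_822_000          # illustrative state population
cases_per_100k = daily_new_cases.mean() / population * 100_000
```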
What we found
Visually, the story is clear: the downward slope means that the worse the outbreak, the more the polls underestimated Trump's support.
These relationships are strong and statistically significant (for the stats folks: R² of 0.3895 and 0.3352, respectively, for each variable individually, both well, well past the 95% confidence level). This means the number of confirmed COVID-19 cases two weeks before the election explained nearly 40% of the variation in a state's polling error. It also means that for every additional one in 17,000 people with a confirmed coronavirus case in a state, the polls were off an additional percentage point. Or, to look at it another way, in a hypothetical COVID-free state where nobody was worried about catching COVID, there would have been only two points of polling error.
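For intuition, that single-variable relationship is an ordinary least-squares fit. Here is a sketch with invented numbers chosen to roughly mirror the published coefficients (one case per ~17,000 residents is about 5.9 cases per 100k, so one point of error per 5.9 cases per 100k corresponds to a slope near 0.17, and the intercept is the ~2 points of error a hypothetical COVID-free state would still show):

```python
import numpy as np

# Illustrative (cases per 100k, polling error) pairs -- not the real data.
cases_per_100k = np.array([10., 20., 35., 50., 58., 70.])
polling_error  = np.array([3.8, 5.3, 8.0, 10.4, 11.9, 13.9])  # points pro-Biden

# Ordinary least squares: polling_error ~ slope * cases + intercept
slope, intercept = np.polyfit(cases_per_100k, polling_error, 1)

# R^2: the share of variance in polling error explained by case rates
predicted = slope * cases_per_100k + intercept
ss_res = ((polling_error - predicted) ** 2).sum()
ss_tot = ((polling_error - polling_error.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot
```

Because the toy data are nearly on a line, the R² here comes out far higher than the real-world 0.39; the slope and intercept are what carry the interpretation.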
This is a pretty striking relationship, especially since both of these datasets are noisy and error-prone themselves: COVID reporting varies in quality and method by state, many positive cases were missed due to lack of testing, and polling is subject to a margin of sampling error and other forms of error.
To throw out potential outliers, we narrowed this down to just the 10 most-polled states. Again, the trend is visually striking, with polling error and COVID-19 outbreaks running closely together:
The Midwest is instructive, particularly Minnesota. It was a state with a very low prevalence of COVID-19 in mid-October (it has a Democratic Governor who was not hamstrung by the courts), but aside from that, it's demographically similar to its neighbors North Dakota, South Dakota, Iowa, and Wisconsin (heavily white and majority non-college). However, the polling error was remarkably different across these states, in part because Minnesota had the lowest COVID-19 rates of all five:
One thing we had to untangle was a state's COVID rate from its Trump support, because the two are intimately intertwined (states with a higher Trump vote share had more COVID in mid-October). When we put Trump's vote share and COVID rates into a regression analysis to separate them, it's clear both drove polling error: polls were more off in states where Trump did well (quite possibly due to turnout dynamics), and they also were more off in states where COVID was spreading most, even accounting for Trump support.
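That "separate them" step is a two-variable regression. A minimal sketch, with invented numbers standing in for the real 50-state data:

```python
import numpy as np

# Invented rows: Trump 2020 vote share (%), mid-October cases per 100k,
# and that state's polling error (points pro-Biden). Not the real data.
trump_share = np.array([65., 58., 53., 47., 41., 38.])
cases_100k  = np.array([70., 55., 60., 30., 20., 15.])
error       = np.array([7.85, 6.76, 6.61, 4.99, 4.17, 3.76])

# Design matrix with an intercept column; solve OLS via least squares.
X = np.column_stack([np.ones_like(trump_share), trump_share, cases_100k])
(intercept, b_trump, b_cases), *_ = np.linalg.lstsq(X, error, rcond=None)

# Both coefficients come out positive: holding Trump support fixed,
# more mid-October COVID still predicts more pro-Biden polling error.
```

The point of the exercise is exactly the one in the paragraph above: even though the two predictors are correlated with each other, the regression can attribute error to each one separately.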
This helps explain why the polling miss looked so different this year than in 2016, in two key ways:
- The national polls were off this year while they weren't in 2016: this year, polling was too pro-Democratic virtually everywhere, while 2016 polling overestimated Trump's vote share in states like Texas and Arizona while understating it in Wisconsin, Iowa, Pennsylvania, and Michigan.
- Of those four states where the 2016 polling was such a big miss, polling got:
- A bit better in key swing states Michigan and Pennsylvania from 2016 to 2020, two states that were relatively low in COVID-19 in mid-October
- No better in Wisconsin or Iowa, two higher-coronavirus states.
One thing that heavily drove the 2016 error was the swing from Obama to Trump in a state: a state's Trumpiness, beyond its Republican-ness, made polls more wrong. Trump's support did drive some polling error this time too. While we can't yet analyze voter files, we suspect turnout assumptions were another source of error in polls. However, this just wasn't the dominant factor it was in 2016. Unlike in 2016, neither the swing from Clinton to Trump (2016 to 2020) nor from Romney to Trump (2012 to 2020) drove error all that much. For example, when we put Trump support and COVID-19 rates in a regression model, we found that a 15-point increase in Trump's vote share from state to state added only one point to the polling error.
What this means
The pandemic didn't cause 100% of polling error in 2020. But it is clear that Trump outperformed his polls the most in places with bad outbreaks, and that he ran ahead of his polls the most when states' outbreaks were at their peaks. We take a few things from this:
- Getting partisanship right alone can't save polls. In many cases, we were weighting our polls to partisan measures like party registration and self-reported 2016 Trump vote, thinking we were accounting for samples with too many Democrats. But in fact, we know people who were more worried about COVID, regardless of partisanship or past vote, were more likely to vote for Democrats, more supportive of COVID restrictions, and less approving of the job Trump was doing on the pandemic.
- Some of the error was a (please God) one-time issue. It is our sincere, sincere hope that we never get another election where we can test the theory that a pandemic ripping through the country, one that one party cares about more than the other, reliably causes polls to miss. There was also no realistic way to know pre-election that this would be a problem, or how big it would be.
- This doesn't mean pollsters are off the hook. Part of what let COVID-19 throw polls off is that response rates are so low to begin with. If only 1 in 100 people are answering the phone, a 1-point gap in Democratic vs. Republican response rates (or COVID-fearing vs. non-COVID-fearing response rates) can massively skew polls, at a level that just couldn't happen when most Americans answered our polls, as was the case half a century ago. Even as we root out additional sources of 2020 polling error, we're going to need to keep thinking about ways to deal with declining response rates. This may include mixing modes like SMS, online, and phone to spread out the risk of one mode driving more non-response, though our analysis of internal data and public polling doesn't show any one mode doing better than another.
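The arithmetic behind that response-rate point is worth spelling out. A toy calculation, with hypothetical response rates, shows how the same 1-point gap is devastating at a ~1% baseline and nearly harmless at a ~50% baseline:

```python
# A 50/50 electorate where roughly 1-in-100 answer the phone, but
# Democrats respond at 2% and Republicans at 1% (a 1-point gap):
dem, rep = 0.50, 0.50
sample_dem = (dem * 0.02) / (dem * 0.02 + rep * 0.01)
# -> 2/3: the raw sample is two-thirds Democratic.

# The same 1-point gap when most people answered (51% vs 50%):
sample_dem_hi = (dem * 0.51) / (dem * 0.51 + rep * 0.50)
# -> ~0.505: only about half a point of skew.
```

Weighting can repair the skew only along dimensions the pollster knows to weight on, which is exactly why an unanticipated factor like COVID worry slipped through.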
- We're all going to have to rely on polls as guides, not crystal balls. The angst caused by polling error shows how reliant we've become on polls in politics. Polling drives campaign spending decisions, it shifts donations across Senate races, and it made Nate Silver famous. We're obviously believers in polls, in particular well-crafted research that helps campaigns plan their strategy. But at the same time, even as pollsters we'd admit that using polling as the only metric to drive campaign strategy and resource allocation doesn't make sense. We don't know what will cause the polling error next time, but we know it's coming: in both 2000 and 2012, the national polls were more off than in 2020. For campaigns in particular, we need to better price in the fact that every poll can be wrong in the same direction, and devise our strategy with that in mind.