Why the Polls Were Wrong


Nov 14, 2016

The cover of the New York Post newspaper is seen with other papers at a newsstand in New York, November 9, 2016

Photo by Shannon Stapleton/Reuters

This commentary originally appeared on U.S. News & World Report on November 10, 2016.

The highly polarized 2016 presidential election has finally ended, with President-elect Donald Trump winning the Electoral College but Democratic nominee Hillary Clinton maintaining a slight edge in the popular vote. Most polls, including our own Presidential Election Panel Survey, had Clinton leading the popular vote, but by larger margins than actually materialized. Poll aggregators like RealClearPolitics are intended to smooth out the variation among polls with different samples, approaches, and limitations, and are therefore expected to be more accurate than any single poll. They too missed the mark.

The most obvious way the polls misjudged the election is that the people who turned out to vote looked very different from voters in other recent elections. Most polls rely on data gathered from people identified, by a variety of factors, as “likely voters.” If the actual voters this year looked substantially different from those in previous years — not entirely unreasonable given how unusual this election has been — then these models will do poorly. High levels of undercoverage, where some groups are underrepresented in survey samples, or systematic nonresponse to poll questions would effectively misrepresent the electorate as well.

The RAND survey is not like most polls. Rather than relying on a likely voter model, we used a probabilistic poll approach that allows respondents to report their own probability (or percent chance) of voting and of voting for a particular candidate. The rationale behind this approach was to capture data from a greater number of people who are not closely aligned with either candidate — which this year represented an unprecedented number of the electorate.
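To make the mechanics of this approach concrete, here is a minimal sketch of how a probabilistic poll turns self-reported probabilities into a vote-share estimate. The respondent data below are hypothetical, invented for illustration; the actual RAND survey design and weighting are more involved. Instead of screening respondents with a likely-voter model, each respondent's candidate preference is weighted by his or her own reported chance of voting.

```python
# Minimal sketch of a probabilistic-poll estimate (hypothetical data).
# Each respondent reports a percent chance of voting and a percent
# chance of supporting each candidate; the estimate weights candidate
# preferences by turnout probability rather than applying a binary
# likely-voter screen.

respondents = [
    # (P(vote), P(Clinton | vote), P(Trump | vote))
    (0.90, 0.80, 0.15),
    (0.50, 0.30, 0.60),
    (0.20, 0.45, 0.40),
    (0.95, 0.10, 0.85),
]

# Expected number of voters in the sample.
total_turnout = sum(p_vote for p_vote, _, _ in respondents)

# Expected votes for each candidate.
clinton = sum(p_vote * p_c for p_vote, p_c, _ in respondents)
trump = sum(p_vote * p_t for p_vote, _, p_t in respondents)

clinton_share = clinton / total_turnout
trump_share = trump / total_turnout
print(f"Clinton {clinton_share:.1%} vs. Trump {trump_share:.1%}")
```

Note how the weakly attached respondent with only a 20 percent chance of voting still contributes to the estimate, just with one-fifth the weight of a near-certain voter — this is what lets the approach capture people who are not closely aligned with either candidate, at the cost of sensitivity to how accurately people forecast their own turnout.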

Our prediction of the popular vote — a Clinton advantage of 8.7 points — suggests that people's self-reported probabilities of voting were either highly volatile (our final survey was primarily based on data collected immediately after the final debate, several weeks before the vote and largely before FBI Director James Comey's letter regarding the bureau's investigation into Hillary Clinton), or simply not strong predictors of turnout this year, or both.

It may also be, as the polling industry is increasingly aware, that sampling errors can be corrected for observable demographic factors, but that unobserved beliefs and attitudes are systematically missed when the people who hold them are less likely to participate in surveys — with no clear way to adjust for their absence. An argument has also been made that Trump supporters were reluctant to report their support to pollsters, but we believed this was less of an issue for our survey — an online survey in which respondents do not interact with interviewers — than for live telephone polls. Early in the election season we observed relatively higher support for Trump than many other polls did, supporting this notion, but such reluctance cannot be ruled out.

While our survey, like the other polls, overpredicted Clinton's popular-vote margin, that doesn't mean the data it accumulated are flawed. We did not design the survey with horse-race polling for the Electoral College as the primary goal, focusing instead on the decision-making process and how it translated into behaviors. This has yielded a wealth of data that could provide deep insights into what happened and how it took the pollsters by surprise.

The 2016 Presidential Election Panel Survey contains critical information about how people reported they would vote prior to the election, and we will poll them again to learn how they actually behaved. This could show, for example, that people who previously thought they were unlikely to vote were persuaded in the final days of the election to do so. If these late deciders were disproportionately inclined to vote for Trump, this could help explain the discrepancy between pollster predictions and the ultimate outcome. It has also been noted that our data showed that the non-Hispanic white voters who supported Clinton in her Democratic primary run against Obama in 2008 were not supporting her to the same extent this time around — largely owing to racial attitudes associated with support for Trump.

Our survey also collected specific information about why people did not vote, which may help explain the discrepancy. People who originally reported a high intention to vote but ultimately did not would also affect poll accuracy. And our survey contained a historically large proportion of respondents who intended to vote for third-party candidates; these voters may have ultimately decided to cast their votes for Trump or Clinton instead.

Preliminary results suggest that Trump was able to mobilize white men and women to vote at significantly higher levels than anticipated. Those with less than a college education are persistently overrepresented among nonresponders in polls, and prior to the election it appeared that Trump would garner his greatest support from this group — but preliminary results from after the election suggest that this may not have been the case.

We will be using the final survey wave of the RAND study to examine these issues, among others, in the weeks ahead.

Michael Pollard is a sociologist at the nonprofit, nonpartisan RAND Corporation.
