On 30 May, the YouGov model for the 2017 General Election was posted here and created a firestorm. At that time, most polls showed double-digit leads for the Tories and nobody else was suggesting that the Conservative majority was at risk. A front-page story in The Times reported that our model pointed to a 'hung parliament'. According to the Financial Times, 'The pound fell sharply on the heels of a YouGov model that the Conservative party could lose 20 seats in the upcoming election.' The model was called 'controversial' and worse. The Express headlined 'Holes in the poll EXPOSED' and quoted our former colleague Peter Kellner, who called the decision to publish it 'brave' before forecasting a 66-seat Conservative majority.
In addition to the public skepticism, insiders in both the Labour and Conservative parties repeatedly told us that their assessment of the situation was wildly out of line with our estimates. This, of course, made us nervous and we wondered whether they knew something that we didn't. Jim Messina, who advised the Conservatives, called it 'idiotic' and Iain Dale ridiculed it on Newsnight. Sometimes political insiders really do have access to better data and analysis than the public polls. But not this time.
"Make no mistake. Nobody in the Cons saw this coming. Strategists publicly ridiculed the YouGov model. Source: 'CCHQ were sure of a big win'." (Matthew Goodwin, @GoodwinMJ, 9 June 2017)
We are still absorbing the results and trying to understand where the model performed well and where it performed less well. But the general picture is clear: the model was a huge success in an election which most politicians, pollsters and commentators got badly wrong. Here is some more detail on how and why the model worked.
During the campaign, YouGov ran two largely independent polling projects. The London-based UK political team conducted two traditional polls of roughly 2,000 interviews each week for The Times and The Sunday Times. The model was created by a separate team: Ben Lauderdale and Jack Blumenau of LSE built the model, with background modeling and data support from Delia Bailey and Persephone Tsebelis of the YouGov Data Science team in California. We made an explicit decision not to try to reconcile the results of the two approaches and, while they were not radically different, we ended the campaign with a seven-point Conservative margin in our final poll for The Times and a four-point margin in the final model estimate. YouGov CEO Stephan Shakespeare explained the difference here.
Our final estimates, posted before the election, had the Conservatives winning a narrow victory over Labour on vote share (42-38 as published, a lead of 3.4 points before rounding) and falling well short of a majority with 302 seats to Labour's 269. In the event, the Conservative vote share lead was slightly smaller than we expected (42.4 to 40.0, a lead of 2.4 points). The Conservatives were also a bit more efficient at turning votes into seats than we expected, securing 318 seats to Labour's 262.
It is possible for a pollster to get lucky on a national vote share or seat prediction, even if the underlying sample data and analysis are not sound. Similarly, even sound methodologies will be unlucky sometimes. But we are confident that the model's success was not just luck: it is practically impossible to get lucky on 632 constituency predictions and the model correctly identified the winner in 93 percent of the seats.
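To see why luck is an implausible explanation, consider a deliberately crude back-of-the-envelope calculation. It is illustrative only: real seats are not independent coin flips, and many are safe for one party. But even under the generous assumption that each of the 632 calls is a 50/50 guess, the chance of matching the winner in 93 percent of them is astronomically small:

```python
import math

# Illustrative only: treat each of the 632 constituency calls as an
# independent 50/50 guess. Real seats are correlated and many are safe,
# but this gives a sense of scale.
n_seats = 632
n_correct = round(0.93 * n_seats)  # roughly 588 seats called correctly

# Probability of getting at least n_correct calls right by pure chance:
p_lucky = sum(math.comb(n_seats, k)
              for k in range(n_correct, n_seats + 1)) / 2**n_seats

print(f"{p_lucky:.2e}")  # vanishingly small
```

Even allowing that a naive forecaster could call safe seats far better than a coin flip, the probability of hitting 93 percent across the board without a sound method remains negligible.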
The model did not just get the winners right: in most cases the estimated party vote shares were accurate to within three or four points. The four scatterplots below compare the model estimates of Conservative, Labour, Liberal Democrat, and SNP vote share with the actual outcomes. We slightly underestimated the vote for the two largest parties and overestimated it for the two smaller parties, but, overall, the level of accuracy is impressive.
Most of the 43 seat errors came in seats where both the model estimate and the outcome were very close, with a few larger misses in Scotland. We achieved this success rate because we were very good at predicting which kinds of seats would swing from Conservative to Labour and which would swing in the opposite direction. This election saw large and variable swings across constituencies. As the scatterplot below shows, there were seats with large swings (10 points or more) to the Conservatives and seats with similarly large swings to Labour, and the model did a good job of predicting which were which.
One of the most striking predictions of our model was that Labour would gain the Canterbury constituency. Conservatives had won every election in this constituency since its creation in 1918. In 2015, the Conservative Julian Brazier won 43 percent of the vote, versus 25 percent for Labour, 14 percent for UKIP and 12 percent for the Lib Dems. This was far from an obvious opportunity for a Labour gain, but the model put Labour ahead of the Conservatives by 45 to 43 percent. In fact, Labour's Rosie Duffield gained the seat with 45.0 percent of the vote to 44.7 percent for Julian Brazier. This prediction came from a combination of Canterbury being a relatively urban and Remain-leaning constituency within its region and the presence of a large number of students, both of which were associated with Labour gains. We did not have enough survey responses in any single constituency to identify constituency-specific swings, but Canterbury was part of a larger pattern across constituencies with these shared characteristics, which helped our model capture this striking result.
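The pattern behind the Canterbury call is essentially the poststratification step of MRP: support is estimated within demographic and political cells using the whole national sample, then re-weighted by each constituency's composition. Here is a minimal sketch of that step. The cell definitions, support rates, and population shares are invented for illustration; the real model used far richer cells and constituency-level predictors.

```python
# Minimal sketch of the poststratification ("P") step of MRP.
# All numbers below are invented for illustration only.

# Estimated Labour support within each demographic/political cell,
# pooled across the whole national sample (the output of the
# multilevel regression step).
labour_support = {
    "student":      0.62,
    "urban_remain": 0.51,
    "other":        0.33,
}

# Share of each cell in two hypothetical constituencies, taken from
# census-style data (the poststratification frame).
constituencies = {
    "canterbury_like":  {"student": 0.25, "urban_remain": 0.35, "other": 0.40},
    "rural_leave_like": {"student": 0.05, "urban_remain": 0.10, "other": 0.85},
}

def poststratify(cell_shares, support):
    """Weight cell-level support by the constituency's cell shares."""
    return sum(share * support[cell] for cell, share in cell_shares.items())

for name, cells in constituencies.items():
    print(name, round(poststratify(cells, labour_support), 3))
```

Even though no single constituency has enough interviews to estimate its swing directly, a student-heavy, Remain-leaning seat inherits the higher Labour support estimated in those cells nationally, which is how a seat like Canterbury can be flagged as competitive.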
The largest swings to the Conservatives occurred in Scotland. The most extreme of these was in Ochil and South Perthshire (the far lower-left point in the plot above). In 2015, the SNP gained the seat from Labour with 46 percent of the vote, versus 28 percent for Labour and 20 percent for the Conservatives. We predicted a massive swing to the Conservatives: 43 percent for the Tories, versus 35 percent for the SNP and 20 percent for Labour, almost exactly the eventual 42-35-20 result. We have not examined this forecast in enough detail to be sure exactly what patterns the model found to produce it. With 632 constituencies in Great Britain, we simply did not have time to look at every constituency, or even at the most anomalous swings. This is one of the most challenging aspects of this kind of analysis: the only way to generate constituency-level estimates is to rely on the soundness of the data collection and the analysis method. It is no more possible to study the details of 632 constituencies than it is to visit them all in person and talk to people on the ground.
We certainly did not get every seat this close to correct.
The timeplot below compares the model estimates over the course of the campaign with other public polls. Two main differences are apparent in the graph. First, all the polls showed a steep drop in support for the Conservatives in the early part of the campaign, but the model estimate of the Conservative vote was substantially below that found in conventional polling. After about 27 May, however, the model estimate stabilized around a three-point lead. We find little movement in voting intentions after either of the two terrorist attacks. Other polls, however, show substantial swings in the final two weeks. We suspect this is mostly a matter of sample composition or changes in methodology, rather than real change in the electorate. Individual polls, such as the traditional ones YouGov conducted, can fluctuate up or down by three points due to random variability alone. The daily model estimate, in contrast, is based on many more interviews and on statistical adjustment for variations in sample composition, so it is much less noisy.
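The point about noise follows from basic sampling theory: the margin of error of a sample proportion shrinks with the square root of the sample size. A sketch, using simple-random-sampling margins of error; the 50,000 pooled-sample figure is a hypothetical stand-in for the model's much larger weekly input, not a published number:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.42  # a vote share around the Conservatives' level

# A conventional poll of ~2,000 interviews:
print(round(100 * margin_of_error(p, 2_000), 1), "points")

# A pooled model input (hypothetical size for illustration):
print(round(100 * margin_of_error(p, 50_000), 1), "points")
```

A single share from a 2,000-interview poll carries a margin of error of a bit over two points, and the lead between two parties moves roughly twice as much as a single share, which is consistent with conventional polls bouncing by around three points for purely statistical reasons.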
This model, like all models, is not perfect. We slightly overestimated Labour relative to the Conservatives in terms of seats, while slightly underestimating Labour's relative vote share. We underestimated the extent to which voters would back the Conservatives and Labour over the other parties, so both major parties slightly overperformed our predicted vote shares. We overestimated the SNP, which meant that a disproportionate number of our seat errors were in Scotland (13 of 43). Still, our 95 percent confidence interval of 30-53 for SNP seats in Scotland did include the actual result of 35. We will never get everything exactly right, so it is important to provide measures of uncertainty with our estimates. We consider estimates within the reported confidence intervals to be successful, even when the point estimate picks a different winner.
Good performance in one election is no guarantee of future performance. However, similar models performed well in two other elections last year (the E.U. Referendum and the U.S. Presidential election), both of which caused problems for traditional polling methodologies. We believe that the MRP approach solves many of the problems -- low response rates, changes in sample composition, and ad hoc adjustments -- which have plagued polling. Despite the evident failure of most polls in this election, there is reason to be optimistic about the levels of accuracy that can be achieved with larger sample sizes and better statistical methods.