In this election we have been using two different measuring tools: traditional polling, and big data modelling using MRP. We have done this as part of our commitment to advancing the science of survey research. Our final poll of the campaign shows the Conservatives with a 7-point lead over Labour, which would give them a comfortable majority. Our model shows a smaller lead of just 4 points. Why are these different?
Our traditional polling methodology has been generally (though not perfectly) reliable. We select a sample of 2,000 panellists, carefully chosen to represent the electorate, and weight the sample to correct for random fluctuations in who responds on a particular day. This methodology has been adjusted based on what we learned in 2015 and the EU referendum, but it is, by and large, the same approach YouGov has used successfully since we first applied it at the 2001 election.
The new data model we have begun developing represents a significant departure from what we've done in the past. It uses big data (approximately 50,000 interviews each week). These are not carefully selected samples. Instead, we apply a recently developed statistical procedure, multilevel regression and post-stratification (MRP), to adjust for differences between the respondents and a nationally representative sample. We supply the algorithm with data every night and, after seven or eight hours of computation, it produces numbers which sometimes agree with our traditional poll and sometimes differ by a few points.
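To make the idea concrete, here is a minimal sketch of the post-stratification step (the "P" in MRP). The demographic cells, their electorate shares, and the per-cell vote probabilities below are invented placeholders purely for illustration; in the real model the per-cell estimates come from a multilevel regression fitted to the roughly 50,000 weekly interviews, over many more cells.

```python
# Illustrative post-stratification: reweight modelled per-cell vote
# probabilities by each cell's share of the FULL electorate, rather
# than by its (possibly unrepresentative) share of the raw sample.
# All numbers here are hypothetical.

# Each entry: (share of the electorate in this demographic cell,
#              modelled probability of voting Conservative in that cell)
cells = [
    (0.20, 0.55),  # e.g. older homeowners (hypothetical cell)
    (0.30, 0.45),  # e.g. middle-income graduates (hypothetical cell)
    (0.35, 0.35),  # e.g. younger renters (hypothetical cell)
    (0.15, 0.40),  # e.g. students (hypothetical cell)
]

# Post-stratified national estimate: a population-weighted average
# of the cell-level model estimates.
national_con = sum(weight * p for weight, p in cells)
print(round(national_con, 4))  # -> 0.4275
```

The same reweighting logic, applied with constituency-level rather than national cell shares, is what lets the model produce an estimate for every seat.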
We always publish confidence bounds, though they are often ignored. The model estimates the Conservative vote at between 39% and 44% and the Labour vote at between 36% and 41%. The 7% Conservative lead in our last traditional poll of the campaign for The Times is notably higher than the model's estimate, but the two methods are still within each other's margins of error.
The model, unlike the traditional methodology, makes estimates for every constituency, and when we add them up, it doesn't give the Conservatives a majority. But we never predicted 'a hung parliament.' The model says that it is too close to call whether the Conservatives will win a majority (326 seats), not that they will definitely be under 326 seats. A Conservative majority is well within the margin of error and, in any event, these are estimates of voting intentions, not predictions of election outcomes.
We did not do constituency-level polling but modelling. Our MRP approach uses big data resources from many sources in addition to the survey questions. We try to understand the demographic/psychographic make-up of each constituency and map our survey data onto that. Our research can tell us the voting intention of a voter who is aged A, has an income of B, an education level of C, and attitudes of D, E, F and G to various issues. Our national sample will have lots of these people. We can then project likely constituency-level outcomes based on the profile of each place.
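The mapping described above can be sketched as follows. The cell labels, constituency names, demographic profiles, and vote probabilities are all hypothetical stand-ins; the real model uses far richer profiles built from many data sources.

```python
# Illustrative constituency projection: the SAME national cell-level
# estimates, combined with each constituency's own demographic mix,
# yield different local results. All numbers are hypothetical.

# Modelled probability of voting Conservative for each demographic
# cell (placeholder outputs of the regression stage).
p_con = {"A": 0.55, "B": 0.45, "C": 0.30}

# Demographic make-up of each (hypothetical) constituency:
# the share of its voters falling in each cell.
profiles = {
    "Constituency X": {"A": 0.5, "B": 0.3, "C": 0.2},
    "Constituency Y": {"A": 0.2, "B": 0.3, "C": 0.5},
}

# Project a local estimate by weighting the cell probabilities
# by that constituency's cell shares.
for seat, profile in profiles.items():
    est = sum(share * p_con[cell] for cell, share in profile.items())
    print(seat, round(est, 3))  # X -> 0.47, Y -> 0.395
```

Because Constituency Y has more voters in the low-Conservative cell, it gets a lower projected Conservative share from the very same national model, which is how one set of 50,000 interviews can produce 650 different seat-level estimates.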
This MRP approach is still being tested but, as it happens, it was accurate in its first big test, predicting a 'Leave' win in last year's referendum. We will find out tomorrow which method came closer but, at this stage, we have a longer track record for our traditional methodology.
Advancing the science of survey research requires both a willingness to experiment with new and different approaches and a commitment to transparency. This includes sharing the data and details of the algorithm, as we have in the past. We are therefore happy to publish two methods with two different readings: what we learn from the outcome of tomorrow's election will help us continue to improve our methodology.