Professor Ben Lauderdale, co-creator of YouGov's MRP model, explains how it works
Over the last seven days YouGov has interviewed approximately 100,000 panellists about their voting intentions in the 2019 General Election. While this is a much larger sample than our usual polls, the samples in each of the 650 Parliamentary constituencies are too small (on average, only 150 voters per constituency this week) to produce reliable estimates if we analysed the data as constituency polls.
In recent UK and US elections, YouGov has been using a technique called Multilevel Regression and Post-stratification (or 'MRP' for short) to produce estimates for small geographies. These include local authorities for the EU referendum, states and congressional districts in US elections, and parliamentary constituencies for UK elections.
The idea behind MRP is that we use the poll data from the preceding seven days to estimate a model relating interview date, constituency, voter demographics, past voting behaviour, and other respondent profile variables to respondents' current voting intentions. This model is then used to estimate the probability that a voter with specified characteristics will vote Conservative, Labour, or some other party. Using data from the UK Office for National Statistics, the British Election Study, and past election results, YouGov has estimated the number of each type of voter in each constituency. Combining the model probabilities with these estimated census counts allows YouGov to produce estimates of the number of voters in each constituency intending to vote for each party. In 2017, when we applied this strategy to the UK general election, we correctly predicted 93% of individual seats as well as the overall hung parliament result.
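The post-stratification step described above can be sketched in a few lines. This is an illustrative toy, not YouGov's actual code: the voter "cells", probabilities, and counts below are invented for one hypothetical constituency.

```python
# Post-stratification sketch (illustrative only, not YouGov's model).
# A fitted model supplies, for each voter "type" (cell) in a constituency,
# the predicted probability of voting for a given party; census-style data
# supply how many voters of each type live there.

def poststratify(cell_probs, cell_counts):
    """Weight each cell's predicted vote probability by its population count."""
    total = sum(cell_counts)
    return sum(p * n for p, n in zip(cell_probs, cell_counts)) / total

# Hypothetical cells, e.g. combinations of age band and past vote.
probs = [0.60, 0.35, 0.20, 0.50]     # model-estimated P(vote Con) per cell
counts = [12000, 8000, 5000, 15000]  # estimated voters of each type in the seat

share = poststratify(probs, counts)  # population-weighted Conservative share
```

The point of the weighting is that the poll sample in a seat may over- or under-represent some voter types; the census counts, not the sample composition, determine each cell's weight.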
Despite the strong performance of the method in the 2017 election, it is not magic and there are important limitations to keep in mind. First, we are reporting estimates of current voting intentions, not a forecast of how people will vote on 12 December. Panellists tell us how they intend to vote on 12 December, but they may change their minds, and we do not attempt to quantify the probability that they will do so. Second, the samples in each constituency are too small to be reliable by themselves and are subject to more than just sampling error. To compensate for small sample sizes, our model looks for patterns in the data across constituencies. Our sample is large enough that we can identify patterns that occur across relatively small numbers of constituencies, but the largest model errors are likely to occur in constituencies with very atypical patterns of voting. Examples include seats with a high-profile independent candidate (e.g. Beaconsfield) or where there appears to be a new pattern of local competition in this election (e.g. Kensington).
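One simplified way to picture how pooling across constituencies compensates for small samples is a shrinkage calculation: a noisy local estimate is pulled towards the broader average, and the pull is stronger when the local sample is smaller. This is a toy with made-up numbers, not the multilevel model YouGov actually fits, which pools across many demographic and political variables rather than a single national mean.

```python
# Toy shrinkage illustration (not the actual MRP model).

def pooled_estimate(local_share, local_n, national_share, prior_strength=100):
    """Shrink a local vote-share estimate towards the national average.

    prior_strength acts like a number of 'pseudo-respondents' drawn from
    the national pattern; small local samples get pulled further towards it.
    The value 100 here is an arbitrary choice for illustration.
    """
    return (local_share * local_n + national_share * prior_strength) / (
        local_n + prior_strength
    )

# A seat polling 50% on only 150 respondents, against a 40% national figure,
# is pulled noticeably towards the national figure...
small_sample = pooled_estimate(0.50, 150, 0.40)
# ...while the same 50% on 5,000 respondents barely moves.
large_sample = pooled_estimate(0.50, 5000, 0.40)
```

This is why atypical seats are the hardest case: pooling helps precisely because most constituencies follow shared patterns, and it can mislead where a seat does not.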
For each constituency, we provide a vote estimate and a 95% confidence interval for each party. These are the model's best guess at what a large poll would show if it were conducted in that constituency during that time period. Readers should treat the confidence intervals, rather than the point estimates alone, as the more reliable guide to current voting intentions. But even these will not be right everywhere: at 95% confidence across the 632 seats we model, we still expect the interval to miss the result in about 30 constituencies.
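The "about 30 constituencies" figure is just the arithmetic of nominal interval coverage:

```python
# Expected number of seats where the 95% interval misses the result.
seats = 632          # constituencies modelled
coverage = 0.95      # nominal confidence level of each interval
expected_misses = seats * (1 - coverage)  # roughly 31.6, i.e. about 30 seats
```

In practice the intervals in different seats are not independent, so the actual number of misses in any one election can be noticeably higher or lower than this expectation.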
The model was developed primarily by Professor Ben Lauderdale of University College London in conjunction with Jack Blumenau (University College London), YouGov’s UK political team, and YouGov's Data Science team headed by Doug Rivers of Stanford University. The data are streamed directly from YouGov's survey system to its analytic database, Crunch. From there, the models are fit using Hamiltonian Monte Carlo with the open source software Stan. Stan was developed at Columbia University by Andrew Gelman and his colleagues, with support from YouGov and other organisations.