YouGov election model Q&A

YouGov
May 31, 2017, 2:14 PM GMT+0

Is this your prediction for the election?

No – this is not a prediction. It is our estimate of the range of possible results if the election took place today, with voting intention as our current data shows it. It is not a prediction of where we will be on 8th June. We don’t know how things will change between now and polling day, but it would be surprising if things didn’t alter – one way or the other – before then.

Is this a poll?

No – it is a Multi-level Regression and Post-stratification (MRP) model. A poll asks a representative sample of people a question and then shows the results to that question, broken down by demographics. A model, by contrast, does calculations based on data from a poll: it looks at the relationship between people’s characteristics and their answers to a poll question, and uses these relationships to estimate possible outcomes. This model also uses information brought in from outside the poll.

What is the model and how does it work?

It works by modelling every constituency and key voter types in Britain based on analysis of key demographics as well as voting behaviour in the 2015 general election and the 2016 EU referendum. Every day, YouGov conducts approximately 7,000 interviews with registered voters from our panel, who are shown both the parties and candidates running in their particular seat. This data is used to assess how each type of voter is shaping the race in every type of constituency in Britain. From this, the model calculates daily voting intention and seat estimates.

Have you assessed every local variable?

No. As it is a nationally focused aggregate model, it does not account for specific local factors that may shape the vote in some seats.

How do you get the constituency estimate?

The model is based on the fact that there are trends and patterns that can be used to calculate how people are likely to vote, regardless of where they live. The model pools data for similar types of voters across different sorts of constituencies, providing an aggregated national view of the shape of the race at that time. Naturally, because the estimates are produced in this way, there will almost certainly be some outliers.
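The idea of pooling data for similar voters can be illustrated with a very small sketch. This is not the model itself, which is a multilevel regression. It is a simplified shrinkage estimate in which a voter type's support in one seat is pulled towards the national figure for that voter type, more strongly when the local sample is small. The function name, the prior_strength parameter and all numbers are assumptions made for illustration.

```python
def pooled_share(local_yes: int, local_n: int,
                 national_share: float,
                 prior_strength: float = 50.0) -> float:
    """Blend a local sample proportion with the national share.

    prior_strength is a hypothetical tuning value: it acts like that many
    extra interviews that all answer in line with the national share.
    """
    return (local_yes + prior_strength * national_share) / (local_n + prior_strength)


national = 0.40  # hypothetical national support among one voter type

# 3 of 5 local respondents back the party: the estimate stays near 0.40.
small_sample = pooled_share(3, 5, national)

# 300 of 500 back the party: the estimate moves most of the way to 0.60.
large_sample = pooled_share(300, 500, national)

print(round(small_sample, 3), round(large_sample, 3))
```

With no local interviews at all, the estimate is simply the national share for that voter type, which is why a seat with few respondents can still receive an estimate.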

Who made the model?

The model was developed primarily by Ben Lauderdale, Associate Professor at the London School of Economics, in conjunction with YouGov's Data Science team, headed by Professor Doug Rivers of Stanford University. The data are streamed directly from YouGov's survey system to its Crunch analytic database. From there, the models are fitted using Hamiltonian Monte Carlo with the open-source software Stan. Stan was developed at Columbia University by Professor Andrew Gelman and his colleagues, with support from YouGov and other organisations.

Are these numbers 100% right?

Both the seat and voting intention estimates come with ranges: according to the model, there is a 95% chance that the outcome would fall within these ranges if the election were held today. The number presented is the midpoint figure. So when assessing the data it is best to remember that both the seat and vote estimates for the two main parties could very well be higher, or lower, than this midpoint. As the vote and seat shares for the smaller parties are lower, the ranges on both figures are also narrower.

When it comes to the individual seats, the constituency displays show a vote estimate for each party and a 95% confidence interval, which is the model's best assessment of what a large poll would show if it were conducted in that seat on the same day. These confidence intervals provide our most reliable estimate of current voting intentions.
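To show how a midpoint and a 95% range relate to each other, here is a minimal sketch that summarises a set of simulated values. It assumes the model's output for one party can be treated as a list of simulated vote shares and that the midpoint is the median of those values. Both are assumptions for illustration, and the numbers are randomly generated rather than real model output.

```python
import random
import statistics
from typing import Sequence, Tuple


def summarise_draws(draws: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower, midpoint, upper): the 2.5%, 50% and 97.5% points."""
    # n=40 splits the data into 40 equal groups, so the first and last
    # cut points sit at 2.5% and 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], statistics.median(draws), cuts[-1]


random.seed(1)
# Hypothetical simulated vote shares (%) for one party.
draws = [random.gauss(42.0, 1.5) for _ in range(4000)]

lower, midpoint, upper = summarise_draws(draws)
print(f"{midpoint:.1f}% (95% range {lower:.1f}% to {upper:.1f}%)")
```

The published figure corresponds to the midpoint, while the lower and upper values mark the range within which 95% of the simulated results fall.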

How have you accounted for turnout?

Turnout is estimated from voters’ demographics and is also based on analysis of 2010 and 2015 British Election Study data.

Who are you asking?

Every day, we speak to around 7,000 registered voters from our panel, who are shown both the parties and candidates running in their particular seat. This data is used to assess how each type of voter is shaping the race in every type of constituency in Britain. From this, the model calculates daily voting intention and seat estimates.

How many people have you polled in my constituency?

It varies from seat to seat. Naturally, we can’t ask everyone in a constituency how they would vote, and a model cannot produce as accurate a result as a full-scale poll in each seat. However, the sample size in each constituency is not all that important, because it is the demographics of each constituency – based on census data – that are actually used to map voter data onto each constituency.
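The mapping step described here is the post-stratification part of MRP, and a minimal sketch of it follows. It assumes that support for a party has already been estimated for each voter type and that census counts of each voter type are available for each constituency. The voter types, the support figures and the counts are all hypothetical.

```python
from typing import Mapping


def poststratify(cell_support: Mapping[str, float],
                 cell_counts: Mapping[str, int]) -> float:
    """Average support across voter types, weighted by census counts."""
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("constituency has no voters in any cell")
    weighted = sum(count * cell_support[cell]
                   for cell, count in cell_counts.items())
    return weighted / total


# Hypothetical modelled support for one party, by voter type.
support = {"younger_remain": 0.20, "older_leave": 0.60}

# Hypothetical census counts for two constituencies.
seat_a = {"younger_remain": 30000, "older_leave": 10000}
seat_b = {"younger_remain": 10000, "older_leave": 30000}

print(round(poststratify(support, seat_a), 2))  # about 0.30
print(round(poststratify(support, seat_b), 2))  # about 0.50
```

The same per-type support figures give different estimates in the two seats purely because the mix of voter types differs, which is why the number of interviews within a single seat matters less than it would in a conventional constituency poll.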

When are the surveys taking place?

On a daily basis – we are continually conducting more interviews and adding the data from them to the model.

Are you saying that x candidate will get y% in my constituency?

You can’t do a seat estimate without looking at the likely outcomes in individual seats. However, the overarching purpose of the model is to estimate aggregate seat numbers on a national level, meaning that the individual seat estimates are meant to feed into the wider picture. It is important to be aware of the ranges in each seat – even if the midpoint suggests that x party has y%, it is unlikely that they will end up with exactly that vote share. While this is worth remembering in every seat, it is particularly pertinent where the race looks close.

Why are you doing this?

Elections are becoming more difficult to assess and so, as well as running traditional polls, we are looking at new ways to meet the challenge. We know we run a risk in publishing so much data in the heat of a campaign, but as data scientists we are committed to innovating to increase both accuracy and specificity.

Why should I trust you over other research companies?

In this election so far, all research companies are showing the same movement – currently a slight fall in the Conservative vote and a rise in the Labour share. The question is to what extent this is taking place and how it is either amplified or muted by different levels of turnout among certain voter groups. We are committed to constantly improving our methods and we make every effort to ensure that our work represents our best estimation of what the world thinks. Ultimately, we will not know what needs improving in this model until 9th June, but we will be updating it each day until polling day with our latest best estimates of seats and vote share.
