Primary polling recap: (Sample) size matters

One of the most important lessons reaffirmed this primary season is that sample size is the variable most closely tied to polling accuracy.

We’ve noticed this in past elections, but this primary season offered several consistent examples of how accurate larger-sample polls are and how inaccurate small-sample polls can be. The four races where we polled more than 500 respondents all produced very accurate results, landing within 0.4 to 3.6 percentage points of the final election day results. All of our smaller-sample polls differed from the election results by more.

During the 2014 primary season we polled a total of 37 primary races and correctly picked the winner in 30 of them. Of the seven we got wrong, five had sample sizes below 300 respondents.

As an example of one of the races we got wrong, let’s take a look at the HD 40 GOP Primary. This race pitted Colleen Burton against John Shannon. We polled this race four times from mid-July to mid-August, and we never had a sample size over 240. Keep in mind that only 8,493 people even voted in this race, so that is a very small population to try to survey.
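To put rough numbers on why sample size matters here, the following is a minimal Python sketch of the textbook margin-of-error formula for a proportion, with an optional finite population correction. It is only an illustration: it assumes a simple random sample, a 95% confidence level (z = 1.96), and a worst-case 50/50 split, and it treats the 8,493 actual voters as the population size, which is a simplification. The function name `margin_of_error` and its parameters are hypothetical and do not describe our actual methodology.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (an assumption for this sketch)


def margin_of_error(n, population=None, p=0.5, z=Z_95):
    """Approximate margin of error for a proportion from a simple random sample.

    n          -- number of respondents
    population -- size of the population surveyed; if given, a finite
                  population correction is applied
    p          -- assumed true proportion (0.5 is the worst case)
    z          -- z-score for the desired confidence level
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        if n > population:
            raise ValueError("sample cannot be larger than the population")
        # Finite population correction: shrinks the error slightly when the
        # sample is a noticeable fraction of the whole population.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


# Sample sizes mentioned in this post, with HD 40's turnout as the population.
for n in (240, 300, 500):
    moe = margin_of_error(n, population=8493)
    print(f"n={n}: about +/- {moe * 100:.1f} points")
```

Under these assumptions, a sample of 240 carries a margin of error of a little over six points, while a sample of 500 brings it down to a bit more than four. The finite population correction makes only a small difference at these sizes.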

What you will notice from the chart below is that Shannon’s vote percentage plateaued while Burton’s percentage improved steadily over time. In the end, Shannon’s vote share stayed almost exactly where it had been in our final poll, while Burton took the entire share of the undecideds. This is not a common outcome, but we have seen it happen several times in the last few years.

2014 Primary Election HD-40 chart

The next important trend I would like to touch on is how a candidate’s momentum can carry them to victory on Election Day. To illustrate this, we will look at one of the most hotly contested and expensive State House Republican primaries decided this week: HD 74, Richard DeNapoli vs. Julio Gonzalez.

Around the same time we polled this race, an internal campaign poll came out that showed DeNapoli way ahead, and it called Gonzalez “virtually unelectable.” Our first poll in mid-July showed DeNapoli ahead by over 23 points.

We polled this race a total of five times from mid-July to mid-August, with sample sizes always above 300. What you can see from the chart below is the consistent rise in Gonzalez’s numbers, while at the same time DeNapoli’s steadily declined.

On election day, similar to HD-40, the winner took almost the entire share of undecideds in our poll.

2014 Primary Election HD-74 chart

The last thing that needs to be mentioned is that polls with high undecided shares should not be considered very accurate.

Our poll of the Republican primary race for State House District 30 in mid-July had an undecided share of 56.9%. With undecideds that high and a gap between the two candidates of less than 5 percentage points, it is impossible to make any kind of projection of a winner. To get a more accurate set of results, we would have needed to poll this race again in mid-August.
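One way to see why a small gap with this many undecideds cannot support a projection is to compare the lead against its own margin of error. The sketch below uses the standard formula for the variance of the difference between two shares measured in the same poll. The inputs are hypothetical: this post does not give the HD 30 sample size or each candidate's exact share, so the 24.0% / 19.1% split (which sums to the 43.1% of decided respondents) and the sample of 300 are illustrative assumptions only, as are the function names.

```python
import math


def lead_margin_of_error(p1, p2, n, z=1.96):
    """Approximate margin of error on the lead (p1 - p2) within one poll.

    Uses the multinomial variance of a difference of two shares:
    Var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n
    Assumes a simple random sample; z=1.96 corresponds to 95% confidence.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)


def lead_is_distinguishable(p1, p2, n, z=1.96):
    """True if the lead is larger than its own margin of error."""
    return abs(p1 - p2) > lead_margin_of_error(p1, p2, n, z)


# Hypothetical inputs (NOT the actual HD 30 numbers): shares that sum to the
# 43.1% decided vote with a gap under 5 points, and an assumed sample of 300.
leader, trailer, sample_size = 0.240, 0.191, 300

moe = lead_margin_of_error(leader, trailer, sample_size)
print(f"lead: {(leader - trailer) * 100:.1f} points")
print(f"margin of error on the lead: about +/- {moe * 100:.1f} points")
print("distinguishable:", lead_is_distinguishable(leader, trailer, sample_size))
```

With these illustrative inputs the lead is smaller than its margin of error, so the poll alone cannot separate the two candidates. That is before accounting for where the undecided voters end up.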

As we get closer to the general election, we will be able to include voters from all parties, so small sample sizes should be less of an issue. But the big variable that we have to deal with in general elections is the turnout model: which population demographics we use to construct our calling lists so that they most accurately reflect who will vote in November.