Should you prefer sensitive (noisy) or insensitive (lagging) poll aggregation?

There are quite a few poll aggregators and predictive models based on poll aggregation. This is a huge improvement on the status quo ante where our basic access to polling data was at the individual poll level.

Polls have error. Polls have biases (hidden and otherwise). Polls are a snapshot.

When you see a headline number of a poll, remember there are at least three factors: The poll’s data acquisition methodology (their sampling strategy, questions they ask, etc.), the actual data gathered, and the interpretation of that data. Each of these can have a very large effect on the headline numbers and any of them could easily reverse the rank order of the candidates. (See the wonderful Upshot article wherein they gave the same gathered data to 4 pollsters and got 4 different results which include a Trump and several Clinton leads. These pollsters were all doing a defensible job! No hackery there!)

Poll aggregation is, in effect, a poll of polls. So the same things feed in: their methodology (do you include 4-way race polls?), actual data, and interpretation (do you weight your averages?). As a result, they can give you different results. For example, Talking Points Memo’s PollTracker is generally a bit more pessimistic about Clinton than the HuffPost Pollster, and the RealClearPolitics one is also more pessimistic about Clinton. (I’m going roughly by the number of times Trump’s trend line touches or crosses Clinton’s.)
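To make the “do you weight your averages?” point concrete, here is a minimal sketch of how one interpretive choice can move an aggregate. The polls are made up for illustration, and weighting by sample size is just one possible scheme, not what any of the aggregators above necessarily do.

```python
# Hypothetical polls: (Clinton %, Trump %, sample size). All numbers are made up.
polls = [
    (46.0, 42.0, 1200),
    (44.0, 45.0, 400),
    (47.0, 43.0, 900),
]

def simple_margin(polls):
    """Unweighted mean of the Clinton-minus-Trump margin."""
    return sum(c - t for c, t, _ in polls) / len(polls)

def weighted_margin(polls):
    """Mean margin weighted by sample size (one possible weighting choice)."""
    total = sum(n for _, _, n in polls)
    return sum((c - t) * n for c, t, n in polls) / total

print(f"unweighted margin: {simple_margin(polls):+.2f}")
print(f"weighted margin:   {weighted_margin(polls):+.2f}")
```

Same polls, two defensible averages, two different headline margins. Here the small poll showing a Trump lead counts for less once sample size is taken into account.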

When we get to forecasting models, we get even more variance. A forecasting model is a prediction of a candidate’s chances of winning, usually expressed as a probability. So if you see that Clinton has a 65% chance of winning, it’s not that she’s polling at 65%, but that she has a 65% chance of winning the election (which she might do by a razor-thin margin!). For win probability, a very stable razor-thin margin is better than a highly volatile large margin. Or it should be!
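Here is a rough sketch of why a stable thin margin can beat a volatile large one. It assumes, purely for illustration, that the final margin is normally distributed around the current margin with some standard deviation; real forecasting models are more elaborate, and the numbers below are made up.

```python
import math

def win_probability(margin, sigma):
    """P(final margin > 0), assuming final margin ~ Normal(margin, sigma)."""
    return 0.5 * (1.0 + math.erf(margin / (sigma * math.sqrt(2.0))))

# Made-up scenarios: a 1-point lead that barely moves vs. a 5-point lead that swings a lot.
stable_thin = win_probability(margin=1.0, sigma=0.5)
volatile_big = win_probability(margin=5.0, sigma=5.0)

print(f"stable 1-point lead:   {stable_thin:.1%}")
print(f"volatile 5-point lead: {volatile_big:.1%}")
```

Under this assumption, what matters is the margin measured in units of its own uncertainty: the 1-point lead is two standard deviations from zero, the 5-point lead only one.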

Some predictive models are more volatile than others. You can see this most easily on FiveThirtyEight’s prediction page because they have convenient radio buttons for selecting between three models with different levels of sensitivity to the polls (with the “nowcast” being a “straightforward” poll aggregation). In contrast, Sam Wang’s model tends to move more slowly, by design.
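The sensitive-versus-lagging tradeoff can be sketched with a simple exponentially weighted average. This is not how FiveThirtyEight or Sam Wang actually build their models; the smoothing method, the `alpha` values, and the simulated polls are all assumptions chosen for illustration.

```python
import random

def ema(values, alpha):
    """Exponentially weighted average; alpha near 1 is reactive, near 0 is smooth."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

def roughness(series):
    """Mean absolute day-to-day change in a series."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)

rng = random.Random(0)
# Made-up "true" margin: flat at +4, then a durable drop to +1 at day 30.
truth = [4.0] * 30 + [1.0] * 30
# Each day's poll is the truth plus simulated noise.
daily_polls = [m + rng.gauss(0, 2.0) for m in truth]

sensitive = ema(daily_polls, alpha=0.6)  # tracks each poll closely, so it is noisy
stable = ema(daily_polls, alpha=0.1)     # smooths the noise, but reacts late

print(f"sensitive roughness: {roughness(sensitive):.2f}")
print(f"stable roughness:    {roughness(stable):.2f}")
```

The reactive average picks up the real drop quickly but also jumps around on pure noise; the smooth one ignores the noise but takes a while to register the drop.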

So, which should you prefer?

In general, just as with polls, it’s good to look at multiple models. It gives you more information and reminds you that prediction is a tough tough game.

I think, in general, it’s worth being stable rather than highly reactive, so I tend to lean on less volatile models. There are several reasons:

  1. We’re still pretty far out. Getting worked up about something that might be a statistical blip or a cyclic movement is pretty unwise. If some movement in the averages or forecasts is worth worrying about, then it will be durable and show up in all the models. Getting a “jump” on bad (or good) news isn’t really helpful, esp. as there’s little to do in response (for most of us). It’s similar to the stock market: Most of us aren’t equipped to do much short term trading efficiently, so we’re better off thinking long term.
  2. We really don’t know the underlying causal structure. One phenomenon that has been shown in the lab is “differential (non-)response”, that is, whether people respond (at all!) to polls often depends on “(de)energising” events. Thus, consider convention bumps. Each candidate typically gets a boost in the polls that then fades after their convention. Why? Are people changing their minds? Are they really that fickle? Perhaps, but it also could be the case that their voting intentions (which is what we care about) don’t change, but whether and how they respond to polls changes. Thus, in addition to sampling error and other methodological and interpretive biases, we have the possibility that salient events might change polling results without there being a change in the phenomenon we’re trying to measure.
  3. Given the strong negatives associated with a Trump victory, anything from a 10% chance on up is extremely worrisome. It’s worth being worried. If you can use that worry to prompt action, you should do it regardless of the current state of the polls.
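The differential (non-)response point in item 2 can be illustrated with a toy simulation. Everything here is an assumption made up for the sketch: the electorate is split exactly 50/50 throughout, and only the response rates of the two camps change after an “energising” event.

```python
import random

def run_poll(rng, n_contacted, share_a, response_a, response_b):
    """Contact n voters; each responds with a probability that depends on whom they support."""
    a = b = 0
    for _ in range(n_contacted):
        supports_a = rng.random() < share_a
        responds = rng.random() < (response_a if supports_a else response_b)
        if responds:
            if supports_a:
                a += 1
            else:
                b += 1
    return 100.0 * a / (a + b)  # candidate A's share among respondents

rng = random.Random(1)
TRUE_SHARE_A = 0.50  # voting intentions never change in this toy world

# Before the event: both camps answer the phone at the same (made-up) rate.
before = run_poll(rng, 50000, TRUE_SHARE_A, response_a=0.10, response_b=0.10)
# After the event: A's supporters are energised, B's are discouraged.
after = run_poll(rng, 50000, TRUE_SHARE_A, response_a=0.12, response_b=0.08)

print(f"A's polled share before: {before:.1f}%")
print(f"A's polled share after:  {after:.1f}%")
```

Nobody in this simulation changes their vote, yet the poll shows a sizeable “bump” for A. A highly reactive aggregator would pass that bump straight through.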

So, prefer the more stable aggregators and forecasts. Also prefer the ones that are most inclusive of polls and minimise the “special sauce” in their models. If you want to know what a fundamentals model predicts, just use a separate prediction rather than trying to weave it into your poll-based predictor. There’s enough interpretative variability that adding things which aren’t really made to work together is a bad idea. Better that each sort of evidential base has its own predictive model and you can compare them more or less directly.

Note: scrubbed all the embedding code for the aggregations. I’ll try to update with screenshots later. Sigh.

Update: PollyVote is a forecast model aggregator! So it saves you the work 🙂 (It seems to have two levels of aggregation: It aggregates within a type of forecasting method, e.g., prediction market vs. econometric, and it aggregates over those types.) One interesting thing is that it provides popular vote and EV totals, as opposed to a win probability. Another is that it doesn’t incorporate error estimates (indeed, it’s hard to see how to do it). OTOH, it’s super simple and straightforward and covers the main sources of evidence. It will be interesting to see how it does in this weird weird year.