Yes, You Are (Maybe) Overconfident

Wednesday, March 31, 2010
By dreeves

218 of you took our calibration quiz, not counting the 10% of submissions that had to be thrown out for being incomplete, giving ranges with the min greater than the max, or failing other sanity checks. (Here’s the raw data.)

The bad news is that you’re terrible at making 90% confidence intervals. For example, not a single person had all 10 of their intervals contain the true answer, which, if everyone were perfectly calibrated, should’ve happened by chance to 35% of you. Getting fewer than 6 good intervals should, statistically, not have happened to anyone. How many actually had 5 or fewer good intervals? 76% of you.
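For anyone who wants to check those two numbers, here is a minimal sketch of the arithmetic. It assumes (our simplification for illustration, not something the quiz guarantees) that each of the 10 intervals independently has exactly a 90% chance of containing the true answer, so the number of good intervals follows a binomial distribution. The names `binom_pmf`, `N_QUESTIONS`, `P_HIT`, and `N_RESPONDENTS` are made up for this example.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N_QUESTIONS = 10      # intervals per respondent
P_HIT = 0.9           # assumed chance each interval is good (perfect calibration)
N_RESPONDENTS = 218   # valid submissions reported in the post

# Chance that all 10 intervals contain the true answer.
p_all_ten = binom_pmf(N_QUESTIONS, N_QUESTIONS, P_HIT)

# Chance of 5 or fewer good intervals (i.e., fewer than 6).
p_five_or_fewer = sum(binom_pmf(k, N_QUESTIONS, P_HIT) for k in range(6))

# Expected number of respondents with 5 or fewer good intervals.
expected_people = N_RESPONDENTS * p_five_or_fewer

print(f"P(all 10 good)     = {p_all_ten:.3f}")
print(f"P(5 or fewer good) = {p_five_or_fewer:.5f}")
print(f"Expected respondents with 5 or fewer: {expected_people:.2f}")
```

Under these assumptions the first probability rounds to the 35% quoted above, and the expected number of respondents with 5 or fewer good intervals comes out well below one person.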

Here’s a histogram of the number of good intervals you got, out of 10:

The overlaid phantom histogram shows what the distribution would look like if every interval people gave really had a 90% chance of containing the true answer. In other words, you should’ve made your intervals much wider: when we ask for a 90% confidence interval, there’s in fact only a 41% chance that your interval contains the true answer.

We ran this quiz on Mechanical Turk as well, and you marginally outperformed the turkers. The histogram of turkers’ good intervals is indicated by the red dots in the above graph. They failed our sanity checks at almost twice the rate (19%) of Messy Matters readers. Among the remaining responses, the mean number of good intervals was 3.5 out of 10.

The more we’ve thought about these kinds of overconfidence results (and read the literature on them or, rather, consulted endlessly with Dan Goldstein, who knows that literature), however, the less clear it is that the moral of this quiz is simply “people are overconfident”. For one thing, overconfidence depends on the question. The fraction of good intervals in your responses ranged from 23% (the length of the Nile and the gestation period of an Asian elephant) to 75% (number of OPEC countries). Of course, even 75% is not the 90% that was asked for.

More interestingly, in an ongoing follow-up study on Mechanical Turk we’re finding that once people have given their intervals, more than half of them realize in retrospect that too few of their intervals are good. This suggests that people can learn to perform much better at this task.

Obligatory Wisdom of Crowds Demonstration

It’s not a fair demonstration since people weren’t asked for their best guesses, but here’s a table of median lower bounds, upper bounds, and midpoints of everyone’s ranges. Interestingly, people’s upper bounds are overall most accurate.

Question  MLK  Nile  OPEC  Bible  Moon  747     Mozart  Elephant  Tokyo  Ocean
True      39   4132  12    39     2160  390000  1756    645       5959   35994
Min       35   900   6     8      1000  20000   1700    180       5000   13500
Mid       45   1750  13    15     3500  63250   1725    320       8000   30000
Max       55   3000  20    20     5000  100000  1790    400       10000  40000
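One way to compare the three rows is to ask, for each question, which of the median lower bound, midpoint, and upper bound lands closest to the truth. The sketch below does that using the numbers in the table. The choice of relative error as the yardstick is our assumption for illustration (the post doesn't say how accuracy was judged), and the names `TRUE`, `ESTIMATES`, and `relative_error` are made up for this example.

```python
# Values copied from the table above, in question order:
# MLK, Nile, OPEC, Bible, Moon, 747, Mozart, Elephant, Tokyo, Ocean
TRUE = [39, 4132, 12, 39, 2160, 390000, 1756, 645, 5959, 35994]
ESTIMATES = {
    "Min": [35, 900, 6, 8, 1000, 20000, 1700, 180, 5000, 13500],
    "Mid": [45, 1750, 13, 15, 3500, 63250, 1725, 320, 8000, 30000],
    "Max": [55, 3000, 20, 20, 5000, 100000, 1790, 400, 10000, 40000],
}

def relative_error(estimate: float, truth: float) -> float:
    """Absolute error as a fraction of the true value (assumed metric)."""
    return abs(estimate - truth) / truth

# Count, per row, how many questions that row is the closest on.
wins = {name: 0 for name in ESTIMATES}
for i, truth in enumerate(TRUE):
    closest = min(ESTIMATES, key=lambda name: relative_error(ESTIMATES[name][i], truth))
    wins[closest] += 1

print(wins)
```

By this particular yardstick, the median upper bound is the closest of the three on 5 of the 10 questions. A different measure of accuracy could rank the rows differently.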

Thanks to Sharad Goel, Dan Goldstein, Bethany Soule, Dan Kaminsky, and Michael J.J. Tiffany.

Image: Kelly Savage
