Yes, You Are (Maybe) Overconfident

Wednesday, March 31, 2010

By dreeves

218 of you took our calibration quiz, not counting the 10% of submissions that we threw out for being incomplete, giving ranges with the min greater than the max, or failing other sanity checks. (Here’s the raw data.)
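
For the curious, here is a minimal sketch of the two sanity checks named above (completeness and min not exceeding max). The data format, a list of (low, high) pairs with None for a skipped question, and the function name are assumptions made for illustration; this is not the code we actually ran, and the "other" checks are not reproduced.

```python
def passes_sanity_checks(submission, n_questions=10):
    """Return True if a submission is complete and every range is well-formed.

    `submission` is assumed to be a list of (low, high) pairs, with None
    standing in for an unanswered question. This format is hypothetical.
    """
    # A complete submission answers every question.
    if len(submission) != n_questions:
        return False
    for answer in submission:
        if answer is None:
            return False
        low, high = answer
        if low is None or high is None:
            return False
        # A range whose min exceeds its max is malformed.
        if low > high:
            return False
    return True


# Made-up example submissions, not real quiz responses.
good = [(30, 50)] * 10
reversed_range = [(30, 50)] * 9 + [(50, 30)]
incomplete = [(30, 50)] * 9 + [None]

for name, sub in [("good", good), ("reversed", reversed_range), ("incomplete", incomplete)]:
    print(name, passes_sanity_checks(sub))
```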

The bad news is that you’re terrible at making 90% confidence intervals. For example, not a single person had all 10 of their intervals contain the true answer, which, if everyone were perfectly calibrated, should’ve happened by chance to 35% of you (that’s 0.9 multiplied by itself 10 times). Getting fewer than 6 good intervals should, statistically, have happened to essentially no one. How many actually had 5 or fewer good intervals? 76% of you.
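
The arithmetic behind those two claims can be sketched as follows, assuming each of the 10 intervals independently has a 90% chance of containing the truth (that independence is an idealization, and the variable names are ours for this example).

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_QUESTIONS = 10      # intervals per person
P_HIT = 0.9           # chance each interval is good, if perfectly calibrated
N_RESPONDENTS = 218   # valid submissions, from the post

# Chance of getting all 10 intervals right.
p_all_ten = binom_pmf(10, N_QUESTIONS, P_HIT)

# Chance of getting 5 or fewer right.
p_five_or_fewer = sum(binom_pmf(k, N_QUESTIONS, P_HIT) for k in range(6))

# How many of the 218 respondents we'd expect to do that badly.
expected_people = N_RESPONDENTS * p_five_or_fewer

print(f"P(all 10 good)     = {p_all_ten:.4f}")
print(f"P(5 or fewer good) = {p_five_or_fewer:.5f}")
print(f"Expected people with 5 or fewer = {expected_people:.2f}")
```

Under these assumptions, the expected number of people with 5 or fewer good intervals comes out well below one, which is what "essentially no one" means here.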

Here’s a histogram of the number of good intervals you got, out of 10:

The overlaid phantom histogram is what it would look like if it were really the case that every interval people gave had a 90% chance of containing the true answer. In other words, you should’ve made your intervals much wider. When we ask for a 90% confidence interval there’s in fact only a 41% chance that your interval contains the true answer.
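
Here is a rough sketch of how a phantom histogram like that can be computed: the expected number of people, out of 218, with each possible count of good intervals if every interval really had a 90% chance of being good. It assumes the intervals are independent, and the function name is made up for this example.

```python
from math import comb

N_QUESTIONS = 10
N_RESPONDENTS = 218


def expected_counts(p_hit, n_people=N_RESPONDENTS, n=N_QUESTIONS):
    """Expected number of people with k good intervals, for k = 0..n."""
    return [
        n_people * comb(n, k) * p_hit**k * (1 - p_hit) ** (n - k)
        for k in range(n + 1)
    ]


phantom = expected_counts(0.9)

# Crude text histogram: one '#' per two expected people.
for k, count in enumerate(phantom):
    print(f"{k:2d} good: {count:6.1f} " + "#" * round(count / 2))
```

Calling `expected_counts(0.41)` instead gives the shape you'd expect if every interval had only the 41% hit rate observed overall, which is a cruder model since real hit rates vary by person and by question.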

We ran this quiz on Mechanical Turk as well, and you marginally outperformed the turkers. The histogram of turkers’ good intervals is indicated by the red dots in the above graph. They failed our sanity checks at almost twice the rate of Messy Matters readers (19%), and among their remaining responses the mean number of good intervals was 3.5 out of 10, compared with about 4.1 for you.

The more we’ve thought about (and read the literature on — or rather, consulted endlessly with Dan Goldstein, who knows the literature on) these kinds of overconfidence results, however, the less clear it is that the moral of this quiz is simply “people are overconfident”. For one thing, overconfidence depends on the question. The fraction of good intervals in your responses ranged from 23% (the length of the Nile and the gestation period of an Asian elephant) to 75% (number of OPEC countries). Of course, even 75% is not the 90% that was asked for.

More interestingly, in an ongoing follow-up study on Mechanical Turk we’re finding that after people have given their intervals, more than half of them realize in retrospect that too few of their intervals are good. This suggests that people can learn to perform much better at this task.

Obligatory Wisdom of Crowds Demonstration

It’s not a fair demonstration since people weren’t asked for their best guesses, but here’s a table of median lower bounds, upper bounds, and midpoints of everyone’s ranges. Interestingly, people’s upper bounds are overall the most accurate: the median upper bound comes closest to the true value on 5 of the 10 questions.

	MLK	Nile	OPEC	Bible	Moon	747	Mozart	Elephant	Tokyo	Ocean
True	39	4132	12	39	2160	390000	1756	645	5959	35994
Min	35	900	6	8	1000	20000	1700	180	5000	13500
Mid	45	1750	13	15	3500	63250	1725	320	8000	30000
Max	55	3000	20	20	5000	100000	1790	400	10000	40000
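
To check which of the three rows tends to land closest to the truth, here is a small sketch that uses only the numbers in the table above. "Closest" is taken to mean smallest absolute error on each question, which is one reasonable reading of "most accurate" but not the only one.

```python
questions = ["MLK", "Nile", "OPEC", "Bible", "Moon",
             "747", "Mozart", "Elephant", "Tokyo", "Ocean"]

# Values copied from the table above.
true = [39, 4132, 12, 39, 2160, 390000, 1756, 645, 5959, 35994]
bounds = {
    "Min": [35, 900, 6, 8, 1000, 20000, 1700, 180, 5000, 13500],
    "Mid": [45, 1750, 13, 15, 3500, 63250, 1725, 320, 8000, 30000],
    "Max": [55, 3000, 20, 20, 5000, 100000, 1790, 400, 10000, 40000],
}

# Count, for each row, the questions on which it is nearest the true value.
wins = {name: 0 for name in bounds}
for i, truth in enumerate(true):
    closest = min(bounds, key=lambda name: abs(bounds[name][i] - truth))
    wins[closest] += 1
    print(f"{questions[i]:9s} closest: {closest}")

print(wins)  # {'Min': 3, 'Mid': 2, 'Max': 5}
```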

Thanks to Sharad Goel, Dan Goldstein, Bethany Soule, Dan Kaminsky, and Michael J.J. Tiffany.

Image: Kelly Savage

Tags: calibration, decision theory, overcoming bias, overconfidence, prediction, psychology, rationality, wisdom of crowds

