Archive for November, 2006

What is survivor bias?

Since it’s the title of this blog, I should probably define what survivor bias is. Wikipedia has a succinct definition:

Survivorship bias (or “Survivor bias”) is a statistical artifact in applications outside of finance, where studies on the remaining population are fallaciously compared with the historic average despite the survivors having unusual properties.

A simple example helps – an old scam is to pick a large number of people from the phone book (e.g. 10,000) and send them football predictions. You tell half of them that some team will win the big game that week and the other half that the same team will lose. Whatever the outcome, 50% of the “predictions” will have been correct, so you drop the people who received the incorrect prediction and repeat the operation with those who remain. After, say, 6 weeks there will be (10,000 / 2^6) ~ 156 people who have received nothing but accurate predictions. At this point, you hit these people up for some large sum of money and disappear. The point is that by discarding the evidence of all the losers you create a track record that suggests you can predict things when, in fact, the hits occurred purely by random chance.
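
The halving arithmetic in the scam can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above; the function name `surviving_recipients` is made up for this example, and it assumes that exactly half of the remaining recipients get the correct prediction each week.

```python
def surviving_recipients(initial, weeks):
    """Return how many recipients have seen only correct predictions
    after each week, assuming exactly half are told the right outcome."""
    counts = []
    remaining = initial
    for _ in range(weeks):
        # Half were told "win" and half "lose", so exactly half saw a
        # correct prediction regardless of how the game turned out.
        remaining = remaining / 2
        counts.append(remaining)
    return counts


counts = surviving_recipients(10_000, 6)
for week, n in enumerate(counts, start=1):
    print(f"week {week}: {n:g} people with a perfect record so far")
```

The last value is 10,000 / 2^6 = 156.25, which is the roughly 156 people mentioned above. None of them were picked for any reason other than luck.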

There is a lot of evidence of this occurring in the mutual fund industry (poorly performing funds are closed, so they drop out of the record) and in finance in general. It is human nature to attribute success to “skill” and failure to “bad luck”. In physics and statistics, it’s common to calculate the statistical significance of a result – how large an effect you would expect to see purely from random chance (the null hypothesis) versus from the hypothesis you are studying. In life it can be a little harder to see how big the roulette table is.
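
To make the “size of the roulette table” concrete, here is a rough sketch of the null-hypothesis calculation for the football example. It assumes every prediction is an independent coin flip with a 50% chance of being right; the function names and parameters are invented for this illustration.

```python
def p_perfect_record(weeks, p_correct=0.5):
    """Chance that one pure guesser is right every week for `weeks` weeks."""
    return p_correct ** weeks


def p_at_least_one_perfect(n_forecasters, weeks, p_correct=0.5):
    """Chance that at least one of `n_forecasters` independent guessers
    ends up with a perfect record purely by luck."""
    p = p_perfect_record(weeks, p_correct)
    return 1.0 - (1.0 - p) ** n_forecasters


single = p_perfect_record(6)
crowd = p_at_least_one_perfect(10_000, 6)
print(f"one guesser, six weeks: {single:.4f}")
print(f"at least one of 10,000 guessers: {crowd:.4f}")
```

A single guesser has a 1 in 64 chance (about 1.6%) of a six-week perfect record, which looks impressive in isolation. Among 10,000 guessers it is all but certain that someone has one, which is why you need to know how many attempts were made before being impressed by the survivors.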


Welcome to Survivor Bias


Survivor Bias is a blog about the role that randomness and statistics play in the news, finance and everyday life. Randomness is often downplayed as a factor – people want simple causality because it implies that they can control every aspect of life. It’s very easy to be fooled when you look at events in isolation.

This blog will mostly concern itself with manifestations of statistical biases in the real world – survivor bias is one particular example. A lot of the writing here will be strongly influenced by the work of Nassim Nicholas Taleb, Benoît Mandelbrot and Karl Popper, amongst others, as well as by statistical linguistics and data mining.
