Again, from "Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics" by Gary Smith
Chapter 2: Garbage In, Gospel Out
Another thing to be careful about with observational studies is that they can be subject to survivorship bias. One of the most famous examples comes from WW II, when the British Royal Air Force (RAF) noticed bullet-hole patterns on returning airplanes and wanted to know which areas of the plane needed extra protection, such as more armor. This is a famous image depicting it:
[Image: bullet-hole pattern on returning aircraft]
The thought was to place armor where the most bullet holes were found, on the theory that this would help planes survive German attacks. However, Abraham Wald had a critical insight: the returning planes had very few holes in the cockpit, engines, or fuel tanks. He realized that few planes with holes in those locations were making it back, and concluded that planes hit there were the ones being shot down. So the armor went in those locations instead, increasing the planes' chances of returning. This was a crucial insight that helped win the war.
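A toy simulation can make Wald's reasoning concrete. This is a minimal sketch under loudly stated assumptions: the section names, the uniform spread of hits, and the per-hit lethality numbers are all made up for illustration and are not historical data. Every plane is hit equally often everywhere, yet because engine, cockpit, and fuel-tank hits are (by assumption) more often fatal, those holes are underrepresented among the planes that come back.

```python
import random

# Hypothetical plane sections and the (made-up) chance that a single hit
# in that section brings the plane down. These numbers are assumptions
# chosen only to illustrate the mechanism, not historical data.
LETHALITY = {
    "engine": 0.5,
    "cockpit": 0.4,
    "fuel_tank": 0.4,
    "fuselage": 0.05,
    "wings": 0.05,
}
SECTIONS = list(LETHALITY)


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return bullet-hole counts per section for all planes and for returners."""
    rng = random.Random(seed)
    all_holes = {s: 0 for s in SECTIONS}
    returned_holes = {s: 0 for s in SECTIONS}
    for _ in range(n_planes):
        # Assumption: hits land uniformly at random across the sections.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        for s in hits:
            all_holes[s] += 1
        # The plane returns only if none of its individual hits is fatal.
        survived = all(rng.random() >= LETHALITY[s] for s in hits)
        if survived:
            for s in hits:
                returned_holes[s] += 1
    return all_holes, returned_holes


if __name__ == "__main__":
    all_holes, returned_holes = simulate()
    for s in SECTIONS:
        print(f"{s:10s} all planes: {all_holes[s]:6d}  returners: {returned_holes[s]:6d}")
```

Looking only at the "returners" column, the engine appears to be the safest place to get hit, which is exactly the wrong conclusion; the "all planes" column, which the RAF never got to see, shows the hits were spread evenly.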
Survivorship bias shows up in observational studies, most notably backward-looking ones. If we choose a sample today and look backward, we see only the survivors. For example, the medical histories of the elderly omit those who died and therefore never became elderly.
Smith brings up other examples: (1) a 1970s class-action lawsuit that examined wage data for white and black employees, (2) HMO surveys, (3) an ad for Red Lion Hotels, and (4) data from New York City veterinary hospitals on cats injured or killed after falling from apartment buildings [this led doctors to concoct all kinds of theories about why cats seemed to survive falls from higher floors so well].
Yet another example, one that is close to my heart, is his criticism of Good to Great, the famous business book by Jim Collins (also thrown in for good measure is In Search of Excellence, by Tom Peters). I adored both books in my twenties when I discovered self-help and business literature. They illuminated the way toward bettering oneself and the institutions one worked in. They made sense. However, Smith rips these authors a new one by showing that these books are based on observational studies that suffer from survivorship bias.
The correct way to go about such a study is to make it forward-looking, i.e., start with a list of companies, then use plausible criteria to select x companies that are predicted to do better than the rest. "These criteria must be applied in an objective way, without peeking at how the companies did over the next forty years. It is not fair or meaningful to predict which companies will do well after looking at which companies did well! Those are not predictions, just history. After the chosen x are identified, their subsequent performance can be compared to other companies over a forty-year period... would have been a fair comparison"
Collins picked his companies after the 40-year period (i.e., after they had survived), and he cherry-picked criteria that he thought made sense. He was deriving theories from data, which is always hazardous.
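The difference between the backward-looking and forward-looking designs can be sketched with a small simulation. The setup is hypothetical and deliberately rigged: 1,000 companies get 20 random yes/no "traits", and performance is pure luck, independent of every trait. The backward-looking study picks the 11 best past performers and then finds the trait they most share; the forward-looking test selects on that trait first and checks how those companies do in a fresh period. The company count, trait count, and group size are illustrative assumptions, not figures from Smith or Collins.

```python
import random

N_COMPANIES = 1_000
N_TRAITS = 20   # hypothetical binary traits ("disciplined culture", etc.)
N_WINNERS = 11  # size of the hand-picked "great" group (an assumption)


def make_companies(rng):
    # Each company gets random yes/no traits. By construction the traits have
    # NO effect on performance: that is the point of the simulation.
    return [[rng.random() < 0.5 for _ in range(N_TRAITS)] for _ in range(N_COMPANIES)]


def performance(rng):
    # Pure luck: one random return per company, independent of its traits.
    return [rng.gauss(0, 1) for _ in range(N_COMPANIES)]


def backward_study(traits, past):
    """Pick the winners after the fact, then find the trait they most share."""
    winners = sorted(range(N_COMPANIES), key=lambda i: past[i], reverse=True)[:N_WINNERS]
    share = [sum(traits[i][t] for i in winners) / N_WINNERS for t in range(N_TRAITS)]
    best_trait = max(range(N_TRAITS), key=lambda t: share[t])
    return best_trait, share[best_trait]


def forward_test(traits, future, trait):
    """Select on the trait first, then compare later average performance."""
    with_trait = [future[i] for i in range(N_COMPANIES) if traits[i][trait]]
    without = [future[i] for i in range(N_COMPANIES) if not traits[i][trait]]
    return sum(with_trait) / len(with_trait) - sum(without) / len(without)


if __name__ == "__main__":
    rng = random.Random(42)
    traits = make_companies(rng)
    past = performance(rng)    # the history the author looked at
    future = performance(rng)  # what happens afterwards
    trait, share = backward_study(traits, past)
    print(f"trait {trait} is shared by {share:.0%} of the winners")
    print(f"forward-looking advantage: {forward_test(traits, future, trait):+.3f}")
```

With 20 traits to choose from, some trait will usually be shared by a large majority of the 11 winners purely by chance, which looks like a "secret of success". The forward-looking comparison, which never peeks at the future, typically shows an advantage close to zero.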
Smith continues to shred this with an analogy: being dealt five specific cards and then asking the probability of having been dealt those cards. If you had predicted those cards before the deal, that would have been amazing. But "predicting" them after they were dealt is not so amazing. In fact, the probability is 100%!
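The arithmetic behind the card analogy is short enough to check directly. This sketch assumes a standard 52-card deck and an unordered 5-card hand; it compares the chance of naming a specific hand before the deal with the "chance" of the hand you already hold.

```python
from fractions import Fraction
from math import comb

# Number of distinct 5-card hands from a standard 52-card deck.
hands = comb(52, 5)

# Chance of correctly naming one specific hand BEFORE the deal.
p_before = Fraction(1, hands)

# Chance that the hand you were dealt is the hand you were dealt:
# once the cards are on the table, the "prediction" cannot fail.
p_after = Fraction(1, 1)

print(hands)     # 2598960
print(p_before)  # 1/2598960
print(p_after)   # 1
```

A 1-in-2,598,960 prediction made in advance would be remarkable; the same "prediction" made afterwards is certain and tells us nothing, which is the position of a backward-looking study.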
"Finding common characteristics after the companies have been selected is not at all unexpected and not at all interesting." Can we use these characteristics to make forward predictions? Now, that would be interesting! Backfitting theories to data is not.
"This problem plagues the entire genre of books on formulas/secrets/recipes for a successful business, a lasting marriage, living to be one hundred, and so on and so forth, that are based on backward-looking studies of successful business, marriages, and lives. There is inherent survivor bias ... [should rather] identify businesses or people with these traits and see how they do over the next ten or twenty or fifty years. Otherwise, we are just staring at the past instead of predicting the future"
Smith's concluding remarks, under "Don't Be Fooled":
We observe people working, playing and living and we naturally draw conclusions from what we see. Our conclusions may be distorted by the fact that these people chose to do what they are doing. The traits we observe may not be due to the activity, but may instead reflect the people who chose to do the activity.
This is very deep to me because it's so easy to be fooled. Human beings love to tell stories, stories can constantly fool us, and, of course, we are the easiest ones to deceive. So a story saying that if you just do activity x, outcome y will follow is so seductive. It gives us control and a future reward. Except it's all an illusion.
"We naturally draw conclusions from what we see - workers' wages, damaged aircraft, successful companies. We should also think about what we do not see - the employees who left, the planes that did not return, the companies that failed. The unseen data may be just as important, or even more important, than the seen data. To avoid survivor bias, start in the past and look forward"