Saturday, July 1, 2023

New Book: More Numbers Every Day

Chapter 2: Numbers & the Body

  • Numbers influence the direction we walk, and we tend to look downward when we count down!
  • Numbers can make us feel younger/older, stronger/weaker: round threshold numbers like 200 can act as barriers we feel we can't break through, e.g., lifting more than that weight. But we can get fooled into doing more than we think if, say, the weights get mixed up!
  • There are number neurons in the brain that get wired/trained early in life.
  • The lower someone's psychological age, the faster they walk; walking speed is a good overall indicator of health and well-being.

Wednesday, April 6, 2022

Misbehaving Proportions

In Chapter 5, More Pie than Plate, Ellenberg continues his journey into how proportions can mislead. It boggles my mind that something as simple as percentages/proportions can be gamed! In fact, it's quite possible that the people coming up with these aren't even aware themselves. Maybe they just do a back-of-the-envelope calculation, see that it advances their agenda, and slap it on an advertisement or endorsement somewhere. But that view is probably too forgiving.

The key lesson in this chapter is: "Don't talk about percentages of numbers when the numbers might be negative...When you disobey the slogan I gave you, all sorts of weird incongruities start to bubble up."

It allows you to tell fake stories!

He gives plenty of real-life examples. One instance is when Governor Scott Walker of Wisconsin touted that "50 percent of U.S. job growth in June came from our state". This was based on the US economy as a whole adding only 18,000 jobs nationally, while Wisconsin had a net increase of 9,500 jobs. Thus 9,500/18,000 ≈ 53% > 50%. That seems fair at first, but on closer examination it's weird that one state could claim over half of all the jobs created. It turns out this is what happens when you include negative numbers in the analysis. By the same logic, Minnesota could have claimed over 70% of national job creation! Texas, California, Michigan, and Massachusetts all beat Wisconsin! How could that be? What's happening is that "job losses in other states almost exactly balanced out the jobs created in places like Wisconsin, Massachusetts and Texas". Taken together, that's how you get to 18,000 net jobs nationally. And slapping 9,500 jobs on top as the numerator allows Walker to take credit for something that in reality is just a mathematical oddity.
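To make the trick concrete, here's a minimal sketch in Python; the state-level numbers are hypothetical (not the actual 2011 BLS figures), chosen so that gains and losses nearly cancel:

```python
# Hypothetical one-month net job changes by state, in thousands.
# Losses nearly cancel the gains, leaving a small national net number.
state_changes = {
    "Wisconsin": 9.5,
    "Minnesota": 13.0,
    "Texas": 20.0,
    "California": 15.0,
    "Ohio": -22.0,
    "Florida": -17.5,
}

national_net = sum(state_changes.values())  # 18.0 thousand jobs

for state, change in state_changes.items():
    if change > 0:
        print(f"{state} 'created' {change / national_net:.0%} "
              f"of all US job growth!")
```

Every state with a gain can claim a huge share of the national total, and the shares sum to far more than 100%, because the negative entries have shrunk the denominator.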

Other juicy examples are:

  1. Income Inequality: A common way to pick apart the data is to separate out the top 1% from the other 99% and show how the 99% keeps losing ground. However, if you dive into the next cohort down, the 90th through 99th percentiles (i.e., the top 10% excluding the top 1%), you can see that this group has also been increasing its share of income. For both the top 1% and this next 9% to gain share, the bottom 90% must actually be losing share, making negative gains! This should probably be more of a headline than just saying that the top 1% or 10% are outpacing everyone else in their shares of income.
  2. An ad campaign in the 2012 US election claimed that "Women account for 92.3% of all jobs lost under Obama". This kind of outlandish claim is already suspect on its face, so there must be some gaming of the numbers here. And indeed there is. The Romney campaign, which was responsible for this ad, took net job numbers for a set period of time and divided one by the other to arrive at the 92.3% figure. While the mechanics of the calculation are technically correct, this is not really the right thing to measure. It's a spin on the numbers to claim that Obama is responsible for the drop. If the purpose of the ad is to make Obama look bad, mission accomplished! "The net job loss is positive sometimes, and negative other times, which is why taking percentages of it is a dangerous business." So if you want the truth, you need to ask a different question (and by association, make a different calculation). But that's only if you want the truth.

One indication that this is a bad methodology is that if they had shifted the start of the calculation window (i.e., started in Feb 2009 rather than Jan 2009), they would have shown that "women accounted for over 3,000% of all jobs lost on Obama's watch!" But that is OBVIOUSLY silly, so they couldn't make that claim and have it taken seriously. We should therefore be on red alert that there's something fishy about the 92.3% claim too. This is reminiscent of Gary Smith's chapters on data mining; he raised the example of the Foolish Four stock-picking strategy. Of course, you had to buy the stocks in January. Researchers tried the strategy but chose July as the month to purchase the stocks, and guess what? It was a miserable failure. Looks like the Romney campaign employed their own data-mining strategy.
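Here's a minimal sketch of why the figure is so unstable; the job numbers are hypothetical, picked to reproduce the ad's 92.3% and the shifted window's ~3,000%, not the actual BLS series:

```python
# Hypothetical cumulative net job changes (in thousands) measured
# over two windows that differ only in their start month.
windows = {
    "Jan 2009 start": {"total": -740.0, "women": -683.0},
    "Feb 2009 start": {"total": -16.0, "women": -484.0},
}

for window, net in windows.items():
    share = net["women"] / net["total"]
    print(f"{window}: women 'account for' {share:.1%} of net jobs lost")
# Jan start: 92.3%. Feb start: 3025.0%. Same economy, same women;
# the denominator passes near zero, so the percentage explodes.
```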


What I find fascinating in all these examples is that one of the most fundamental concepts in math, one that elementary schoolers learn to compute (the proportion/percentage), can be misused in such a dramatic way! Just because you know how to compute something doesn't mean the computation is the right one. Take a grade-school example: dividing 3 apples by 4 oranges when the question asks for the ratio of oranges to apples. There, the mistake is obvious. These real-life examples are more insidious; it's not readily apparent what the problem is. You take net numbers, divide, and show a drastic increase in crime, or that x% of women lost their jobs.

Either the people doing these calculations aren't aware of what their 'innocent' calculations imply, or they are intentionally misleading us. How do we poke holes in this? Training ourselves to recognize the ways in which we can be misled is important...

 


Sunday, April 3, 2022

Law of Averages & Law of Small Numbers

This post goes over the "Law" of Averages and the "Law" of Small Numbers, since the two sometimes get conflated.

Neither is a law in reality; both are biases in thinking that occur so often that the word "Law" was applied as a joke.

Law of Averages

This law is about our tendency to think that things must balance out. If you plan to flip a fair coin 10 times and the first 6 flips all come up heads, there is a tendency to think that tails are more likely in the remaining 4 flips to 'balance things out'. In essence, tails are 'coming due'; there needs to be a correction so we can end up 50/50. This 'law' is false, hence not a law. Gary Smith states, "A coin has no control over how it lands...heads and tails are equally likely to appear no matter what happened on the last flip or the last 999 flips". Yet this belief is so widespread that another version of it is called "The Gambler's Fallacy", after gamblers' belief that luck will swing their way after a streak of bad luck.

Another example Smith writes: "A reader wrote to columnist Marilyn vos Savant saying that he had a lot of job interviews, but no offers. He hoped that the odds of an offer were increasing with every rejection."

Ellenberg writes about the Law of Averages as well, in the chapter How Much Is That in Dead Americans? The falsity of the Law of Averages seems to be in "conflict with the Law of Large Numbers, which ought to be pushing" toward a 50-50 split (in the case of a fair coin being flipped). The two laws seem like they should go hand in hand. How can one be true and not the other?
There actually is no conflict; the tension is an illusion. Suppose we flip a fair coin and get 10 heads in a row.
There are 2 thoughts that may arise:
(1) Something is wrong with the coin, i.e., it is weighted
(2) If the coin is fair, we must start to get tails to correct the imbalance we've observed thus far.

Let's assume the coin is fair, so we can disregard #1. Common sense says #2 is true; however, what is also common sense is that "the coin can't remember what happened the first 10 times"! So how can the coin correct itself? Perhaps it's not the coin itself but some divine intervention, like God, or the Universe. Indeed, de Moivre, who investigated this phenomenon, did raise this possibility.

The reality however is that coins indeed have no memory and all the future flips still have a 50/50 chance of coming up heads. Ellenberg continues, "The way the overall proportion settles down to 50% isn't that fate favors tails to compensate for the heads that have already landed; it's that those first ten flips become less and less important the more flips we make.... That's how the Law of Large Numbers works: not by balancing out what's already happened, but by diluting what's already happened with new data, until the past is so proportionally negligible that it can safely be forgotten."
So there is ONLY the Law of Large Numbers. Faulty reasoning devised the Law of Averages to justify making bad bets...
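A quick simulation makes the dilution point concrete. A minimal sketch: force ten heads up front, keep flipping a fair coin, and watch the running proportion settle without the surplus ever being 'corrected':

```python
import random

random.seed(1)

heads, flips = 10, 10  # start with ten heads already on the books
for n in (100, 1_000, 10_000, 100_000):
    while flips < n:
        heads += random.random() < 0.5  # each new flip is still 50/50
        flips += 1
    print(f"{flips:>6} flips: proportion {heads / flips:.3f}, "
          f"excess heads over 50%: {heads - flips / 2:+.0f}")
```

The proportion drifts toward 0.5 while the excess heads just wanders around its starting surplus; the early heads get diluted, not balanced out.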

On a somewhat related topic, Sam L. Savage has a whole book called "The Flaw of Averages" that speaks to the danger of focusing only on central tendencies and ignoring variations. This kind of thinking is easy and has led to innumerable real-world mishaps and disasters. Posts to come on this later.

Law of Small Numbers

A similar "Law" to the "Law of Averages" is the "Law of Small Numbers". This is not a real law; it's a bias or mis-application of probability that is so prevalent that Kahneman and Tversky dubbed it a law. It's a play on the Law of Large Numbers and expresses the fallacy that many people believe that a small sample ought to resemble the population from which it is drawn. So even after a few draws from a fishbowl, it might be tempting to jump to conclusions about all the contents in the fishbowl.

John D. Cook has some good posts on this:

https://www.johndcook.com/blog/2008/01/25/example-of-the-law-of-small-numbers/

His point: the law of large numbers is a mathematical theorem, while the law of small numbers is an observation about human psychology; people underestimate the variability in small samples.

This underestimation of variability in small samples is discussed in my previous blog post on brain cancer deaths (Ellenberg's book). Another example of this phenomenon is belief in the "hot hand". If a player makes 3 or 4 shots in a row, there is a belief that the player is 'hot', and observers attach a higher probability to the next few shots being made.

Saturday, April 2, 2022

The Danger of Comparing Proportions

In Ellenberg's book How Not to Be Wrong, Chapter 4, How Much Is That in Dead Americans?, the author writes about a common way of expressing tragedies: calculate the proportion of people killed in some accident or terrorist bombing relative to the population of that country, then translate that percentage into the equivalent number of deaths had it happened in the US. Purportedly this is done so Americans can grasp the scale of the tragedy if it had happened here.

It makes sense on the face of it and I've personally never questioned this. And if the proportion is high, then presumably it makes one feel the tragedy at a personal level if one were to imagine it happening in their own country.

So in this chapter, Ellenberg takes us on a mathematical tour of the methods of determining the 'equivalences' of such calamities and I found it quite illuminating.

I will go through his reasoning below to likewise give a similar journey in this blog post:

1. While using proportions per country seems to make sense at the outset, he notes that if you can compute these numbers in multiple ways and get different answers, that's an indicator the method may not be the best one to use.

Example: If a bombing in Tegucigalpa, Honduras killed 200 people, you may see news articles stating that the equivalent number of deaths, had a similar bombing happened in NYC, would be close to 1,400. If you are in NY, you would of course be shocked. And that of course is the desired effect.

To get to 1,400 deaths, you do this:

              Population   Pct of population killed (200 deaths)
Tegucigalpa   1,158,000    0.01727%
Honduras      9,905,000    0.00202%

So using 0.01727% and multiplying by NYC's population gives you:
Equivalent victims
       Population    Using Tegucigalpa basis   Using Honduras basis
NYC    8,000,000     1,381.7                   161.5
USA    330,000,000   56,994.8                  6,663.3

1,381.7, which is ~1,400 victims. Shocking!
But as you can see, you can instead compute the equivalence using the entire country (Honduras vs. USA). There you get 6,663 victims. Shocking as well. One can dial the shock up or down as one sees fit.
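The whole calculation fits in a few lines (a minimal sketch, using the rounded population figures from the tables above):

```python
deaths = 200
bases = {"Tegucigalpa": 1_158_000, "Honduras": 9_905_000}
targets = {"NYC": 8_000_000, "USA": 330_000_000}

for base, base_pop in bases.items():
    rate = deaths / base_pop  # deaths as a fraction of the base population
    for place, pop in targets.items():
        print(f"{base} basis, {place}: {rate * pop:,.0f} equivalent victims")
# City-to-city gives NYC ~1,382; country-to-country gives USA ~6,663.
# Same event, same arithmetic, answers an order of magnitude apart.
```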

So Ellenberg states that this multiplicity of conclusions should be a red flag: something is fishy with the method of proportions.

Of course, we can't just throw out proportions! They can be useful, depending on the circumstances. So Ellenberg continues his tour by diving deeper.

2. He next goes into deaths by brain cancer as calculated by state. We do this on a proportional basis, as that's preferable to absolute numbers: absolute numbers by state aren't useful because populations differ widely per state; thus we are better off "computing the proportion of each state's population that dies of brain cancer each year". Calculating this by state showed that South Dakota came in first with 5.7 brain cancer deaths per 100,000 people per year, well above the national rate of 3.4. However, North Dakota was at the bottom of the list! Why is one Dakota at the top and the other at the bottom?! Similarly with Vermont and Maine: in Vermont the rate is low, but Maine is in the top 5!

What these states have in common is that the populations are low! "Living in a small state, apparently, makes it either much more or much less likely you'll get brain cancer."

This sounds ridiculous, so there must be something else going on. To get to the bottom of it, the author turns to coin flipping and demonstrates with random simulation that small samples show a lot more variability. This is at the heart of the Law of Large Numbers, which states that as the sample size grows, the sample mean approaches the mean of the general population. So with small samples you'll get more variability, but as sample sizes grow, you'll get more stability in the long run.
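Here's a minimal sketch of that demonstration, transplanted to the brain cancer setting; the state names and populations are hypothetical, and every 'state' shares the exact same underlying risk:

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_RATE = 3.4e-5  # identical annual death rate everywhere

states = {"Small A": 700_000, "Small B": 750_000,
          "Big A": 20_000_000, "Big B": 25_000_000}

for state, pop in states.items():
    deaths = rng.binomial(pop, TRUE_RATE)  # one simulated year of deaths
    print(f"{state}: {deaths / pop * 1e5:.1f} deaths per 100,000")
# Re-run this a few times: the small states swing well above and below
# 3.4 per 100,000 while the big states barely move, despite equal risk.
```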

Thus, this explains what's happening with brain cancer. "Measuring the absolute number of brain cancer deaths is biased towards the big states; but measuring the highest rates - or the lowest ones!- put the smallest states in the lead. That's how South Dakota can have one of the highest rates of brain cancer death while North Dakota claims one of the lowest... it's because smaller populations are inherently more variable."

He then goes on to give examples from (1) the NBA, where players you've never heard of have the highest averages simply because they made a couple of shots in the limited playing time they had, and (2) schools in North Carolina doing well (and badly) on standardized tests; in the latter example he states "the reason small schools dominate the top 25 isn't because small schools are better, but because small schools have more variable test scores. A few child prodigies or a few third-grade slackers can swing a small school's average wildly". He continues "So how are we supposed to know which school is best, or which state is most cancer-prone, if taking simple averages doesn't work? If you're an executive managing a lot of teams, how can you accurately assess performance when the smaller teams are more likely to predominate at both the top and bottom tiers of your rankings?"

"There is, unfortunately, no easy answer... You could accomplish this by taking some kind of weighted average [of the state rate] with the national rate. But how to weigh the two numbers? That's a bit of an art, involving a fair amount of technical labor."

Incidentally, this small-sample variability explanation rang a bell, so I looked up why. It was because I had previously read another excellent book: The Flaw of Averages by Sam L. Savage. In chapter 17, The Flaw of Extremes, he characterizes this variability in small samples by asking "Did you know that localities whose residents have the largest average earlobe size tend to be small towns?" This seems baffling until he explains it: "The sizes of towns and earlobes have nothing to do with each other; it's just that averages with small samples have more variability than averages over large samples". This applies not just to earlobes but to prevalence of diseases, crime rates, educational test scores, and anything else you may care to average. "To summarize, the flaw of extremes results from focusing on abnormal outcomes such as 90th percentiles, worse than average cancer rates, or above average test scores. Combining or comparing such extreme outcomes can yield misleading results."

To bring this back to deciding which tragedies are worse: there is no one-size-fits-all rule. If you rank by proportion of population killed, then for the 20th century you get: (1) the massacre of the Herero of Namibia by German colonists, (2) the slaughter of Cambodians by Pol Pot, (3) King Leopold's war in the Congo. Hitler, Stalin, and Mao, with the enormous absolute numbers they killed, don't make the list. So the proportions ranking is obviously not a good measure either. "How much distress should we experience when we read about the deaths of people in Israel, Palestine, Nicaragua, or Spain?"

Ellenberg gives a rule of thumb: "If the magnitude of a disaster is so great that it feels right to talk about 'survivors', then it makes sense to measure the death toll as a proportion of total population." He gives the example of the Rwandan genocide; there we can talk about survivors, so we use proportions: in this case, 75% of the Tutsi population was wiped out. Then we can equate other disasters that wiped out a similar ratio as the 'equivalent of the Rwandan genocide' (say, if 75% of the Swiss population were wiped out in a catastrophe).

On the other hand, in some cases, you would stay away from using proportions (and stop equating tragedies in one country to a supposed tragedy in another) when there's no need to talk about survivors. You wouldn't call someone who lives in Seattle a 'survivor' of the World Trade Center attack for example. "So it's probably not useful to think of deaths at the WTC as a proportion of all Americans. Only about one in a hundred thousand Americans, or 0.001%, died at the WTC that day. That number is too close to zero for your intuition to grasp hold of it; you have no feeling for what that proportion means."

I also like what Ellenberg says at the beginning of the chapter. "When there are two men left in the bar at closing time, and one of them coldcocks the other, it is not equivalent in context to 150 million Americans getting simultaneously punched in the face"; yet this is the approach that's taken when 'equating' tragedies across different countries. (Or perhaps it's done primarily with Americans when some political party wants to push a talking point in favor of their agenda).

At the end of the chapter, Ellenberg poses the question: "So how are we supposed to rank atrocities, if not by absolute numbers and not by proportion? Some comparisons are clear. The Rwandan genocide was worse than 9/11 and 9/11 was worse than Columbine and Columbine was worse than one person getting killed in a drunk-driving accident. Others, separated by vast differences in time and space, are harder to compare....the question of whether one war was worse than another is fundamentally unlike the question of whether one number is bigger than another...if you want to imagine what it means for 26 people to be killed by terrorist bombings, imagine 26 people killed by terrorist bombings - not halfway across the world, but in your own city."

 So to review:

(1) Absolute numbers aren't sufficient to compare.

(2) Proportions are better for comparison, but even then, they should be used only in 'exceptional' cases (e.g., when you can talk about survivors). The point here is that proportions can be gamed based on what you use as the denominator; thus if you come across these kinds of comparisons, be wary and cognizant of the author's possible agenda in swaying you.

(3) Side journey into using proportions: be careful about drawing conclusions when small samples are involved.


Friday, March 25, 2022

Incautious Extrapolation

I saw this error referenced in two books:

  • Standard Deviations, by Gary Smith
  • How Not to Be Wrong, by Jordan Ellenberg
and so I thought I'd jot down a quick summary.

Incautious extrapolation, a term used by Smith, refers to how human beings are prone to using data without theory and drawing inaccurate conclusions.
He has a few examples:

1. Abraham Lincoln, the man himself(!), incautiously extrapolated the population of the US in 1930 to be 251 million people (working from census data collected from 1790 to 1860). The actual population in 1930 was 123 million, less than half his prediction! Smith says, "If we have no logical explanation for a historical trend and nonetheless assume it will continue, we are making an incautious extrapolation that may well turn out to be embarrassingly incorrect". (A quick sketch reproducing this extrapolation follows this list.)

2. Another quite funny one concerned the average sentence length uttered by British speakers over the last 350 years: it was found to have fallen from 72.2 words per sentence for Francis Bacon to 24.2 for Winston Churchill. That decline works out to roughly (72.2 - 24.2) / 350 ≈ 0.14 words per sentence per year, so extrapolate another ~175 years at this rate and we arrive at 0 words per sentence, then go negative some years later!
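Back to example 1: the spirit of Lincoln's extrapolation is easy to reproduce (a minimal sketch; the census figures below are the usual rounded totals, in millions):

```python
import numpy as np

# Approximate US census totals, 1790-1860, in millions.
years = np.arange(1790, 1870, 10)
pops = np.array([3.9, 5.3, 7.2, 9.6, 12.9, 17.1, 23.2, 31.4])

# Population grew roughly 35% per decade, so fit exponential growth:
# log(pop) is linear in year.
slope, intercept = np.polyfit(years, np.log(pops), 1)

pred_1930 = np.exp(intercept + slope * 1930)
print(f"Extrapolated 1930 population: {pred_1930:.0f} million")
# Roughly 250 million, right in line with Lincoln's 251; the actual
# 1930 census counted 123 million. The trend had no reason to continue.
```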

Another example is "We will all work for IBM", in which analysts studied IBM's explosive growth through the 1970s, plotted the data points on a graph, and fitted a curve showing exponential growth; they based their future estimates on this curve and turned out to be quite wrong. Such growth simply could not be sustained. This is what happens when you blindly follow data without theory. The other point in this example is that incautious extrapolation doesn't just happen with straight lines; you can do it with curves of any kind.

Ellenberg uses the example of a journal article on obesity which extrapolated that all Americans will be obese by 2048. Again, the researchers took data points and applied linear regression to come up with this prediction of a future state. Again, this is applying a technique to data without theory. Is the phenomenon we are looking at linear? As with the IBM example (which assumed growth would just keep happening, without considering that an already-big company cannot sustain such growth indefinitely; factors such as labor force and productivity will constrain it), growth in the number of obese people *must* slow down, because there are fewer and fewer people left to make obese; some skinny population will always remain. A little theory / common sense goes a long way, but I suppose it is easier / sexier / more provocative to make outlandish statements.
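A minimal sketch of that failure mode; the prevalence numbers here are made up for illustration (not the figures the paper used), but the shape of the mistake is the same:

```python
import numpy as np

# Hypothetical prevalence of obesity (fraction of the population).
years = np.array([1970, 1980, 1990, 2000])
prevalence = np.array([0.30, 0.40, 0.50, 0.60])

# Linear regression, then incautious extrapolation.
slope, intercept = np.polyfit(years, prevalence, 1)
for year in (2048, 2060, 2100):
    print(f"{year}: predicted prevalence {intercept + slope * year:.0%}")
# The fitted line crosses 100% around 2040 and keeps climbing. But a
# proportion can't exceed 100%, which is the tell the model is wrong.
```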

On a final note, both authors use Mark Twain's example of incautious extrapolation of the length of the Mississippi River and how based on measurements, if we look backwards, it used to be upwards of 1,300,000 miles long and in 742 years, it will be 1.75 miles long. "There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact."


Sunday, February 20, 2022

On Randomness and Patterns and Hot Hands

 From Standard Deviations, by Gary Smith [Chapter 8: When You're Hot, You're Not]:

This is one of my favorite chapters so far (although that's not saying too much, because each chapter has been a gem!) in that he delves further into the story-telling side of human beings: explaining random patterns as if there were some method to the madness.

Here are some highlights:

1. We see some pattern and discount the possibility that it could be random. Instead we ascribe a story to it, like the hot-hand theory or stock-market fluctuations.

For e.g., take the sequence

x x x x x <= five x'es in a row!

That CAN'T be random! In fact, Smith does coin-toss experiments and shows that such patterns occur quite easily! In a sequence of 20 coin flips, the theoretical probability of getting a streak of at least 4 heads or 4 tails in a row is 0.768, so there's your hot streak!
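You can check that number with a quick simulation (a minimal sketch):

```python
import random

random.seed(42)

def has_streak(flips, length=4):
    """True if the sequence contains a run of `length` identical flips."""
    run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        if run >= length:
            return True
    return False

trials = 100_000
hits = sum(has_streak([random.choice("HT") for _ in range(20)])
           for _ in range(trials))
print(f"P(streak of 4+ in 20 flips) ~ {hits / trials:.3f}")  # ~0.768
```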

Of course, there could be some process that's generating the five or more x'es purposefully. However, that's not the point. The point is, that we don't know what's producing it. Even a long string of such x'es can be produced at random and human beings don't account for that. They immediately want to tell a story about how those five x'es were created. We shouldn't be so quick to dismiss that the explanation is plain old randomness.

2. If we were asked to construct something that looked/seemed random, we would never create:

x x x x x
or
o o o o o
or
x o x o x


because these have patterns, and there's NO way that randomness would produce that, right, RIGHT?
 

So, similar to point 1, where we are predisposed not to treat a pattern (5 x'es in a row) as potentially generated at random, that same predisposition blocks us when we try to create 'random-looking' strings: we would never think of including runs like 5 x'es in a row, because such sequences seem too ordered to be random (right? RIGHT??)

Smith goes on to talk in more detail about hot hands. He says that hot hands (and cold hands) may actually be a real phenomenon, but one without a large enough effect to measure precisely in really popular sports like basketball, because of many confounding factors (e.g., the player takes shots from different angles/positions, the player's confidence is higher or lower, the person guarding them may be more or less aggressive, etc.). The overall message is:

a. The hot-hands phenomenon is hard to measure (especially in popular sports with lots of confounders). However, he was able to measure it in horseshoes and bowling and found statistically significant results. So the phenomenon is real; it's just not huge.

b. What the public normally sees in popular sports (basketball, football, baseball, etc.) where a player gets 'hot' can easily be the result of randomness (see the example above with the five x'es in a row). Sure, it's a nice story to say there is a hot hand at work, but it's only a story, and it makes for good TV!


Sunday, November 21, 2021

Observational Studies [and Survivor Bias]

Again from "Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics" by Gary Smith


Chapter 2, Garbage In, Gospel Out

Another thing to be careful about with observational studies is that they can be subject to survivorship bias. One of the most famous examples of this is from WWII, when the British Royal Air Force (RAF) noticed bullet-hole patterns in returning airplanes and wanted to know which areas of the plane needed extra protection, like more armor. There is a famous image depicting this:
[Image: distribution of bullet holes on returning aircraft]

The thought was to place the armor where the most bullet holes were found, thinking that would help planes survive German attacks. However, Abraham Wald had a critical insight: he noticed that the returning planes had very few holes in the cockpit, engines, or fuel tanks. He realized they weren't seeing many planes return with bullet holes in those locations, and concluded that most of the planes succumbing to attacks were being hit there. Thus they placed the armor in those locations, increasing the survivability of the airplanes. This was a crucial insight that helped win the war.

Such examples of survivorship bias can be seen in observational studies, most notably backwards-looking ones. If we choose a sample today and look backwards, we only see the survivors. E.g., medical histories of the elderly omit those who died and therefore never became elderly.

Smith brings up other examples: (1) a class-action lawsuit in the 70s that examined wage data between white and black employees, (2) HMO surveys, (3) an ad for Red Lion Hotels, (4) NYC veterinary hospitals' data on cat injuries/deaths after falls from apartment buildings [this led doctors to concoct all kinds of theories about why cats survive falls from higher floors].

Yet another example, one close to my heart, is his criticism of Good to Great, the famous business book by Jim Collins. (Also thrown in for good measure is In Search of Excellence, by Tom Peters.) Both books were ones I adored in my twenties when I discovered self-help and business literature. They illuminated the way towards bettering oneself and bettering the institutions one worked in. They made sense. However, Smith rips these authors a new one by showing how these books are based on observational studies that suffer from survivorship bias.
The correct way to go about such a study is to do it forward-looking, i.e., start with a list of companies, then use plausible criteria to select x companies that are predicted to do better than the rest. "These criteria must be applied in an objective way, without peeking at how the companies did over the next forty years. It is not fair or meaningful to predict which companies will do well after looking at which companies did well! Those are not predictions, just history. After the chosen x are identified, their subsequent performance can be compared to other companies over a forty-year period... would have been a fair comparison"

Collins picked his companies after the 40 year period (i.e., after they survived) and he cherry-picked criteria that he thought made sense. He was deriving theories from data, which is always hazardous to do.

Smith continues to shred this by talking about being dealt five specific cards and then asking the probability of being dealt those cards. If you had predicted those cards before being dealt them, that would have been amazing. But 'predicting' them after being dealt them? Not so amazing. In fact, the probability is 100%!
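The arithmetic, as a minimal sketch:

```python
from math import comb

# Naming one specific five-card hand *before* the deal:
hands = comb(52, 5)
print(f"1 in {hands:,}")  # 1 in 2,598,960, i.e. a ~0.00004% chance

# "Predicting" the cards you already hold *after* the deal:
print("Probability = 100%. That's history, not prediction.")
```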

"Finding common characteristics after the companies have been selected is not at all unexpected and not at all interesting." Can we use these characteristics to make forward predictions? Now, that would be interesting! Not backfitting theories to data.

"This problem plagues the entire genre of books on formulas/secrets/recipes for a successful business, a lasting marriage, living to be one hundred, and so on and so forth, that are based on backward-looking studies of successful business, marriages, and lives. There is inherent survivor bias ... [should rather] identify businesses or people with these traits and see how they do over the next ten or twenty or fifty years. Otherwise, we are just staring at the past instead of predicting the future"

Smith's concluding remarks are: Don't be fooled.
We observe people working, playing and living and we naturally draw conclusions from what we see. Our conclusions may be distorted by the fact that these people chose to do what they are doing. The traits we observe may not be due to the activity, but may instead reflect the people who chose to do the activity.

This is very deep to me because it's so easy to be fooled. Human beings love to tell stories and stories can constantly fool us and of course, we are the easiest ones to deceive. So telling a story about if you just do x activity will lead to y outcome is so seductive. It gives us control and a future reward. Except it's all illusion.

"We naturally draw conclusions from what we see - workers' wages, damaged aircraft, successful companies. We should also think about what we do not see - the employees who left, the planes that did not return, the companies that failed. The unseen data may be just as important, or even more important, than the seen data. To avoid survivor bias, start in the past and look forward"