Sunday, February 14, 2016

Law of Large Numbers

Sources:
Elementary Statistics, Mario F. Triola
The Drunkard's Walk, Leonard Mlodinow

"When finding probabilities with the relative frequency approach (Rule 1), we obtain an approximation instead of an exact value. As the total number of observations increases, the corresponding approximations tend to get closer to the actual probability. [This is referred to as the law of large numbers, which is stated as: As a procedure is repeated again and again, the relative frequency probability (from Rule 1) of an event tends to approach the actual probability.]"
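The convergence described in the quote is easy to see in a quick simulation. This is my own sketch (a simulated fair coin with a fixed seed, not an example from either source): the relative frequency of heads is an approximation that tends to tighten toward the true probability of 0.5 as the number of observations grows.

```python
import random

def relative_frequency(trials, seed=0):
    """Estimate P(heads) for a simulated fair coin as the
    relative frequency of heads (the Rule 1 approach)."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# As trials increase, the approximation tends toward the actual 0.5.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```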

Jakob Bernoulli discovered this law/theorem in the 1680s or so. It works like this: you give him (1) a tolerance of error (plus or minus some percentage, e.g., plus or minus 5 percent) from the target value you expect, and (2) a tolerance of uncertainty (how sure you need to be of the result, e.g., 99 percent or 90 percent certain). Given both, Bernoulli can tell you how many trials you need to conduct. His formulas did not last because they were based on approximations, and modern mathematicians have since improved on them; however, the concept behind his law is the important piece: namely, that it is always possible to conduct the procedure enough times to be almost certain that the percentage of the outcome you expect will fall near the target.
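To make the tolerance-plus-uncertainty idea concrete, here is a sketch using Hoeffding's inequality, one of the modern bounds that improved on Bernoulli's original formula (the bound itself is my choice for illustration; it is not what Bernoulli derived). Given a tolerance of error and a confidence level, it returns a number of trials that suffices:

```python
import math

def trials_needed(tolerance, confidence):
    """Hoeffding bound: a number of trials n large enough that
    P(|relative frequency - true probability| <= tolerance)
    is at least `confidence`.  n >= ln(2/delta) / (2 * tolerance^2),
    where delta = 1 - confidence."""
    delta = 1.0 - confidence
    return math.ceil(math.log(2.0 / delta) / (2.0 * tolerance ** 2))

# E.g., to be 99% certain of landing within +/- 5 percentage points:
print(trials_needed(0.05, 0.99))
```

Notice how the required trial count grows as you tighten either knob: demanding more certainty, or a smaller error band, both drive the number of trials up.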

Given the Law of Large Numbers, there is a sarcastic counterpart called the Law of Small Numbers, "which is based on a misconception [or mistaken intuition] that a small sample accurately reflects underlying probabilities. It is a sarcastic name describing the misguided attempt to apply the law of large numbers when the numbers aren't large." An example of this is seeing a CEO's performance over a handful of years and then judging that performance as representative. One CEO's performance over a small subset of years is hardly a basis for determining true performance.
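The same simulated fair coin illustrates why small samples mislead (again my own example, not from the sources): repeat the experiment many times at a given sample size and look at the worst estimate you get. Small samples routinely produce estimates far from the true 0.5, while large samples do not.

```python
import random

def worst_estimate_error(sample_size, repeats=2000, seed=1):
    """Run `repeats` independent experiments of `sample_size` flips
    of a fair coin; return the largest deviation of the estimated
    P(heads) from the true value 0.5."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(repeats):
        heads = sum(rng.random() < 0.5 for _ in range(sample_size))
        worst = max(worst, abs(heads / sample_size - 0.5))
    return worst

# Ten-flip samples scatter far more wildly than thousand-flip samples.
print(worst_estimate_error(10), worst_estimate_error(1000))
```

This is exactly the CEO trap: judging from a few years of results is like trusting one of the ten-flip samples.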

Bernoulli had wanted to answer something of the sort: "Given that you view a certain number of roulette spins, how closely can you nail down the underlying probabilities, and with what level of confidence?" Instead he answered a closely related question: how well are underlying probabilities reflected in actual results? [He came up with a formula to determine how many trials would need to be conducted, depending on how certain you wanted to be and how close you wanted to be to the true answer.]

So with the second question, we are really talking about fixed probabilities that are already known [e.g., in many cases these are gambling examples, where the a priori probabilities are known]. However, in most real-life cases we do not know the probabilities beforehand, and so we must actually answer the first question: given a set of data, how can we infer the underlying probabilities? [This is a much harder question, and one that Rev. Thomas Bayes helps us with, via the science of Bayesian statistics and inference.]
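The standard textbook answer to that inverse question is the Beta-Binomial update, which goes back to Bayes and Laplace. A minimal sketch (the uniform prior and the 7-heads/3-tails data are my illustrative assumptions): starting from a Beta(1, 1) prior over an unknown coin's P(heads), observing data simply adds the counts to the prior's parameters.

```python
def beta_posterior(heads, tails, prior_a=1.0, prior_b=1.0):
    """Bayesian update for an unknown coin's P(heads):
    a Beta(prior_a, prior_b) prior combined with binomial data
    yields a Beta(prior_a + heads, prior_b + tails) posterior.
    Returns (a, b, posterior mean)."""
    a = prior_a + heads
    b = prior_b + tails
    return a, b, a / (a + b)

# After seeing 7 heads and 3 tails from a uniform Beta(1, 1) prior:
a, b, mean = beta_posterior(7, 3)
print(a, b, mean)
```

This inverts Bernoulli's question: instead of fixing the probability and predicting the data, we fix the data and infer a distribution over the probability, which sharpens as more data arrives.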
