Statistics Learning: Probability Theory

I'm taking MIT's 6.041x, Introduction to Probability: The Science of Uncertainty, in Winter 2016. It's only the first week, but it is an awesome class so far! The professor and teaching staff are remarkable: quite lucid and very engaging.

Even though I've studied probability & statistics before [I have an undergrad degree in Applied Math & Statistics], I didn't have a full appreciation of the concepts. I could calculate things, but I didn't really know what I was doing or what the underlying meaning was. Now, with data science coming into vogue and my own love for the topic, I've been revisiting my roots in probability & statistics. As a professional project manager, it is my job to deal with uncertainty, and I constantly need ways to manage it: through risk management and by using Monte Carlo simulations to model uncertainty.

Highlights of the class:

A refresher on sets, sequences, limits, series, geometric series [1 + r + r^2 + ... sums to 1/(1-r) when |r| < 1], and Cantor's diagonalization argument for why the real #s are not countable [very cool!]
Using sets as the basis, then layering probability on top: we calculate probabilities of events [which are subsets of the sample space], and a probability law is a function that assigns a probability to each event. Saying that Event A occurred means the outcome was one of the elements of the subset A; if the outcome fell outside A, we say that Event A did not occur.
From a minimal # of axioms, proving various properties of probabilities [the Union Bound, P(A ∪ B) <= P(A) + P(B); or P(A ∪ B) = P(A) + P(B) - P(A ∩ B)]; these and others fall out of a small # of axioms.
How the concept of area relates to probability, with measure theory as the underlying mathematical foundation that makes this work. Some weird subsets of the unit square cannot be assigned a well-defined area, so they cannot be assigned a probability either; for ordinary subsets, however, the unit square is just fine. In fact, areas in the unit square can be used to calculate probabilities.
Paradoxes that arise if you apply the properties while ignoring whether the sets involved are finite or countable. [E.g., the union of all the points in the unit square is the entire unit square, which has area = 1 (thus probability = 1); AND the points are disjoint from one another, so P(union of the points) should be the sum of the probabilities of the individual points; however, each point has 0 area = 0 probability, so we just showed that 1 = 0; paradox!] So we must be careful to apply the laws only to appropriate sets. The paradox is resolved because the (stronger, countable version of the) additivity axiom applies only when the disjoint sets can be arranged in a sequence, and there is no way to do this with the points of the unit square. The axiom still helps with calculating probabilities of unions of an infinite # of sets [as long as those sets can be arranged in a countable fashion].
Infinite sets can be discrete [like the integers] or continuous [like the reals]. This distinction matters because the integers are countable while the reals are not, and some of the laws/theorems cannot be applied in uncountable cases.
One way of interpreting a probability is as the long-run frequency with which something occurs when the experiment is repeated a very large # of times. So P(obtaining a head) = 0.5 can be viewed as saying that heads comes up in roughly half of (whatever large # of times I toss the coin).
Statistics is a field that complements probability by using data to come up with good models.
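
A few of the highlights above can be checked numerically. The sketch below is my own illustration, not course material: the function names, the die example, and the region x + y <= 1 are all assumptions chosen for simplicity. It checks the geometric series sum, verifies P(A ∪ B) = P(A) + P(B) - P(A ∩ B) on a small finite sample space, and uses simulation for the frequency interpretation and for probabilities as areas in the unit square.

```python
import random
from fractions import Fraction


def geometric_partial_sum(r, n_terms):
    """Return r**0 + r**1 + ... + r**(n_terms - 1)."""
    return sum(r ** k for k in range(n_terms))


def prob(event, sample_space):
    """Probability of an event when every outcome in a finite
    sample space is equally likely (an assumption of this sketch)."""
    return Fraction(len(event & sample_space), len(sample_space))


def head_frequency(n_tosses, rng):
    """Fraction of heads observed in n_tosses simulated fair coin tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


def area_estimate(n_points, rng):
    """Monte Carlo estimate of P(x + y <= 1) for a point drawn uniformly
    from the unit square. The exact answer is the triangle's area, 1/2."""
    hits = sum(rng.random() + rng.random() <= 1.0 for _ in range(n_points))
    return hits / n_points


if __name__ == "__main__":
    rng = random.Random(6041)  # fixed seed so runs are repeatable

    # Geometric series: partial sums approach 1 / (1 - r) when |r| < 1.
    r = 0.5
    print("partial sum:", geometric_partial_sum(r, 50), "limit:", 1 / (1 - r))

    # Inclusion-exclusion on one roll of a fair six-sided die.
    omega = set(range(1, 7))
    A = {2, 4, 6}  # roll is even
    B = {4, 5, 6}  # roll is at least 4
    lhs = prob(A | B, omega)
    rhs = prob(A, omega) + prob(B, omega) - prob(A & B, omega)
    print("P(A u B):", lhs, "P(A) + P(B) - P(A ^ B):", rhs)

    # Frequency interpretation: the share of heads settles near 0.5.
    for n in (100, 10_000, 1_000_000):
        print(n, "tosses, frequency of heads:", head_frequency(n, rng))

    # Probability as area in the unit square.
    print("estimated P(x + y <= 1):", area_estimate(1_000_000, rng))
```

The simulated frequencies will not hit 0.5 exactly; the point is only that they drift closer as the # of tosses grows, which is the intuition behind the frequency interpretation.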

Sunday, February 14, 2016
