🎲Data Science Statistics Unit 2 – Probability Axioms and Bayes' Theorem
Probability axioms and Bayes' Theorem form the foundation of statistical reasoning in data science. These concepts provide a framework for quantifying uncertainty, updating beliefs based on evidence, and making informed decisions in various fields.
Understanding these principles is crucial for data scientists. They enable the development of powerful machine learning algorithms, statistical models, and predictive tools that can extract meaningful insights from complex datasets and drive data-driven decision-making processes.
Probability quantifies the likelihood of an event occurring and ranges from 0 (impossible) to 1 (certain)
Sample space represents all possible outcomes of an experiment or random process
Events are subsets of the sample space and can be combined using set operations (union, intersection, complement)
Probability axioms provide the foundation for calculating probabilities and ensure consistency and validity
Conditional probability measures the probability of an event occurring given that another event has already occurred
Denoted as P(A∣B), read as "the probability of A given B"
Calculated using the formula P(A∣B) = P(A∩B) / P(B)
Independence: two events are independent if the occurrence of one does not affect the probability of the other
Bayes' Theorem allows updating probabilities based on new evidence or information
Relates conditional probabilities P(A∣B) and P(B∣A)
Formula: P(A∣B) = P(B∣A)P(A) / P(B)
Probability Basics
Probability is a measure of the likelihood that an event will occur
Expressed as a number between 0 and 1
0 indicates an impossible event
1 indicates a certain event
Sample space (usually denoted as Ω) is the set of all possible outcomes of an experiment or random process
An event is a subset of the sample space
Simple event consists of a single outcome (rolling a 6 on a die)
Compound event consists of multiple outcomes (rolling an even number on a die)
Probability of an event A is denoted as P(A)
Calculated by dividing the number of favorable outcomes by the total number of possible outcomes (assuming equally likely outcomes)
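As a quick illustration, here is a minimal Python sketch (the function name classical_probability is hypothetical) that computes P(A) by counting favorable outcomes in a finite sample space of equally likely outcomes:

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = favorable outcomes / total outcomes, assuming equally likely outcomes."""
    favorable = set(event) & set(sample_space)   # keep only outcomes that belong to the sample space
    return Fraction(len(favorable), len(set(sample_space)))

# Rolling a fair six-sided die: probability of a number greater than 4
omega = {1, 2, 3, 4, 5, 6}
greater_than_4 = {5, 6}
print(classical_probability(greater_than_4, omega))  # 1/3
```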
Probability Axioms
Axiom 1 (Non-negativity): The probability of any event A is greater than or equal to 0: P(A)≥0
Axiom 2 (Normalization): The probability of the entire sample space is equal to 1: P(Ω)=1
Axiom 3 (Additivity): For any two mutually exclusive events A and B, the probability of their union is the sum of their individual probabilities: P(A∪B)=P(A)+P(B)
Consequences of the axioms:
The probability of an impossible event (empty set) is 0: P(∅)=0
The probability of the complement of an event A is P(A^c)=1−P(A)
For any two events A and B, P(A∪B)=P(A)+P(B)−P(A∩B)
The axioms ensure consistency and validity of probability calculations and provide a foundation for deriving other probability rules and theorems
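A short sketch that checks these consequences numerically on a uniform sample space (the helper prob is hypothetical):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair die, equally likely outcomes

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

A, B = {1, 2, 3}, {3, 4}

assert prob(set()) == 0                                 # P(∅) = 0
assert prob(omega - A) == 1 - prob(A)                   # complement rule
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)   # inclusion-exclusion
```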
Set Theory and Probability
Set theory provides a framework for describing and manipulating events in probability
Union of two events A and B (denoted as A∪B) is the event that occurs when either A or B or both occur
Intersection of two events A and B (denoted as A∩B) is the event that occurs when both A and B occur simultaneously
Complement of an event A (denoted as A^c) is the event that occurs when A does not occur
Exhaustive events collectively cover the entire sample space: if A and B are exhaustive, P(A∪B)=1
Venn diagrams visually represent relationships between events using overlapping circles
Overlapping regions represent intersections
Non-overlapping regions represent mutually exclusive events
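Python's built-in set type mirrors these operations directly; a small sketch using the die sample space as an illustration:

```python
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}             # even numbers
B = {4, 5, 6}             # numbers greater than 3

union = A | B             # A ∪ B -> {2, 4, 5, 6}
intersection = A & B      # A ∩ B -> {4, 6}
complement_A = omega - A  # A^c  -> {1, 3, 5}

# Mutually exclusive events have an empty intersection
C = {1, 3}
print(A & C == set())     # True: A and C cannot both occur
```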
Conditional Probability
Conditional probability measures the probability of an event A occurring given that another event B has already occurred
Denoted as P(A∣B), read as "the probability of A given B"
Calculated using the formula P(A∣B) = P(A∩B) / P(B)
Numerator is the probability of both A and B occurring
Denominator is the probability of B occurring
Multiplication rule: P(A∩B)=P(A∣B)P(B)=P(B∣A)P(A)
Independence: two events A and B are independent if P(A∣B)=P(A) or P(B∣A)=P(B)
Occurrence of one event does not affect the probability of the other
For independent events, P(A∩B)=P(A)P(B)
Conditional probability is used to update probabilities based on new information or evidence
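A minimal sketch (the helper prob is hypothetical) that computes a conditional probability from counts and checks the multiplication rule and an independence condition:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # even number
B = {4, 5, 6}   # greater than 3

p_A_given_B = prob(A & B) / prob(B)            # P(A|B) = P(A∩B)/P(B) = 2/3

assert prob(A & B) == p_A_given_B * prob(B)    # multiplication rule

print(prob(A & B) == prob(A) * prob(B))        # False: A and B are not independent
```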
Bayes' Theorem
Bayes' Theorem relates conditional probabilities P(A∣B) and P(B∣A)
Allows updating probabilities based on new evidence or information
Formula: P(A∣B) = P(B∣A)P(A) / P(B)
P(A) is the prior probability of A before considering evidence B
P(B∣A) is the likelihood of observing evidence B given that A is true
P(B) is the marginal probability of observing evidence B
P(A∣B) is the posterior probability of A after considering evidence B
Bayes' Theorem is derived from the multiplication rule and the law of total probability
Useful for updating beliefs or probabilities in light of new data (medical diagnosis, spam email filtering)
Requires specifying prior probabilities and likelihoods, which can be subjective or based on historical data
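A minimal sketch of the update rule (function and argument names are hypothetical), with the law of total probability supplying the denominator:

```python
def bayes_posterior(prior, likelihood, likelihood_complement):
    """P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)]"""
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / evidence

# Updating a 30% prior after evidence that is 80% likely under A and 20% likely under A^c
print(bayes_posterior(prior=0.3, likelihood=0.8, likelihood_complement=0.2))  # ≈ 0.632
```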
Applications in Data Science
Probability theory is fundamental to many aspects of data science and machine learning
Bayesian inference uses Bayes' Theorem to update probabilities or beliefs based on data
Prior probabilities represent initial beliefs about parameters or hypotheses
Likelihoods quantify the probability of observing the data given the parameters or hypotheses
Posterior probabilities represent updated beliefs after considering the data
Naive Bayes classifiers apply Bayes' Theorem to classify instances based on feature probabilities
Assumes independence between features given the class label
Efficient and effective for text classification and spam filtering
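A minimal text-classification sketch using scikit-learn (assumed to be installed; the toy documents and labels are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win money now", "meeting at noon", "cheap money offer", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)   # word-count features

model = MultinomialNB()              # Bayes' rule with a conditional-independence assumption on features
model.fit(X, labels)

print(model.predict(vectorizer.transform(["cheap offer now"])))  # likely ['spam']
```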
Probabilistic graphical models (Bayesian networks, Markov random fields) represent joint probability distributions over sets of random variables
Encode conditional independence assumptions using graph structures
Enable efficient inference and learning from data
Probability distributions (Gaussian, Bernoulli, Poisson) model the likelihood of different outcomes or values
Used for modeling data generating processes and making probabilistic predictions
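A few of these distributions evaluated with SciPy (assumed to be installed), as a brief sketch:

```python
from scipy import stats

print(stats.norm.pdf(0, loc=0, scale=1))   # Gaussian density at x = 0        -> ≈ 0.3989
print(stats.bernoulli.pmf(1, p=0.3))       # Bernoulli P(success) with p = 0.3 -> 0.3
print(stats.poisson.pmf(2, mu=4))          # Poisson P(2 events) with rate 4   -> ≈ 0.1465
```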
Practice Problems and Examples
A fair die is rolled. What is the probability of getting an even number?
Sample space: Ω={1,2,3,4,5,6}
Event A: Getting an even number A={2,4,6}
P(A) = 3/6 = 1/2
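A quick Monte Carlo cross-check (a sketch using Python's random module; the estimate varies slightly from run to run):

```python
import random

trials = 100_000
hits = sum(1 for _ in range(trials) if random.randint(1, 6) % 2 == 0)
print(hits / trials)   # should be close to 1/2
```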
Two fair coins are tossed. What is the probability of getting at least one head?
Sample space: Ω={HH,HT,TH,TT}
Event A: Getting at least one head A={HH,HT,TH}
P(A) = 3/4
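The same answer can be recovered by enumerating the sample space, as in this short sketch:

```python
from itertools import product

outcomes = list(product("HT", repeat=2))                 # HH, HT, TH, TT
at_least_one_head = [o for o in outcomes if "H" in o]    # HH, HT, TH
print(len(at_least_one_head) / len(outcomes))            # 0.75
```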
A bag contains 4 red balls and 6 blue balls. If two balls are drawn at random without replacement, what is the probability that both balls are red?
Total balls: 10
P(1st red) = 4/10
P(2nd red | 1st red) = 3/9
P(both red) = P(1st red) × P(2nd red | 1st red) = 4/10 × 3/9 = 2/15
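A simulation cross-check for sampling without replacement (a sketch; the estimate varies from run to run):

```python
import random

trials = 100_000
bag = ["red"] * 4 + ["blue"] * 6
both_red = sum(1 for _ in range(trials) if random.sample(bag, 2) == ["red", "red"])
print(both_red / trials)   # should be close to 2/15 ≈ 0.133
```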
A medical test has a 95% accuracy rate for detecting a disease when it is present and a 90% accuracy rate for correctly identifying the absence of the disease. If 1% of the population has the disease, what is the probability that a person has the disease given that they test positive?
Let D be the event that a person has the disease and T be the event that they test positive
P(D)=0.01, P(T∣D)=0.95, P(T∣D^c)=0.10
Using Bayes' Theorem: P(D∣T) = P(T∣D)P(D) / [P(T∣D)P(D) + P(T∣D^c)P(D^c)] = (0.95×0.01) / (0.95×0.01 + 0.10×0.99) ≈ 0.087
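A direct numeric check of this calculation (variable names chosen for readability):

```python
p_d = 0.01              # prior P(D)
p_t_given_d = 0.95      # sensitivity P(T | D)
p_t_given_not_d = 0.10  # false-positive rate P(T | D^c)

p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)  # law of total probability
posterior = p_t_given_d * p_d / p_t
print(posterior)        # ≈ 0.0876, matching the ≈ 0.087 above
```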