Bayesian Statistics Unit 1 – Probability Theory Foundations
Probability theory forms the foundation of Bayesian statistics. It provides tools to measure and analyze uncertainty, from basic concepts like sample spaces and events to more complex ideas like probability distributions and conditional probabilities.
Key concepts include probability axioms, random variables, and distributions. These are essential for understanding Bayesian inference, which uses prior knowledge and observed data to update probabilities and make informed decisions in various fields.
Probability measures the likelihood of an event occurring and ranges from 0 (impossible) to 1 (certain)
Sample space ($\Omega$) set of all possible outcomes of a random experiment
Event (A) subset of the sample space that represents a specific outcome or set of outcomes
Probability density function (PDF) describes the probability distribution of a continuous random variable
Integrating the PDF over a specific range yields the probability of the random variable falling within that range
Probability mass function (PMF) describes the probability distribution of a discrete random variable
Summing the PMF over all possible values equals 1
Cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to a given value
Independence two events are independent if the occurrence of one does not affect the probability of the other
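These definitions can be checked numerically. The minimal sketch below (using `scipy.stats` and `scipy.integrate`, with a standard normal and a Binomial(10, 0.3) chosen purely for illustration) verifies that integrating a PDF over a range matches the CDF difference and that a PMF sums to 1.

```python
# Illustrative check of the PDF/PMF/CDF relationships stated above.
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Continuous case: integrating a standard normal PDF over [-1, 1]
# gives P(-1 <= X <= 1), which must match CDF(1) - CDF(-1).
area, _ = quad(stats.norm.pdf, -1, 1)
print(area, stats.norm.cdf(1) - stats.norm.cdf(-1))  # both ~0.6827

# Discrete case: a Binomial(n=10, p=0.3) PMF sums to 1 over all possible values.
k = np.arange(0, 11)
print(stats.binom.pmf(k, n=10, p=0.3).sum())  # ~1.0
```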
Probability Axioms and Rules
Axiom 1 (Non-negativity) probability of any event A is greater than or equal to 0 ($P(A) \geq 0$)
Axiom 2 (Normalization) probability of the entire sample space is equal to 1 ($P(\Omega) = 1$)
Axiom 3 (Additivity) if A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$
Complement Rule probability of an event A not occurring is 1 minus the probability of A occurring ($P(A^c) = 1 - P(A)$)
Multiplication Rule for independent events A and B, $P(A \cap B) = P(A) \times P(B)$
For dependent events, $P(A \cap B) = P(A) \times P(B|A)$, where $P(B|A)$ is the conditional probability of B given A
Addition Rule for non-mutually exclusive events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Law of Total Probability if $B_1, B_2, ..., B_n$ form a partition of the sample space, then for any event A, $P(A) = \sum_{i=1}^n P(A \cap B_i) = \sum_{i=1}^n P(A|B_i)P(B_i)$
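A short numerical sketch of these rules, with illustrative probabilities that are not from the notes:

```python
# Assumed probabilities, chosen only to exercise the rules above.
p_A, p_B = 0.4, 0.5

# Independent events: P(A ∩ B) = P(A) * P(B); the addition rule then gives P(A ∪ B).
p_A_and_B = p_A * p_B
p_A_or_B = p_A + p_B - p_A_and_B
print(p_A_and_B, p_A_or_B, 1 - p_A)   # 0.2, 0.7, and the complement rule: 0.6

# Law of total probability over a partition B1, B2, B3 of the sample space.
p_Bi = [0.2, 0.5, 0.3]                # must sum to 1
p_A_given_Bi = [0.9, 0.4, 0.1]
p_A_total = sum(pa * pb for pa, pb in zip(p_A_given_Bi, p_Bi))
print(p_A_total)                      # 0.2*0.9 + 0.5*0.4 + 0.3*0.1 = 0.41
```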
Random Variables and Distributions
Random variable (X) function that assigns a numerical value to each outcome in a sample space
Discrete random variable can take on a countable number of distinct values (number of defective items in a batch)
Continuous random variable can take on any value within a specified range or interval (height of a randomly selected person)
Bernoulli distribution models a single trial with two possible outcomes (success or failure) with probability of success p
Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials with probability of success p
Poisson distribution models the number of events occurring in a fixed interval of time or space, given the average rate of occurrence
Normal (Gaussian) distribution continuous probability distribution characterized by its mean ($\mu$) and standard deviation ($\sigma$)
68-95-99.7 rule approximately 68%, 95%, and 99.7% of the data falls within 1, 2, and 3 standard deviations of the mean, respectively
Exponential distribution models the time between events in a Poisson process, with a constant average rate of occurrence
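The sketch below samples from these distributions with assumed parameter values and empirically checks the 68-95-99.7 rule:

```python
# Sampling the named distributions above; all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0
x = rng.normal(mu, sigma, size=100_000)

# Empirical check of the 68-95-99.7 rule for the normal distribution.
for k in (1, 2, 3):
    frac = np.mean(np.abs(x - mu) <= k * sigma)
    print(f"within {k} sd: {frac:.3f}")   # ~0.683, 0.954, 0.997

bern  = rng.binomial(1, 0.3, size=5)       # Bernoulli(p=0.3)
binom = rng.binomial(10, 0.3, size=5)      # Binomial(n=10, p=0.3)
pois  = rng.poisson(4.0, size=5)           # Poisson(rate=4 per interval)
expo  = rng.exponential(1 / 4.0, size=5)   # Exponential(mean waiting time 1/rate)
```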
Conditional Probability and Bayes' Theorem
Conditional probability $P(A|B)$ probability of event A occurring given that event B has occurred
Calculated as $P(A|B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) > 0$
Useful for updating probabilities based on new information or evidence
Prior probability initial probability of an event before considering any additional information (prevalence of a disease in a population)
Likelihood probability of observing the data given a specific hypothesis (probability of a positive test result given that a person has the disease)
Posterior probability updated probability of an event after considering new information (probability of having the disease given a positive test result)
Bayes' Theorem in terms of prior, likelihood, and evidence $P(H|E) = \frac{P(E|H)P(H)}{P(E)}$, where H is the hypothesis and E is the evidence
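A worked version of the disease-testing example mentioned above; the prevalence and test accuracies are assumed for illustration:

```python
# Disease-testing example matching the prior/likelihood/posterior lines above.
# All numbers are illustrative assumptions, not from the notes.
prevalence  = 0.01    # P(disease): prior
sensitivity = 0.95    # P(positive | disease): likelihood of the evidence under H
false_pos   = 0.05    # P(positive | no disease)

# Evidence P(positive) via the law of total probability.
p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)

# Bayes' theorem: posterior P(disease | positive).
posterior = sensitivity * prevalence / p_positive
print(posterior)   # ~0.161: a positive test is far from a guarantee of disease
```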
Expectation and Variance
Expectation (mean) of a discrete random variable X $E[X] = \sum_{x} x \cdot P(X=x)$
For a continuous random variable, replace the sum with an integral
Expectation is a linear operator: for constants a and b and random variables X and Y, $E[aX + bY] = aE[X] + bE[Y]$
Variance measures the spread of a random variable X around its mean $Var(X) = E[(X - E[X])^2]$
Can also be calculated as $Var(X) = E[X^2] - (E[X])^2$
Standard deviation square root of the variance $\sigma_X = \sqrt{Var(X)}$
Covariance measures the linear relationship between two random variables X and Y $Cov(X, Y) = E[(X - E[X])(Y - E[Y])]$
Correlation normalized version of covariance that ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation): $Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
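The identities above can be verified numerically; the PMF values and the linear relationship in the sketch below are illustrative assumptions:

```python
# Numerical check of the expectation, variance, covariance, and correlation definitions.
import numpy as np

x = np.array([0, 1, 2, 3])            # support of a discrete random variable X
p = np.array([0.1, 0.4, 0.3, 0.2])    # its PMF (sums to 1)

e_x  = np.sum(x * p)                  # E[X]
e_x2 = np.sum(x**2 * p)               # E[X^2]
var  = np.sum((x - e_x)**2 * p)       # Var(X) = E[(X - E[X])^2]
print(e_x, var, e_x2 - e_x**2)        # the two variance formulas agree (0.84)

# Covariance and correlation from samples of two linearly related variables.
rng = np.random.default_rng(1)
a = rng.normal(size=10_000)
b = 2 * a + rng.normal(size=10_000)   # b depends linearly on a, plus noise
print(np.cov(a, b)[0, 1], np.corrcoef(a, b)[0, 1])
```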
Joint and Marginal Distributions
Joint probability distribution $P(X, Y)$ describes the probability of two random variables X and Y taking on specific values simultaneously
For discrete random variables, joint PMF $P(X=x, Y=y)$ gives the probability of X=x and Y=y occurring together
For continuous random variables, joint PDF $f(x, y)$ describes the probability density at (x, y)
Marginal probability distribution probability distribution of a single random variable, ignoring the others
For discrete random variables, marginal PMF $P(X=x) = \sum_y P(X=x, Y=y)$
For continuous random variables, marginal PDF $f_X(x) = \int_{-\infty}^{\infty} f(x, y) dy$
Conditional probability distribution probability distribution of one random variable given the value of another
For discrete random variables, conditional PMF $P(Y=y|X=x) = \frac{P(X=x, Y=y)}{P(X=x)}$
For continuous random variables, conditional PDF $f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)}$
Independence random variables X and Y are independent if and only if their joint probability distribution is the product of their marginal distributions: $P(X, Y) = P(X)P(Y)$ or $f(x, y) = f_X(x)f_Y(y)$
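A small joint PMF (values chosen for illustration) makes the marginal, conditional, and independence definitions concrete:

```python
# A joint PMF for discrete X and Y with two values each; entries are assumed.
import numpy as np

# joint[i, j] = P(X = x_i, Y = y_j); entries sum to 1
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)           # marginal P(X = x_i), summing over y
p_y = joint.sum(axis=0)           # marginal P(Y = y_j), summing over x
p_y_given_x0 = joint[0] / p_x[0]  # conditional P(Y = y_j | X = x_0)
print(p_x, p_y, p_y_given_x0)

# Independence check: does the joint equal the outer product of the marginals?
print(np.allclose(joint, np.outer(p_x, p_y)))  # False for these numbers
```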
Probability in Bayesian Context
Bayesian inference updating beliefs or probabilities based on new data or evidence
Combines prior knowledge with observed data to obtain a posterior distribution
Prior distribution $P(\theta)$ represents the initial beliefs about a parameter $\theta$ before observing any data
Can be informative (based on previous studies or expert knowledge) or non-informative (uniform or vague priors)
Likelihood function $P(D|\theta)$ probability of observing the data D given the parameter $\theta$
Describes how likely the observed data is for different values of $\theta$
Posterior distribution $P(\theta|D)$ updated beliefs about the parameter $\theta$ after observing the data D
Obtained by combining the prior and likelihood using Bayes' Theorem $P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}$
Marginal likelihood (evidence) $P(D)$ probability of observing the data D, averaged over all possible values of $\theta$
Acts as a normalizing constant in Bayes' Theorem $P(D) = \int P(D|\theta)P(\theta) d\theta$
Bayesian model comparison selecting among competing models based on their posterior probabilities
Bayes factor $BF_{12} = \frac{P(D|M_1)}{P(D|M_2)}$ compares the evidence for two models $M_1$ and $M_2$
Common Applications and Examples
Bayesian A/B testing comparing two versions of a website or app to determine which performs better
Prior distribution represents initial beliefs about the conversion rates
Likelihood function based on the observed number of conversions and visitors for each version
Posterior distribution updates the beliefs about the conversion rates after observing the data
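A minimal sketch of such an A/B test using Beta-Binomial conjugacy; the conversion counts and the uniform Beta(1, 1) priors are assumed for illustration:

```python
# Bayesian A/B test: Beta posteriors over two conversion rates, compared by Monte Carlo.
import numpy as np
from scipy import stats

conv_a, n_a = 120, 1000   # conversions / visitors for version A (assumed)
conv_b, n_b = 145, 1000   # conversions / visitors for version B (assumed)

# With Beta(1, 1) priors, the posterior is Beta(1 + conversions, 1 + non-conversions).
post_a = stats.beta(1 + conv_a, 1 + n_a - conv_a)
post_b = stats.beta(1 + conv_b, 1 + n_b - conv_b)

# Monte Carlo estimate of P(rate_B > rate_A) under the posteriors.
rng = np.random.default_rng(42)
samples_a = post_a.rvs(size=100_000, random_state=rng)
samples_b = post_b.rvs(size=100_000, random_state=rng)
print(np.mean(samples_b > samples_a))   # posterior probability that B beats A
```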
Bayesian parameter estimation inferring the values of model parameters from observed data
Prior distribution represents initial beliefs about the parameters (mean and standard deviation of a normal distribution)
Likelihood function based on the observed data points
Posterior distribution provides updated estimates of the parameters
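A conjugate-normal sketch of this setup, estimating the mean of a normal distribution with a known noise standard deviation (all numbers assumed):

```python
# Conjugate update for the mean of a normal likelihood with known variance.
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(loc=5.0, scale=2.0, size=50)   # observed data, true mean 5
sigma = 2.0                                      # noise sd, assumed known

mu0, tau0 = 0.0, 10.0                            # prior: mean ~ Normal(mu0, tau0^2)

n = len(data)
post_var  = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + data.sum() / sigma**2)
print(post_mean, np.sqrt(post_var))              # posterior mean ≈ sample mean, small sd
```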
Bayesian classification assigning an object to one of several classes based on its features
Prior distribution represents the initial probabilities of each class
Likelihood function describes the probability of observing the features given each class
Posterior distribution gives the updated probabilities of each class after observing the features
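A tiny classifier sketch with two classes, assumed class priors, and one-dimensional Gaussian feature likelihoods:

```python
# Bayes classifier over two classes for a single observed feature value.
from scipy import stats

priors = {"class_0": 0.7, "class_1": 0.3}        # prior class probabilities (assumed)
likelihoods = {                                  # feature model per class (assumed)
    "class_0": stats.norm(loc=0.0, scale=1.0),
    "class_1": stats.norm(loc=2.0, scale=1.0),
}

x = 1.5                                          # observed feature value
unnorm = {c: priors[c] * likelihoods[c].pdf(x) for c in priors}
evidence = sum(unnorm.values())
posterior = {c: v / evidence for c, v in unnorm.items()}
print(posterior)                                 # updated class probabilities
```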
Bayesian regression fitting a linear or nonlinear model to observed data points
Prior distribution represents initial beliefs about the regression coefficients
Likelihood function based on the observed data points and the assumed noise distribution
Posterior distribution provides updated estimates of the regression coefficients
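A sketch of conjugate Bayesian linear regression with a Gaussian prior on the coefficients and a known noise standard deviation (all values assumed):

```python
# Gaussian posterior over the coefficients of a linear model with known noise sd.
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])   # intercept + one feature
true_w = np.array([1.0, 2.5])
sigma = 0.5                                                 # known noise sd
y = X @ true_w + rng.normal(0, sigma, n)

tau = 10.0                                                  # prior: w ~ Normal(0, tau^2 I)

# Conjugate Gaussian posterior over the coefficients.
post_cov  = np.linalg.inv(X.T @ X / sigma**2 + np.eye(2) / tau**2)
post_mean = post_cov @ (X.T @ y) / sigma**2
print(post_mean)   # close to the true coefficients [1.0, 2.5]
```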
Bayesian networks graphical models representing the probabilistic relationships among a set of variables
Nodes represent variables, and edges represent conditional dependencies
Joint probability distribution factorizes according to the graph structure
Inference and learning algorithms used to update probabilities and learn the structure from data
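A minimal example of such a network, the classic rain/sprinkler/wet-grass graph with assumed conditional probability tables, showing the factorized joint and inference by enumeration:

```python
# Three-node Bayesian network: Rain -> Sprinkler, and both -> WetGrass.
# The joint factorizes over the graph: P(R, S, W) = P(R) * P(S | R) * P(W | R, S).
from itertools import product

p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},    # P(S | R = True)
               False: {True: 0.40, False: 0.60}}   # P(S | R = False)
p_wet = {(True, True): 0.99, (True, False): 0.80,  # P(W = True | R, S)
         (False, True): 0.90, (False, False): 0.00}

def joint(r, s, w):
    pw = p_wet[(r, s)]
    return p_rain[r] * p_sprinkler[r][s] * (pw if w else 1 - pw)

# Inference by enumeration: P(Rain = True | WetGrass = True).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(num / den)   # ~0.36
```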