Bayesian Statistics Unit 1 – Probability Theory Foundations
Probability theory forms the foundation of Bayesian statistics. It provides tools to measure and analyze uncertainty, from basic concepts like sample spaces and events to more complex ideas like probability distributions and conditional probabilities.
Key concepts include probability axioms, random variables, and distributions. These are essential for understanding Bayesian inference, which uses prior knowledge and observed data to update probabilities and make informed decisions in various fields.
Probability measures the likelihood of an event occurring and ranges from 0 (impossible) to 1 (certain)
Sample space ($\Omega$) set of all possible outcomes of a random experiment
Event (A) subset of the sample space that represents a specific outcome or set of outcomes
Probability density function (PDF) describes the probability distribution of a continuous random variable
Integrating the PDF over a specific range yields the probability of the random variable falling within that range
Probability mass function (PMF) describes the probability distribution of a discrete random variable
Summing the PMF over all possible values equals 1
Cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to a given value
Independence two events are independent if the occurrence of one does not affect the probability of the other
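These definitions can be checked numerically. The minimal sketch below (using `scipy.stats` and `scipy.integrate`, with a standard normal and a Binomial(10, 0.3) chosen purely for illustration) verifies that integrating a PDF over a range matches the CDF difference and that a PMF sums to 1.

```python
# Illustrative check of the PDF/PMF/CDF relationships stated above.
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Continuous case: integrating a standard normal PDF over [-1, 1]
# gives P(-1 <= X <= 1), which must match CDF(1) - CDF(-1).
area, _ = quad(stats.norm.pdf, -1, 1)
print(area, stats.norm.cdf(1) - stats.norm.cdf(-1))  # both ~0.6827

# Discrete case: a Binomial(n=10, p=0.3) PMF sums to 1 over all possible values.
k = np.arange(0, 11)
print(stats.binom.pmf(k, n=10, p=0.3).sum())  # ~1.0
```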
Probability Axioms and Rules
Axiom 1 (Non-negativity) probability of any event A is greater than or equal to 0 ($P(A) \geq 0$)
Axiom 2 (Normalization) probability of the entire sample space is equal to 1 ($P(\Omega) = 1$)
Axiom 3 (Additivity) if A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$
Complement Rule probability of an event A not occurring is 1 minus the probability of A occurring ($P(A^c) = 1 - P(A)$)
Multiplication Rule for independent events A and B, $P(A \cap B) = P(A) \times P(B)$
For dependent events, $P(A \cap B) = P(A) \times P(B|A)$, where $P(B|A)$ is the conditional probability of B given A
Addition Rule for non-mutually exclusive events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Law of Total Probability if $B_1, B_2, ..., B_n$ form a partition of the sample space, then for any event A, $P(A) = \sum_{i=1}^n P(A \cap B_i) = \sum_{i=1}^n P(A|B_i)P(B_i)$
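A short numerical sketch of these rules, with illustrative probabilities that are not from the notes:

```python
# Assumed probabilities, chosen only to exercise the rules above.
p_A, p_B = 0.4, 0.5

# Independent events: P(A ∩ B) = P(A) * P(B); the addition rule then gives P(A ∪ B).
p_A_and_B = p_A * p_B
p_A_or_B = p_A + p_B - p_A_and_B
print(p_A_and_B, p_A_or_B, 1 - p_A)   # 0.2, 0.7, and the complement rule: 0.6

# Law of total probability over a partition B1, B2, B3 of the sample space.
p_Bi = [0.2, 0.5, 0.3]                # must sum to 1
p_A_given_Bi = [0.9, 0.4, 0.1]
p_A_total = sum(pa * pb for pa, pb in zip(p_A_given_Bi, p_Bi))
print(p_A_total)                      # 0.2*0.9 + 0.5*0.4 + 0.3*0.1 = 0.41
```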
Random Variables and Distributions
Random variable (X) function that assigns a numerical value to each outcome in a sample space
Discrete random variable can take on a countable number of distinct values (number of defective items in a batch)
Continuous random variable can take on any value within a specified range or interval (height of a randomly selected person)
Bernoulli distribution models a single trial with two possible outcomes (success or failure) with probability of success p
Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials with probability of success p
Poisson distribution models the number of events occurring in a fixed interval of time or space, given the average rate of occurrence
Normal (Gaussian) distribution continuous probability distribution characterized by its mean ($\mu$) and standard deviation ($\sigma$)
68-95-99.7 rule approximately 68%, 95%, and 99.7% of the data falls within 1, 2, and 3 standard deviations of the mean, respectively
Exponential distribution models the time between events in a Poisson process, with a constant average rate of occurrence
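The sketch below samples from these distributions with assumed parameter values and empirically checks the 68-95-99.7 rule:

```python
# Sampling the named distributions above; all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0
x = rng.normal(mu, sigma, size=100_000)

# Empirical check of the 68-95-99.7 rule for the normal distribution.
for k in (1, 2, 3):
    frac = np.mean(np.abs(x - mu) <= k * sigma)
    print(f"within {k} sd: {frac:.3f}")   # ~0.683, 0.954, 0.997

bern  = rng.binomial(1, 0.3, size=5)       # Bernoulli(p=0.3)
binom = rng.binomial(10, 0.3, size=5)      # Binomial(n=10, p=0.3)
pois  = rng.poisson(4.0, size=5)           # Poisson(rate=4 per interval)
expo  = rng.exponential(1 / 4.0, size=5)   # Exponential(mean waiting time 1/rate)
```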
Conditional Probability and Bayes' Theorem
Conditional probability $P(A|B)$ probability of event A occurring given that event B has occurred
Calculated as $P(A|B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) > 0$
Useful for updating probabilities based on new information or evidence
Prior probability initial probability of an event before considering any additional information (prevalence of a disease in a population)
Likelihood probability of observing the data given a specific hypothesis (probability of a positive test result given that a person has the disease)
Posterior probability updated probability of an event after considering new information (probability of having the disease given a positive test result)
Bayes' Theorem in terms of prior, likelihood, and evidence $P(H|E) = \frac{P(E|H)P(H)}{P(E)}$, where H is the hypothesis and E is the evidence
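A worked version of the disease-testing example mentioned above; the prevalence and test accuracies are assumed for illustration:

```python
# Disease-testing example matching the prior/likelihood/posterior lines above.
# All numbers are illustrative assumptions, not from the notes.
prevalence  = 0.01    # P(disease): prior
sensitivity = 0.95    # P(positive | disease): likelihood of the evidence under H
false_pos   = 0.05    # P(positive | no disease)

# Evidence P(positive) via the law of total probability.
p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)

# Bayes' theorem: posterior P(disease | positive).
posterior = sensitivity * prevalence / p_positive
print(posterior)   # ~0.161: a positive test is far from a guarantee of disease
```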
Expectation and Variance
Expectation (mean) of a discrete random variable X $E[X] = \sum_{x} x \cdot P(X=x)$
For a continuous random variable, replace the sum with an integral
Expectation is a linear operator: for constants a and b and random variables X and Y, $E[aX + bY] = aE[X] + bE[Y]$
Variance measures the spread of a random variable X around its mean $Var(X) = E[(X - E[X])^2]$
Can also be calculated as $Var(X) = E[X^2] - (E[X])^2$
Standard deviation square root of the variance $\sigma_X = \sqrt{Var(X)}$
Covariance measures the linear relationship between two random variables X and Y $Cov(X, Y) = E[(X - E[X])(Y - E[Y])]$
Correlation normalized version of covariance that ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation): $Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
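The identities above can be verified numerically; the PMF values and the linear relationship in the sketch below are illustrative assumptions:

```python
# Numerical check of the expectation, variance, covariance, and correlation definitions.
import numpy as np

x = np.array([0, 1, 2, 3])            # support of a discrete random variable X
p = np.array([0.1, 0.4, 0.3, 0.2])    # its PMF (sums to 1)

e_x  = np.sum(x * p)                  # E[X]
e_x2 = np.sum(x**2 * p)               # E[X^2]
var  = np.sum((x - e_x)**2 * p)       # Var(X) = E[(X - E[X])^2]
print(e_x, var, e_x2 - e_x**2)        # the two variance formulas agree (0.84)

# Covariance and correlation from samples of two linearly related variables.
rng = np.random.default_rng(1)
a = rng.normal(size=10_000)
b = 2 * a + rng.normal(size=10_000)   # b depends linearly on a, plus noise
print(np.cov(a, b)[0, 1], np.corrcoef(a, b)[0, 1])
```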
Joint and Marginal Distributions
Joint probability distribution $P(X, Y)$ describes the probability of two random variables X and Y taking on specific values simultaneously
For discrete random variables, joint PMF $P(X=x, Y=y)$ gives the probability of X=x and Y=y occurring together
For continuous random variables, joint PDF $f(x, y)$ describes the probability density at (x, y)
Marginal probability distribution probability distribution of a single random variable, ignoring the others
For discrete random variables, marginal PMF $P(X=x) = \sum_y P(X=x, Y=y)$
For continuous random variables, marginal PDF $f_X(x) = \int_{-\infty}^{\infty} f(x, y) dy$
Conditional probability distribution probability distribution of one random variable given the value of another
For discrete random variables, conditional PMF $P(Y=y|X=x) = \frac{P(X=x, Y=y)}{P(X=x)}$
For continuous random variables, conditional PDF $f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)}$
Independence random variables X and Y are independent if and only if their joint probability distribution is the product of their marginal distributions: $P(X, Y) = P(X)P(Y)$ or $f(x, y) = f_X(x)f_Y(y)$
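A small joint PMF (values chosen for illustration) makes the marginal, conditional, and independence definitions concrete:

```python
# A joint PMF for discrete X and Y with two values each; entries are assumed.
import numpy as np

# joint[i, j] = P(X = x_i, Y = y_j); entries sum to 1
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)           # marginal P(X = x_i), summing over y
p_y = joint.sum(axis=0)           # marginal P(Y = y_j), summing over x
p_y_given_x0 = joint[0] / p_x[0]  # conditional P(Y = y_j | X = x_0)
print(p_x, p_y, p_y_given_x0)

# Independence check: does the joint equal the outer product of the marginals?
print(np.allclose(joint, np.outer(p_x, p_y)))  # False for these numbers
```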
Probability in Bayesian Context
Bayesian inference updating beliefs or probabilities based on new data or evidence
Combines prior knowledge with observed data to obtain a posterior distribution
Prior distribution $P(\theta)$ represents the initial beliefs about a parameter $\theta$ before observing any data
Can be informative (based on previous studies or expert knowledge) or non-informative (uniform or vague priors)
Likelihood function $P(D|\theta)$ probability of observing the data D given the parameter $\theta$
Describes how likely the observed data is for different values of $\theta$
Posterior distribution $P(\theta|D)$ updated beliefs about the parameter $\theta$ after observing the data D
Obtained by combining the prior and likelihood using Bayes' Theorem $P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}$
Marginal likelihood (evidence) $P(D)$ probability of observing the data D, averaged over all possible values of $\theta$
Acts as a normalizing constant in Bayes' Theorem $P(D) = \int P(D|\theta)P(\theta) d\theta$
Bayesian model comparison selecting among competing models based on their posterior probabilities
Bayes factor $BF_{12} = \frac{P(D|M_1)}{P(D|M_2)}$ compares the evidence for two models $M_1$ and $M_2$
Common Applications and Examples
Bayesian A/B testing comparing two versions of a website or app to determine which performs better
Prior distribution represents initial beliefs about the conversion rates
Likelihood function based on the observed number of conversions and visitors for each version
Posterior distribution updates the beliefs about the conversion rates after observing the data
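A minimal sketch of such an A/B test using Beta-Binomial conjugacy; the conversion counts and the uniform Beta(1, 1) priors are assumed for illustration:

```python
# Bayesian A/B test: Beta posteriors over two conversion rates, compared by Monte Carlo.
import numpy as np
from scipy import stats

conv_a, n_a = 120, 1000   # conversions / visitors for version A (assumed)
conv_b, n_b = 145, 1000   # conversions / visitors for version B (assumed)

# With Beta(1, 1) priors, the posterior is Beta(1 + conversions, 1 + non-conversions).
post_a = stats.beta(1 + conv_a, 1 + n_a - conv_a)
post_b = stats.beta(1 + conv_b, 1 + n_b - conv_b)

# Monte Carlo estimate of P(rate_B > rate_A) under the posteriors.
rng = np.random.default_rng(42)
samples_a = post_a.rvs(size=100_000, random_state=rng)
samples_b = post_b.rvs(size=100_000, random_state=rng)
print(np.mean(samples_b > samples_a))   # posterior probability that B beats A
```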
Bayesian parameter estimation inferring the values of model parameters from observed data
Prior distribution represents initial beliefs about the parameters (mean and standard deviation of a normal distribution)
Likelihood function based on the observed data points
Posterior distribution provides updated estimates of the parameters
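A conjugate-normal sketch of this setup, estimating the mean of a normal distribution with a known noise standard deviation (all numbers assumed):

```python
# Conjugate update for the mean of a normal likelihood with known variance.
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(loc=5.0, scale=2.0, size=50)   # observed data, true mean 5
sigma = 2.0                                      # noise sd, assumed known

mu0, tau0 = 0.0, 10.0                            # prior: mean ~ Normal(mu0, tau0^2)

n = len(data)
post_var  = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + data.sum() / sigma**2)
print(post_mean, np.sqrt(post_var))              # posterior mean ≈ sample mean, small sd
```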
Bayesian classification assigning an object to one of several classes based on its features
Prior distribution represents the initial probabilities of each class
Likelihood function describes the probability of observing the features given each class
Posterior distribution gives the updated probabilities of each class after observing the features
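A tiny classifier sketch with two classes, assumed class priors, and one-dimensional Gaussian feature likelihoods:

```python
# Bayes classifier over two classes for a single observed feature value.
from scipy import stats

priors = {"class_0": 0.7, "class_1": 0.3}        # prior class probabilities (assumed)
likelihoods = {                                  # feature model per class (assumed)
    "class_0": stats.norm(loc=0.0, scale=1.0),
    "class_1": stats.norm(loc=2.0, scale=1.0),
}

x = 1.5                                          # observed feature value
unnorm = {c: priors[c] * likelihoods[c].pdf(x) for c in priors}
evidence = sum(unnorm.values())
posterior = {c: v / evidence for c, v in unnorm.items()}
print(posterior)                                 # updated class probabilities
```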
Bayesian regression fitting a linear or nonlinear model to observed data points
Prior distribution represents initial beliefs about the regression coefficients
Likelihood function based on the observed data points and the assumed noise distribution
Posterior distribution provides updated estimates of the regression coefficients
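A sketch of conjugate Bayesian linear regression with a Gaussian prior on the coefficients and a known noise standard deviation (all values assumed):

```python
# Gaussian posterior over the coefficients of a linear model with known noise sd.
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])   # intercept + one feature
true_w = np.array([1.0, 2.5])
sigma = 0.5                                                 # known noise sd
y = X @ true_w + rng.normal(0, sigma, n)

tau = 10.0                                                  # prior: w ~ Normal(0, tau^2 I)

# Conjugate Gaussian posterior over the coefficients.
post_cov  = np.linalg.inv(X.T @ X / sigma**2 + np.eye(2) / tau**2)
post_mean = post_cov @ (X.T @ y) / sigma**2
print(post_mean)   # close to the true coefficients [1.0, 2.5]
```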
Bayesian networks graphical models representing the probabilistic relationships among a set of variables
Nodes represent variables, and edges represent conditional dependencies
Joint probability distribution factorizes according to the graph structure
Inference and learning algorithms used to update probabilities and learn the structure from data
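A minimal example of such a network, the classic rain/sprinkler/wet-grass graph with assumed conditional probability tables, showing the factorized joint and inference by enumeration:

```python
# Three-node Bayesian network: Rain -> Sprinkler, and both -> WetGrass.
# The joint factorizes over the graph: P(R, S, W) = P(R) * P(S | R) * P(W | R, S).
from itertools import product

p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},    # P(S | R = True)
               False: {True: 0.40, False: 0.60}}   # P(S | R = False)
p_wet = {(True, True): 0.99, (True, False): 0.80,  # P(W = True | R, S)
         (False, True): 0.90, (False, False): 0.00}

def joint(r, s, w):
    pw = p_wet[(r, s)]
    return p_rain[r] * p_sprinkler[r][s] * (pw if w else 1 - pw)

# Inference by enumeration: P(Rain = True | WetGrass = True).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(num / den)   # ~0.36
```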