📈 Theoretical Statistics Unit 9 – Bayesian statistics

Bayesian statistics offers a powerful framework for updating beliefs based on new evidence. It combines prior knowledge with observed data to make inferences about parameters and hypotheses. This approach contrasts with frequentist methods, providing a flexible way to handle uncertainty. Key concepts include Bayes' theorem, prior and posterior distributions, and likelihood functions. Computational methods like MCMC enable practical implementation of Bayesian analysis. Understanding these principles equips statisticians to tackle complex problems and make data-driven decisions.

Foundations of Probability

  • Probability quantifies the likelihood of an event occurring and ranges from 0 (impossible) to 1 (certain)
  • Joint probability $P(A,B)$ represents the probability of events A and B occurring simultaneously
    • Calculated by multiplying the individual probabilities of A and B if they are independent events
  • Conditional probability $P(A|B)$ measures the probability of event A occurring given that event B has already occurred
    • Calculated using the formula $P(A|B) = \frac{P(A,B)}{P(B)}$
  • Marginal probability $P(A)$ represents the probability of event A occurring, regardless of the outcome of other events
    • Obtained by summing the joint probabilities of A with all possible outcomes of the other event(s)
  • Independence of events occurs when the occurrence of one event does not affect the probability of another event
    • Mathematically, $P(A|B) = P(A)$ and $P(B|A) = P(B)$ for independent events A and B (illustrated numerically in the sketch after this list)
  • Random variables assign numerical values to the outcomes of a random experiment
    • Discrete random variables have countable outcomes (integers)
    • Continuous random variables have uncountable outcomes (real numbers)
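
To make these definitions concrete, here is a minimal Python sketch using a made-up joint distribution for two binary events; it recovers marginal and conditional probabilities from the joint table and checks independence:

```python
import numpy as np

# Hypothetical joint distribution of two binary events A and B,
# laid out as joint[a, b] = P(A=a, B=b); the numbers are illustrative only.
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])

# Marginal probabilities: sum the joint over the other event's outcomes.
p_a = joint.sum(axis=1)   # P(A=0), P(A=1)
p_b = joint.sum(axis=0)   # P(B=0), P(B=1)

# Conditional probability P(A=1 | B=1) = P(A=1, B=1) / P(B=1).
p_a1_given_b1 = joint[1, 1] / p_b[1]

# Independence check: P(A=1 | B=1) equals P(A=1) only if A and B are independent.
print(f"P(A=1)       = {p_a[1]:.2f}")
print(f"P(A=1 | B=1) = {p_a1_given_b1:.2f}")
print("independent?", np.isclose(p_a1_given_b1, p_a[1]))
```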

Introduction to Bayesian Thinking

  • Bayesian thinking involves updating beliefs (probabilities) about an event or hypothesis based on new evidence or data
  • Prior probability represents the initial belief about an event or hypothesis before considering new evidence
  • Likelihood quantifies the probability of observing the data given a specific hypothesis or parameter value
  • Posterior probability represents the updated belief about an event or hypothesis after considering new evidence
    • Combines the prior probability and the likelihood using Bayes' theorem
  • Bayesian inference draws conclusions about parameters or hypotheses based on the posterior distribution
  • Bayesian thinking allows for the incorporation of prior knowledge and the updating of beliefs as new data becomes available; a small numerical updating example follows this list
  • Bayesian methods are particularly useful when dealing with limited data or when prior information is available
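
As a small illustration of this updating process (the two hypotheses, the prior beliefs, and the coin flips below are assumed purely for illustration), the following sketch applies Bayes' rule repeatedly as data arrive:

```python
import numpy as np

# Two competing hypotheses about a coin's heads probability (values assumed for illustration).
theta = np.array([0.5, 0.8])        # fair coin vs. biased coin
prior = np.array([0.9, 0.1])        # initial beliefs before seeing any flips

data = [1, 1, 0, 1, 1]              # observed flips: 1 = heads, 0 = tails

posterior = prior.copy()
for flip in data:
    # Likelihood of this flip under each hypothesis.
    likelihood = theta if flip == 1 else 1 - theta
    # Bayes' rule: posterior ∝ likelihood × prior, then normalize.
    posterior = likelihood * posterior
    posterior /= posterior.sum()

print("posterior beliefs:", posterior.round(3))  # updated after all five flips
```

After five mostly-heads flips, belief shifts noticeably toward the biased-coin hypothesis, even though the prior strongly favored the fair coin.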

Bayes' Theorem and Its Components

  • Bayes' theorem is a fundamental rule in Bayesian statistics that relates conditional probabilities
    • Mathematically, $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
  • The components of Bayes' theorem include:
    • Prior probability $P(A)$: the initial belief about event A before considering evidence B
    • Likelihood $P(B|A)$: the probability of observing evidence B given that event A is true
    • Marginal likelihood $P(B)$: the probability of observing evidence B, regardless of the truth of event A
    • Posterior probability $P(A|B)$: the updated belief about event A after considering evidence B
  • Bayes' theorem allows for the updating of beliefs by combining prior knowledge with new evidence
  • The denominator $P(B)$ acts as a normalizing constant to ensure the posterior probabilities sum to 1
  • Bayes' theorem is the foundation for Bayesian inference and parameter estimation; a worked numerical example follows this list
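
A standard way to see the theorem in action is the diagnostic-test calculation below; all of the rates are assumed for illustration only:

```python
# Classic diagnostic-test illustration of Bayes' theorem; all rates are assumed for illustration.
p_disease = 0.01              # prior P(A): prevalence of the disease
p_pos_given_disease = 0.95    # likelihood P(B|A): test sensitivity
p_pos_given_healthy = 0.05    # false-positive rate P(B|not A)

# Marginal likelihood P(B): total probability of a positive test.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(A|B) via Bayes' theorem.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ≈ 0.161
```

Even with a fairly accurate test, the low prior (prevalence) keeps the posterior probability of disease modest, which is exactly the prior-times-likelihood trade-off the theorem encodes.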

Prior Distributions: Types and Selection

  • Prior distributions represent the initial beliefs about parameters or hypotheses before considering data
  • Informative priors incorporate prior knowledge or expert opinion about the parameters
    • Conjugate priors result in posterior distributions that belong to the same family as the prior (mathematically convenient)
  • Non-informative priors aim to minimize the influence of prior beliefs on the posterior distribution
    • Uniform prior assigns equal probability to all possible parameter values
    • Jeffreys prior is proportional to the square root of the determinant of the Fisher information matrix
  • Improper priors are not valid probability distributions but can still lead to proper posterior distributions
  • Prior selection should be based on available prior knowledge, the nature of the problem, and the desired properties of the posterior distribution
  • Sensitivity analysis can be performed to assess the impact of different prior choices on the posterior inference (the sketch after this list compares three priors on the same data)
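
The Beta-Binomial model is the classic conjugate example: a Beta prior combined with binomial data gives a Beta posterior. The sketch below (data and prior parameters assumed for illustration) doubles as a small sensitivity analysis across an informative, a uniform, and a Jeffreys prior:

```python
from scipy import stats

# Beta-Binomial conjugacy: with a Beta(a, b) prior on a success probability and
# k successes in n trials, the posterior is Beta(a + k, b + n - k).
# The data and prior parameters below are assumed for illustration.
k, n = 7, 10

priors = {
    "informative Beta(20, 20)": (20, 20),   # strong prior belief near 0.5
    "uniform Beta(1, 1)":       (1, 1),     # non-informative uniform prior
    "Jeffreys Beta(0.5, 0.5)":  (0.5, 0.5), # Jeffreys prior for the binomial model
}

# Simple sensitivity analysis: compare posterior means under different priors.
for name, (a, b) in priors.items():
    post = stats.beta(a + k, b + n - k)
    print(f"{name:28s} posterior mean = {post.mean():.3f}")
```

With only 10 observations, the informative prior pulls the posterior mean strongly toward 0.5, while the uniform and Jeffreys priors leave it close to the sample proportion.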

Likelihood Functions and Their Role

  • Likelihood functions quantify the probability of observing the data given specific parameter values
  • The likelihood function is a key component in Bayesian inference and is combined with the prior distribution to obtain the posterior distribution
  • For discrete data, the likelihood is the joint probability mass function evaluated at the observed data (for independent observations, the product of the individual probabilities)
  • For continuous data, the likelihood is the joint probability density function evaluated at the observed data
  • Maximum likelihood estimation (MLE) finds the parameter values that maximize the likelihood function
    • MLE provides a point estimate of the parameters but does not incorporate prior information
  • The likelihood function is not a probability distribution over the parameters but rather a function of the parameters given the observed data
  • The shape of the likelihood function provides information about the precision and uncertainty of the parameter estimates; a small grid-based example follows this list
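
The following sketch evaluates a Bernoulli log-likelihood over a grid of parameter values for assumed data (7 heads in 10 flips) and locates the MLE; note that the curve is a function of the parameter, not a distribution over it:

```python
import numpy as np

# Log-likelihood of a Bernoulli model for assumed data: 7 heads in 10 flips.
k, n = 7, 10
theta = np.linspace(0.01, 0.99, 99)   # candidate parameter values

# log L(theta) = k*log(theta) + (n-k)*log(1-theta); it is a function of theta,
# not a probability distribution over theta (it does not integrate to 1).
log_lik = k * np.log(theta) + (n - k) * np.log(1 - theta)

# The MLE maximizes the likelihood; for this model it is k/n analytically.
theta_mle = theta[np.argmax(log_lik)]
print(f"grid MLE ≈ {theta_mle:.2f}, analytic MLE = {k / n:.2f}")
```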

Posterior Distributions and Inference

  • Posterior distributions represent the updated beliefs about parameters or hypotheses after considering the data
  • The posterior distribution is obtained by combining the prior distribution and the likelihood function using Bayes' theorem
    • Mathematically, $P(\theta|D) \propto P(D|\theta)P(\theta)$, where $\theta$ represents the parameters and $D$ represents the data
  • Posterior inference involves summarizing and interpreting the posterior distribution
    • Point estimates: mean, median, or mode of the posterior distribution
    • Interval estimates: credible intervals, the Bayesian counterpart of confidence intervals, which contain a specified posterior probability mass (computed in the sketch after this list)
  • Posterior predictive distributions allow for making predictions about future observations based on the posterior distribution of the parameters
  • Bayesian model selection compares different models based on their posterior probabilities or Bayes factors
  • Bayesian decision theory combines the posterior distribution with a loss function to make optimal decisions under uncertainty
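
Continuing the Beta-Binomial example (uniform prior, 7 successes in 10 trials, all assumed for illustration), the sketch below extracts point estimates, a 95% credible interval, and a simple posterior predictive probability from the posterior:

```python
from scipy import stats

# Posterior inference for the Beta-Binomial example: Beta(1, 1) prior,
# 7 successes in 10 trials (data assumed for illustration).
k, n = 7, 10
posterior = stats.beta(1 + k, 1 + n - k)

# Point estimates from the posterior distribution.
post_mean = posterior.mean()
post_median = posterior.median()

# 95% equal-tailed credible interval.
lo, hi = posterior.interval(0.95)

# Posterior predictive probability that the next single trial is a success
# equals the posterior mean of theta for this model.
print(f"mean = {post_mean:.3f}, median = {post_median:.3f}")
print(f"95% credible interval = ({lo:.3f}, {hi:.3f})")
print(f"P(next trial is a success) = {post_mean:.3f}")
```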

Bayesian vs. Frequentist Approaches

  • Bayesian and frequentist approaches differ in their philosophical interpretation of probability and their treatment of parameters
  • Bayesian approach:
    • Probability represents a degree of belief or uncertainty about an event or hypothesis
    • Parameters are treated as random variables with associated probability distributions (priors and posteriors)
    • Inference is based on the posterior distribution, which combines prior knowledge with observed data
  • Frequentist approach:
    • Probability represents the long-run frequency of an event in repeated experiments
    • Parameters are treated as fixed, unknown quantities
    • Inference is based on the sampling distribution of estimators and the construction of confidence intervals and hypothesis tests
  • Bayesian methods allow for the incorporation of prior knowledge and provide a natural framework for updating beliefs as new data becomes available
  • Frequentist methods focus on the properties of estimators and the control of long-run error rates
  • Bayesian and frequentist approaches can lead to different results, especially when dealing with small sample sizes or informative priors; the sketch after this list compares the two interval estimates on the same small data set
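
The contrast shows up numerically even in a tiny example. The sketch below (7 successes in 10 trials, assumed for illustration) computes a frequentist Wald confidence interval and a Bayesian credible interval under a uniform prior for the same proportion:

```python
import numpy as np
from scipy import stats

# Same data, two interval estimates (data assumed for illustration): 7 successes in 10 trials.
k, n = 7, 10
p_hat = k / n

# Frequentist: Wald 95% confidence interval based on the sampling distribution of p_hat.
z = stats.norm.ppf(0.975)
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald_ci = (p_hat - z * se, p_hat + z * se)

# Bayesian: 95% credible interval from the Beta posterior under a uniform prior.
cred_int = stats.beta(1 + k, 1 + n - k).interval(0.95)

print(f"95% Wald confidence interval:   ({wald_ci[0]:.3f}, {wald_ci[1]:.3f})")
print(f"95% Bayesian credible interval: ({cred_int[0]:.3f}, {cred_int[1]:.3f})")
# The two intervals differ noticeably at this small sample size, as noted above.
```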

Computational Methods in Bayesian Analysis

  • Bayesian inference often involves complex posterior distributions that cannot be analytically derived
  • Computational methods are used to approximate and sample from the posterior distribution
  • Markov Chain Monte Carlo (MCMC) methods are widely used in Bayesian analysis
    • MCMC generates a Markov chain that converges to the posterior distribution
    • Metropolis-Hastings algorithm is a general MCMC method that proposes new parameter values and accepts or rejects them based on a probability ratio (sketched after this list)
    • Gibbs sampling is a special case of MCMC that samples from the full conditional distributions of the parameters
  • Variational inference is an alternative to MCMC that approximates the posterior distribution with a simpler, tractable distribution
    • Minimizes the Kullback-Leibler divergence between the approximate and true posterior distributions
  • Laplace approximation approximates the posterior distribution with a Gaussian distribution centered at the mode of the posterior
  • Importance sampling and particle filtering are used for sequential Bayesian inference in dynamic models
  • Software packages (JAGS, Stan, PyMC3) and probabilistic programming languages simplify the implementation of Bayesian models and computational methods
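
As a minimal sketch of the Metropolis-Hastings idea (not a substitute for Stan or PyMC3), the code below uses a random-walk proposal to sample the posterior of a Bernoulli success probability under a uniform prior, with assumed data of 7 heads in 10 flips:

```python
import numpy as np

rng = np.random.default_rng(0)

# Metropolis-Hastings sketch: sample theta from the posterior of a Bernoulli
# success probability with a uniform prior, given assumed data (7 heads in 10 flips).
k, n = 7, 10

def log_posterior(theta):
    # Unnormalized log posterior: log-likelihood + log-prior (uniform prior is constant).
    if not 0 < theta < 1:
        return -np.inf
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

samples = []
theta = 0.5                      # starting value of the chain
for _ in range(20_000):
    proposal = theta + rng.normal(0, 0.1)        # symmetric random-walk proposal
    log_ratio = log_posterior(proposal) - log_posterior(theta)
    if np.log(rng.uniform()) < log_ratio:        # accept with probability min(1, ratio)
        theta = proposal
    samples.append(theta)

draws = np.array(samples[5_000:])                # discard burn-in
print(f"posterior mean ≈ {draws.mean():.3f}")    # close to (k + 1) / (n + 2) ≈ 0.667
```

In practice, tuning the proposal scale and running convergence diagnostics matter far more than this toy example suggests, which is one reason the packages listed above are preferred for real models.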

