📈 Theoretical Statistics Unit 3 – Expectation and Moments

Expectation and moments are fundamental concepts in probability theory and statistics. They provide powerful tools for analyzing random variables and their distributions, allowing us to quantify average values, spread, and other important characteristics. From basic definitions to advanced applications, this topic covers a wide range of ideas. We'll explore probability foundations, random variables, moment generating functions, and their roles in statistical inference, giving you a solid understanding of these essential concepts.

Key Concepts and Definitions

  • Expectation represents the probability-weighted average value of a random variable over its range of possible outcomes
  • Moments measure different aspects of a probability distribution, such as central tendency, dispersion, and shape
  • First moment is the mean or expected value, denoted $\mathbb{E}[X]$ for a random variable $X$
  • Second moment is the expected value of the squared random variable, $\mathbb{E}[X^2]$, which is related to the variance
    • Variance measures the spread of a distribution around its mean, defined as $\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ (a quick numeric check appears after this list)
  • Higher moments (third, fourth, etc.) capture additional characteristics of a distribution, such as skewness and kurtosis
  • Moment generating functions (MGFs) are a tool for generating moments of a random variable through differentiation
  • MGFs uniquely characterize a probability distribution and can be used to derive its properties
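
To make the relationship between the first two moments and the variance concrete, here is a minimal Python sketch (the Bernoulli distribution and the value $p = 3/10$ are illustrative choices, not taken from the text) that computes $\mathbb{E}[X]$, $\mathbb{E}[X^2]$, and $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ by summing over a PMF:

```python
from fractions import Fraction

p = Fraction(3, 10)                   # illustrative success probability for a Bernoulli(p) variable
pmf = {0: 1 - p, 1: p}                # PMF: P(X = 0) = 1 - p, P(X = 1) = p

mean = sum(x * q for x, q in pmf.items())               # first raw moment  E[X]   = p
second_moment = sum(x**2 * q for x, q in pmf.items())   # second raw moment E[X^2] = p
variance = second_moment - mean**2                      # Var(X) = E[X^2] - (E[X])^2 = p(1 - p)

print(mean, second_moment, variance)                    # 3/10 3/10 21/100
```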

Probability Foundations

  • Probability is a measure of the likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain)
  • Sample space $\Omega$ is the set of all possible outcomes of a random experiment
  • Events are subsets of the sample space, and the probability of an event $A$ is denoted $P(A)$
  • Probability axioms: non-negativity ($P(A) \geq 0$), normalization ($P(\Omega) = 1$), and countable additivity ($P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$ for pairwise disjoint events $A_i$)
  • Conditional probability $P(A \mid B)$ is the probability of event $A$ given that event $B$ has occurred, defined as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ when $P(B) > 0$
  • Independence of events: two events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$, meaning the occurrence of one does not affect the probability of the other (a small enumeration check follows this list)
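
As a small sanity check of conditional probability and independence, the following Python sketch (a constructed example with two fair dice, not from the source) enumerates the sample space, computes $P(A \mid B)$ from its definition, and verifies the product rule for the events "first die is even" and "the sum is 7", which happen to be independent:

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0              # event A: the first die shows an even number
B = lambda w: w[0] + w[1] == 7           # event B: the two dice sum to 7

p_A, p_B = prob(A), prob(B)
p_AB = prob(lambda w: A(w) and B(w))

print(p_AB / p_B)                        # P(A|B) = P(A ∩ B)/P(B) = 1/2, the same as P(A)
print(p_AB == p_A * p_B)                 # True: A and B satisfy the independence product rule
```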

Random Variables and Distributions

  • A random variable is a function that assigns a numerical value to each outcome in a sample space
  • Discrete random variables take values in a countable set (such as the integers), while continuous random variables take values in an uncountable set (such as an interval of real numbers)
  • Probability mass function (PMF) for a discrete random variable $X$ is denoted $p_X(x) = P(X = x)$, giving the probability of $X$ taking a specific value $x$
  • Probability density function (PDF) for a continuous random variable $X$ is denoted $f_X(x)$, satisfying $P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$
  • Cumulative distribution function (CDF) $F_X(x) = P(X \leq x)$ gives the probability of a random variable being less than or equal to a given value $x$
    • For discrete random variables, $F_X(x) = \sum_{y \leq x} p_X(y)$
    • For continuous random variables, $F_X(x) = \int_{-\infty}^x f_X(y)\,dy$
  • Common discrete distributions include Bernoulli, Binomial, Poisson, and Geometric (a Binomial PMF/CDF check appears after this list)
  • Common continuous distributions include Uniform, Normal (Gaussian), Exponential, and Beta
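
The sketch below (assuming only Python's standard library; the Binomial(4, 0.5) parameters are arbitrary) builds a discrete PMF with `math.comb` and checks that the CDF at each point is the running sum of the PMF and that the total probability is 1:

```python
from math import comb

n, p = 4, 0.5  # illustrative parameters for a Binomial(n, p) random variable

def pmf(k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def cdf(x):
    """F_X(x) = P(X <= x), the running sum of the PMF."""
    return sum(pmf(k) for k in range(0, x + 1))

for k in range(n + 1):
    print(k, pmf(k), cdf(k))

print(abs(cdf(n) - 1.0) < 1e-12)  # True: the PMF sums to 1, so the CDF ends at 1
```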

Expectation: Basics and Properties

  • Expectation is a linear operator, meaning $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$ for constants $a$ and $b$ and random variables $X$ and $Y$
  • For a discrete random variable $X$ with PMF $p_X(x)$, the expectation is calculated as $\mathbb{E}[X] = \sum_x x \cdot p_X(x)$
  • For a continuous random variable $X$ with PDF $f_X(x)$, the expectation is calculated as $\mathbb{E}[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx$
  • Law of the unconscious statistician (LOTUS): for a function $g(X)$ of a random variable $X$, $\mathbb{E}[g(X)] = \sum_x g(x) \cdot p_X(x)$ (discrete case) or $\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f_X(x)\,dx$ (continuous case), as illustrated in the sketch after this list
  • Expectation of a constant: $\mathbb{E}[c] = c$ for any constant $c$
  • Expectation of a sum: $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$ for any random variables $X$ and $Y$
  • Expectation of a product: $\mathbb{E}[XY] = \mathbb{E}[X] \cdot \mathbb{E}[Y]$ for independent random variables $X$ and $Y$
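
As a concrete check of LOTUS and linearity, here is a minimal sketch (using a fair six-sided die as an assumed example): $\mathbb{E}[X^2]$ is obtained by summing $g(x)\,p_X(x)$ without ever finding the distribution of $X^2$, and $\mathbb{E}[2X + 3] = 2\mathbb{E}[X] + 3$ is verified directly:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die

def expect(g):
    """LOTUS for a discrete variable: E[g(X)] = sum_x g(x) * p_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())

EX = expect(lambda x: x)             # E[X] = 7/2
EX2 = expect(lambda x: x**2)         # E[X^2] = 91/6, via LOTUS
lhs = expect(lambda x: 2 * x + 3)    # E[2X + 3]

print(EX, EX2, lhs, lhs == 2 * EX + 3)  # 7/2 91/6 10 True
```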

Moments and Their Significance

  • Raw moments: the $k$-th raw moment of a random variable $X$ is defined as $\mathbb{E}[X^k]$
    • First raw moment is the mean, $\mathbb{E}[X]$
    • Second raw moment is $\mathbb{E}[X^2]$, used to calculate variance
  • Central moments: the $k$-th central moment of a random variable $X$ is defined as $\mathbb{E}[(X - \mathbb{E}[X])^k]$
    • First central moment is always 0
    • Second central moment is the variance, $\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$
  • Standardized moments: the $k$-th standardized moment of a random variable $X$ is defined as $\mathbb{E}\left[\left(\frac{X - \mathbb{E}[X]}{\sqrt{\text{Var}(X)}}\right)^k\right]$ (the third and fourth are computed numerically in the sketch after this list)
    • Third standardized moment measures skewness, the asymmetry of a distribution
    • Fourth standardized moment measures kurtosis, the heaviness of the tails of a distribution
  • Moments can be used to characterize and compare different probability distributions
  • Higher moments provide additional information about the shape and properties of a distribution
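
To illustrate raw, central, and standardized moments together, the following sketch (assuming a Binomial(10, 0.3) variable, chosen only as an example) computes the variance, skewness, and kurtosis exactly by summing over the PMF:

```python
from math import comb

n, p = 10, 0.3  # illustrative Binomial(n, p) parameters

pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def raw_moment(r):
    """r-th raw moment E[X^r], summing over the PMF."""
    return sum(k**r * q for k, q in pmf.items())

def central_moment(r):
    """r-th central moment E[(X - E[X])^r]."""
    mu = raw_moment(1)
    return sum((k - mu)**r * q for k, q in pmf.items())

var = central_moment(2)
skewness = central_moment(3) / var**1.5   # third standardized moment
kurtosis = central_moment(4) / var**2     # fourth standardized moment
print(var, skewness, kurtosis)            # ≈ 2.1, ≈ 0.276, ≈ 2.876
```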

Moment Generating Functions

  • The moment generating function (MGF) of a random variable $X$ is defined as $M_X(t) = \mathbb{E}[e^{tX}]$, provided this expectation is finite for $t$ in an open interval around 0
  • MGFs uniquely determine a probability distribution, meaning two random variables with the same MGF (on an interval around 0) have the same distribution
  • The $k$-th moment of $X$ can be found by differentiating the MGF $k$ times and evaluating at $t = 0$: $\mathbb{E}[X^k] = M_X^{(k)}(0)$, as shown in the sketch after this list
  • MGFs can be used to derive the mean, variance, and other properties of a distribution
  • For independent random variables $X$ and $Y$, the MGF of their sum is the product of their individual MGFs: $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$
  • MGFs can be used to prove various results in probability theory, such as the Central Limit Theorem
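
A minimal SymPy sketch (assuming SymPy is installed; the Poisson MGF $M_X(t) = e^{\lambda(e^t - 1)}$ is used only as a worked example) recovers the first two moments by differentiating the MGF and evaluating at $t = 0$:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)

# MGF of a Poisson(lambda) random variable: M_X(t) = exp(lambda * (e^t - 1))
M = sp.exp(lam * (sp.exp(t) - 1))

EX  = sp.diff(M, t, 1).subs(t, 0)     # first moment:  lambda
EX2 = sp.diff(M, t, 2).subs(t, 0)     # second moment: lambda**2 + lambda
var = sp.simplify(EX2 - EX**2)        # variance:      lambda

print(EX, sp.expand(EX2), var)
```

The same pattern works for any distribution whose MGF has a known closed form, and the independence property can be explored by multiplying two MGFs before differentiating.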

Applications in Statistical Inference

  • Moments and MGFs play a crucial role in parameter estimation and hypothesis testing
  • Method of moments estimators are obtained by equating sample moments to population moments and solving for the parameters
    • For example, the sample mean $\bar{X}$ is the method-of-moments estimator for the population mean $\mu$ (see the sketch after this list)
  • Maximum likelihood estimation (MLE) is another common approach, which finds the parameter values that maximize the likelihood function
  • MGFs can be used to derive the sampling distributions of estimators and test statistics
  • Moments and MGFs are also used in Bayesian inference to specify prior and posterior distributions for parameters
  • Higher moments, such as skewness and kurtosis, can be used to assess the normality assumption in various statistical tests
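
Here is a short method-of-moments sketch (assuming NumPy and a simulated Exponential sample; the rate 2.0 and the sample size are arbitrary choices): since the population mean of an Exponential distribution with rate $\lambda$ is $1/\lambda$, equating it to the sample mean gives the estimator $\hat{\lambda} = 1/\bar{X}$:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 2.0                                          # illustrative Exponential rate parameter
x = rng.exponential(scale=1 / true_rate, size=10_000)    # simulated sample

# Method of moments: population mean 1/rate = sample mean, so solve for the rate
rate_mom = 1 / x.mean()
print(rate_mom)   # close to 2.0 for a large sample
```

With more than one unknown parameter, the same idea equates as many sample moments as there are parameters and solves the resulting system.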

Advanced Topics and Extensions

  • Multivariate moments and MGFs extend the concepts to random vectors and joint distributions
  • Conditional expectation $\mathbb{E}[X \mid Y]$ is the expected value of $X$ given the value of another random variable $Y$
  • Moment inequalities, such as Markov's inequality and Chebyshev's inequality, provide bounds on the probability of a random variable deviating from its mean (an empirical check appears after this list)
  • Characteristic functions, defined as $\phi_X(t) = \mathbb{E}[e^{itX}]$ for real $t$, are another tool for uniquely characterizing distributions, and they exist for every distribution (unlike MGFs)
  • Cumulants are an alternative to moments, with the $k$-th cumulant defined as the $k$-th derivative of $\log M_X(t)$ evaluated at $t = 0$
  • Empirical moments and MGFs can be used to estimate population moments and MGFs from sample data
  • Robust alternatives to moment-based summaries, such as trimmed means and winsorized means, are less sensitive to outliers and heavy-tailed distributions
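
Chebyshev's inequality can be checked empirically; the sketch below (assuming NumPy and a simulated Exponential(1) sample, chosen only for illustration) compares the empirical tail probability $P(|X - \mu| \geq k\sigma)$ with the bound $1/k^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)      # Exponential(1): mean 1, standard deviation 1

mu, sigma = x.mean(), x.std()
for k in (2, 3, 4):
    tail = np.mean(np.abs(x - mu) >= k * sigma)   # empirical P(|X - mu| >= k*sigma)
    print(k, tail, 1 / k**2)                      # the empirical tail sits below the 1/k^2 bound
```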


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
