Statistical Inference Unit 13 – Asymptotic Theory & Large Sample Inference
Asymptotic theory explores how statistical estimators behave as sample sizes approach infinity. It's crucial for understanding the reliability and efficiency of statistical methods in large datasets, providing a foundation for hypothesis testing and confidence interval construction.
Key concepts include consistency, efficiency, and asymptotic normality of estimators. These principles allow researchers to make inferences about population parameters using large samples, even when the exact distribution of an estimator is unknown or complex to derive.
Asymptotic theory studies the behavior of estimators and statistical procedures as the sample size approaches infinity
Consistency of an estimator means that as the sample size increases, the estimator converges in probability to the true parameter value (illustrated in the simulation sketch after this list)
Efficiency of an estimator refers to its variance relative to other estimators, with more efficient estimators having smaller variances
An estimator is asymptotically efficient if its asymptotic variance achieves the Cramér-Rao lower bound as the sample size tends to infinity
Asymptotic normality implies that the distribution of an estimator, properly standardized, converges to a standard normal distribution as the sample size increases
Asymptotic unbiasedness indicates that the bias of an estimator tends to zero as the sample size grows large
Asymptotic equivalence of two sequences of random variables means that their difference converges in probability to zero as the sample size increases
Asymptotic relative efficiency (ARE) compares the efficiency of two estimators in the limit, calculated as the ratio of their asymptotic variances
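As a quick illustration of consistency and ARE, here is a minimal Monte Carlo sketch in Python (assuming only NumPy; the sample sizes and replication counts are arbitrary choices). It estimates the probability that the sample mean strays from the true mean, and compares the sampling variances of the median and the mean for normal data, where the theoretical ARE is $\pi/2 \approx 1.571$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 5.0, 2.0, 1000

# Consistency: P(|Xbar_n - mu| > 0.1) shrinks toward 0 as n grows
for n in (10, 100, 1000, 5000):
    means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(f"n={n:5d}  P(|Xbar - mu| > 0.1) ~ {np.mean(np.abs(means - mu) > 0.1):.3f}")

# ARE of mean vs. median for normal data: var(median)/var(mean) -> pi/2
samples = rng.normal(mu, sigma, size=(2000, 1000))
ratio = np.median(samples, axis=1).var() / samples.mean(axis=1).var()
print(f"var(median)/var(mean) ~ {ratio:.3f}  (theory: {np.pi / 2:.3f})")
```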
Foundations of Asymptotic Theory
Asymptotic theory relies on the concept of limits and convergence of sequences of random variables
Convergence in probability means that for any $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$
This is a weak form of convergence, as it only requires the probability of large deviations to vanish asymptotically
Almost sure convergence (or convergence with probability 1) is a stronger form of convergence, implying that $P(\lim_{n \to \infty} X_n = X) = 1$
Convergence in distribution (or weak convergence) means that the cumulative distribution function (CDF) of $X_n$ converges to the CDF of $X$ at all continuity points of the latter
This is denoted as $X_n \xrightarrow{d} X$
Convergence in quadratic mean (or $L^2$ convergence) requires that $E[(X_n - X)^2] \to 0$ as $n \to \infty$, which implies convergence in probability
Slutsky's theorem describes how limits combine when one sequence of random variables converges in distribution and another converges in probability to a constant
For example, if $X_n \xrightarrow{p} a$ and $Y_n \xrightarrow{d} Y$, then $X_nY_n \xrightarrow{d} aY$
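The following sketch (a numerical check, assuming NumPy and SciPy) illustrates Slutsky's theorem: for skewed exponential data, $\sqrt{n}(\bar{X}_n - \mu)/\sigma \xrightarrow{d} N(0, 1)$ by the CLT and $S_n \xrightarrow{p} \sigma$, so the studentized mean $\sqrt{n}(\bar{X}_n - \mu)/S_n$ is also asymptotically standard normal; its empirical quantiles are compared with those of $N(0, 1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, mu = 500, 5000, 1.0  # Exponential(scale=1) has mean 1 and sd 1

# Studentized mean: Slutsky's theorem lets us replace sigma with S_n
x = rng.exponential(scale=1.0, size=(reps, n))
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

for q in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"q={q:.2f}  empirical {np.quantile(t, q):+.3f}  N(0,1) {stats.norm.ppf(q):+.3f}")
```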
Convergence Types and Properties
Convergence in probability is closed under continuous transformations, meaning that if $X_n \xrightarrow{p} X$ and $g$ is a continuous function, then $g(X_n) \xrightarrow{p} g(X)$
Convergence in distribution is closed under continuous transformations, i.e., if $X_n \xrightarrow{d} X$ and $g$ is a continuous function, then $g(X_n) \xrightarrow{d} g(X)$
The continuous mapping theorem unifies these properties: if $X_n$ converges to $X$ in probability, almost surely, or in distribution, and $g$ is continuous except possibly on a set with probability 0 under the distribution of $X$, then $g(X_n)$ converges to $g(X)$ in the same mode
The converging together lemma states that if $X_n \xrightarrow{d} X$ and $Y_n - X_n \xrightarrow{p} 0$, then $Y_n \xrightarrow{d} X$; similarly, if $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} X$, then $X_n - Y_n \xrightarrow{p} 0$
This is useful for proving the asymptotic equivalence of two estimators
The delta method approximates the distribution of a smooth transformation of an asymptotically normal estimator using a first-order Taylor expansion (see the numerical sketch after this list)
If $\sqrt{n}(X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is a differentiable function, then $\sqrt{n}(g(X_n) - g(\mu)) \xrightarrow{d} N(0, \sigma^2[g'(\mu)]^2)$
The Cramér-Wold device relates the joint convergence in distribution of random vectors to the convergence in distribution of all linear combinations of their components
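Here is a small numerical check of the delta method (a sketch assuming NumPy; the choice $g = \log$ and exponential data are arbitrary). For exponential data with mean $\mu$, the standard deviation is also $\mu$, so with $g'(\mu) = 1/\mu$ the limiting standard deviation of $\sqrt{n}(\log\bar{X}_n - \log\mu)$ is $\sigma/\mu = 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, mu = 1000, 4000, 2.0
sigma = mu  # for exponential data the sd equals the mean

# Delta method: sqrt(n)(g(Xbar) - g(mu)) ->d N(0, sigma^2 * g'(mu)^2) with g = log,
# so the limiting sd is sigma * (1/mu) = 1
xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (np.log(xbar) - np.log(mu))
print(f"simulated sd = {z.std(ddof=1):.3f}   delta-method sd = {sigma / mu:.3f}")
```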
Central Limit Theorem and Its Applications
The central limit theorem (CLT) states that the sum of a large number of independent and identically distributed (i.i.d.) random variables with finite mean and variance converges in distribution to a normal distribution
Formally, if $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then $\frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\sigma} \xrightarrow{d} N(0, 1)$
The CLT holds under more general conditions, such as for independent but not identically distributed random variables whose variances satisfy the Lindeberg condition (Lindeberg-Feller CLT)
The CLT is the foundation for many statistical procedures, as it justifies the use of normal approximations for the sampling distributions of estimators
The sample mean $\bar{X}$ is asymptotically normal under the conditions of the CLT, with $\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$
The sample variance $S^2$ is also asymptotically normal, with $\sqrt{n}(S^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4)$, where $\mu_4$ is the fourth central moment of the population
The CLT can be used to construct confidence intervals and hypothesis tests for population parameters based on large samples
For example, an approximate 95% confidence interval for the population mean is $\bar{X} \pm 1.96\frac{S}{\sqrt{n}}$
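As a sanity check on this interval, the sketch below (assuming NumPy; the exponential population and sample sizes are arbitrary choices) estimates its actual coverage by simulation; the coverage approaches 95% as $n$ grows even though the data are skewed.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, reps = 1.0, 10000

# Coverage of Xbar +/- 1.96 * S / sqrt(n) for exponential data
for n in (10, 50, 500):
    x = rng.exponential(scale=mu, size=(reps, n))
    xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)
    half = 1.96 * s / np.sqrt(n)
    coverage = np.mean((xbar - half <= mu) & (mu <= xbar + half))
    print(f"n={n:4d}  estimated coverage = {coverage:.3f}")
```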
Asymptotic Distributions of Estimators
The asymptotic distribution of an estimator characterizes its behavior as the sample size tends to infinity
Maximum likelihood estimators (MLEs) are asymptotically normal under regularity conditions, with $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I^{-1}(\theta))$, where $I(\theta)$ is the Fisher information of a single observation (see the simulation sketch at the end of this list)
This result is known as the asymptotic normality of MLEs
The asymptotic variance of an MLE achieves the Cramér-Rao lower bound, making MLEs asymptotically efficient
Method of moments estimators are also asymptotically normal under certain conditions, with their asymptotic variance depending on the moments of the population
The sample $p$-quantile is asymptotically normal, with $\sqrt{n}(\hat{\xi}_p - \xi_p) \xrightarrow{d} N(0, p(1-p)/f(\xi_p)^2)$, where $\xi_p$ is the population quantile and $f$ is the population density, assumed positive at $\xi_p$
The asymptotic distribution of the sample correlation coefficient is normal, with variance depending on the population correlation and the fourth moments of the joint distribution
Asymptotically pivotal quantities, such as studentized statistics, have asymptotic distributions that do not depend on unknown parameters
These are useful for constructing confidence intervals and tests in large samples
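The asymptotic normality of MLEs can be checked numerically; the sketch below (assuming NumPy, with an arbitrary rate parameter) uses the Exponential(rate $\lambda$) model, where the MLE is $\hat{\lambda} = 1/\bar{X}$ and the per-observation Fisher information is $I(\lambda) = 1/\lambda^2$, so $\sqrt{n}(\hat{\lambda}_n - \lambda) \xrightarrow{d} N(0, \lambda^2)$.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n, reps = 2.0, 1000, 4000

# MLE for the exponential rate: lam_hat = 1 / Xbar;
# sqrt(n)(lam_hat - lam) should have sd close to sqrt(I^{-1}(lam)) = lam
x = rng.exponential(scale=1 / lam, size=(reps, n))
lam_hat = 1.0 / x.mean(axis=1)
z = np.sqrt(n) * (lam_hat - lam)
print(f"simulated sd = {z.std(ddof=1):.3f}   theoretical sd = {lam:.3f}")
```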
Large Sample Hypothesis Testing
Hypothesis tests based on large sample theory rely on the asymptotic distributions of test statistics under the null hypothesis
The Wald test is based on the asymptotic normality of MLEs, with the test statistic $W = \frac{(\hat{\theta}_n - \theta_0)^2}{I^{-1}(\hat{\theta}_n)/n}$ asymptotically following a chi-square distribution with 1 degree of freedom under the null hypothesis
The likelihood ratio test (LRT) compares the maximized likelihoods under the null and alternative hypotheses, with the test statistic $-2\log(\Lambda_n)$ asymptotically following a chi-square distribution with degrees of freedom equal to the difference in the number of parameters
The score test (or Lagrange multiplier test) is based on the gradient of the log-likelihood at the null hypothesis parameter value, with the test statistic asymptotically following a chi-square distribution under the null
Rao's efficient score statistic standardizes the score function by the Fisher information matrix evaluated at the null value; it is asymptotically equivalent to the Wald and likelihood ratio statistics under the null (all three are computed for a Bernoulli proportion in the sketch after this list)
Large sample tests for proportions, such as the z-test and the chi-square test for goodness of fit, rely on the asymptotic normality of the sample proportion and the asymptotic chi-square distribution of the Pearson statistic, respectively
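The sketch below (assuming NumPy and SciPy; the helper name trinity_tests and the data values are illustrative) computes all three statistics for $H_0\colon p = p_0$ in a Binomial$(n, p)$ model; each is asymptotically $\chi^2_1$ under the null, and they typically give similar answers in large samples.

```python
import numpy as np
from scipy import stats

def trinity_tests(x, n, p0):
    """Wald, likelihood ratio, and score statistics for H0: p = p0."""
    p_hat = x / n
    loglik = lambda p: x * np.log(p) + (n - x) * np.log(1 - p)
    wald  = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)  # information at p_hat
    lrt   = 2 * (loglik(p_hat) - loglik(p0))
    score = (p_hat - p0) ** 2 / (p0 * (1 - p0) / n)        # information at p0
    return wald, lrt, score

for name, stat in zip(("Wald", "LRT", "Score"), trinity_tests(x=430, n=1000, p0=0.5)):
    print(f"{name:5s} = {stat:6.3f}   p-value = {stats.chi2.sf(stat, df=1):.5f}")
```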
Confidence Intervals in Large Samples
Confidence intervals based on large sample theory utilize the asymptotic distributions of estimators to construct intervals with a desired coverage probability
The Wald confidence interval for a parameter $\theta$ is based on the asymptotic normality of the MLE, with the interval given by $\hat{\theta}_n \pm z_{\alpha/2}\sqrt{I^{-1}(\hat{\theta}_n)/n}$, where $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution
The likelihood ratio confidence interval is constructed by inverting the likelihood ratio test, i.e., finding the set of parameter values for which the LRT fails to reject the null hypothesis at a given significance level
The score confidence interval is obtained by inverting the score test, i.e., finding the set of parameter values for which the score statistic falls within the acceptance region of the test
Large sample confidence intervals for proportions can be constructed using the normal approximation to the binomial distribution, with the interval given by $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
The delta method can be used to construct confidence intervals for transformed parameters, such as the ratio of two means or the difference of two proportions
The interval is based on the asymptotic normality of the transformed estimator, with the variance obtained using the delta method
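As an example of a delta-method interval, the sketch below (assuming NumPy and SciPy; the helper name logit_ci and the counts are illustrative) builds a confidence interval for a proportion on the log-odds scale, where $g(p) = \log(p/(1-p))$ and $g'(p) = 1/(p(1-p))$ give $se(g(\hat{p})) = 1/\sqrt{n\hat{p}(1-\hat{p})}$, then back-transforms the endpoints.

```python
import numpy as np
from scipy import stats

def logit_ci(x, n, level=0.95):
    """Delta-method interval for p built on the log-odds scale."""
    p_hat = x / n
    z = stats.norm.ppf(0.5 + level / 2)
    logit = np.log(p_hat / (1 - p_hat))
    se = 1.0 / np.sqrt(n * p_hat * (1 - p_hat))  # delta-method standard error
    expit = lambda t: 1 / (1 + np.exp(-t))       # back-transform to the p scale
    return expit(logit - z * se), expit(logit + z * se)

print(logit_ci(x=40, n=200))  # interval for p around p_hat = 0.20
```

Unlike the plain Wald interval for $p$, this transformed interval always stays inside $(0, 1)$.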
Practical Applications and Examples
Large sample theory is widely used in various fields, such as economics, finance, social sciences, and medical research, where sample sizes are often large
In clinical trials, the asymptotic normality of the sample mean is used to compare the effectiveness of treatments, with confidence intervals and hypothesis tests based on the normal approximation
For example, a z-test can be used to compare the mean blood pressure reduction between a treatment and a placebo group (a minimal sketch appears at the end of this section)
In survey sampling, the CLT justifies the use of normal approximations for the sampling distribution of the sample mean or proportion, allowing for the construction of confidence intervals and hypothesis tests
For instance, a large sample confidence interval can be used to estimate the proportion of voters supporting a particular candidate
In finance, the asymptotic properties of estimators are used to analyze the performance of asset pricing models and to test market efficiency
The Fama-MacBeth regression, which relies on the asymptotic normality of the average estimated coefficients, is a common approach to test asset pricing models
In econometrics, large sample theory is the foundation for the asymptotic properties of ordinary least squares (OLS) and other estimation methods, as well as for the construction of hypothesis tests and confidence intervals
The asymptotic normality of the OLS estimator is used to test the significance of regression coefficients and to construct confidence intervals for the marginal effects of predictors
In machine learning, the asymptotic properties of estimators are relevant for understanding the behavior of learning algorithms as the sample size grows large
For example, the consistency and asymptotic normality of the k-nearest neighbors classifier can be studied using large sample theory
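To make the clinical-trials example above concrete, here is a minimal sketch of a two-sample z-test (assuming NumPy and SciPy; the function name two_sample_z and the simulated effect sizes are illustrative, not real trial data).

```python
import numpy as np
from scipy import stats

def two_sample_z(x, y):
    """Large-sample z-test for equality of two means (CLT-based)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    z = (x.mean() - y.mean()) / se
    return z, 2 * stats.norm.sf(abs(z))  # two-sided p-value

rng = np.random.default_rng(5)
treatment = rng.normal(8.0, 5.0, size=250)  # simulated blood-pressure reductions
placebo   = rng.normal(6.5, 5.0, size=250)
z, p = two_sample_z(treatment, placebo)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```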