💰Intro to Mathematical Economics Unit 10 – Econometric Models & Statistical Inference
Econometric models and statistical inference form the backbone of empirical economic analysis. These tools combine economic theory, mathematics, and statistics to test hypotheses and quantify relationships between variables, allowing researchers to draw meaningful conclusions from data.
From simple linear regression to complex panel data models, econometrics offers a diverse toolkit for analyzing economic phenomena. Understanding key concepts like OLS estimation, hypothesis testing, and model specification is crucial for interpreting results and avoiding common pitfalls in empirical research.
Key Concepts and Definitions
Econometrics combines economic theory, mathematics, and statistical inference to analyze economic phenomena and test hypotheses
Dependent variable (Y) represents the outcome or effect being studied, while independent variables (X) are the factors believed to influence the dependent variable
Stochastic error term (ε) captures the unexplained variation in the dependent variable not accounted for by the independent variables
Ordinary Least Squares (OLS) is a common estimation method that minimizes the sum of squared residuals to find the best-fitting line (see the sketch at the end of this section)
Coefficient estimates (β) quantify the relationship between each independent variable and the dependent variable, holding other factors constant (ceteris paribus)
Interpretation depends on the functional form of the model (linear, log-linear, log-log)
Statistical significance, assessed with the p-value, indicates how unlikely the observed relationship between variables would be if it were due to chance alone
R-squared (R²) measures the proportion of variation in the dependent variable explained by the independent variables (goodness of fit)
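To make these definitions concrete, here is a minimal sketch, assuming Python with numpy and statsmodels (just one common toolkit, not something the material requires): it simulates data from a known linear model, fits it by OLS, and prints the coefficient estimates, p-values, and R² described above.

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from a known linear model: Y = 2 + 0.5*X + e
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
e = rng.normal(scale=1.0, size=n)      # stochastic error term
Y = 2.0 + 0.5 * X + e

# OLS: choose coefficients that minimize the sum of squared residuals
X_design = sm.add_constant(X)          # adds the intercept column
results = sm.OLS(Y, X_design).fit()

print(results.params)                  # coefficient estimates (beta_0, beta_1)
print(results.pvalues)                 # p-values for each coefficient
print(results.rsquared)                # R-squared (goodness of fit)
```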
Types of Econometric Models
Simple linear regression models the relationship between one dependent variable and one independent variable (Y = β₀ + β₁X + ε)
Multiple linear regression extends simple regression to include multiple independent variables (Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε)
Allows for controlling for confounding factors and isolating the effect of each independent variable
Logarithmic transformations (log-linear, log-log models) can be used to model non-linear relationships and interpret coefficients as elasticities (a short example appears at the end of this section)
Panel data models (fixed effects, random effects) analyze data with both cross-sectional and time-series dimensions (individuals observed over time)
Instrumental variables (IV) estimation addresses endogeneity issues by using an instrument correlated with the independent variable but not the error term
Time series models (autoregressive, moving average, ARIMA) analyze data collected over regular time intervals and account for temporal dependence
Limited dependent variable models (probit, logit) are used when the dependent variable is binary or categorical
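As a small illustration of the log-log case, the sketch below (again assuming Python with statsmodels, and using made-up demand data generated with a known price elasticity) regresses log quantity on log price so that the slope can be read directly as an elasticity.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical demand data generated with a constant price elasticity of -1.2
rng = np.random.default_rng(1)
n = 500
price = rng.uniform(1.0, 10.0, size=n)
quantity = np.exp(5.0 - 1.2 * np.log(price) + rng.normal(scale=0.2, size=n))

# Log-log specification: log(Q) = b0 + b1*log(P) + e, so b1 is the price elasticity
X = sm.add_constant(np.log(price))
results = sm.OLS(np.log(quantity), X).fit()
print(results.params[1])   # estimated elasticity, close to -1.2
```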
Statistical Foundations
Gauss-Markov assumptions (linearity in parameters, exogeneity of the regressors, homoscedasticity, no autocorrelation, no perfect multicollinearity) ensure OLS estimators are the best linear unbiased estimators; normality of the errors is an additional assumption used for exact small-sample inference
Sampling distributions describe the probability distribution of a sample statistic (mean, variance) over repeated samples
Central Limit Theorem states that the sampling distribution of the mean approaches a normal distribution as sample size increases, regardless of the population distribution (see the simulation sketch at the end of this section)
Standard errors measure the variability of coefficient estimates and are used to construct confidence intervals and test hypotheses
Confidence intervals provide a range of plausible values for a population parameter based on the sample estimate and desired level of confidence (90%, 95%, 99%)
Type I error (false positive) occurs when rejecting a true null hypothesis, while Type II error (false negative) occurs when failing to reject a false null hypothesis
Power of a test is the probability of correctly rejecting a false null hypothesis (1 - Type II error rate)
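A short Monte Carlo sketch, assuming Python with numpy, illustrates both the Central Limit Theorem and confidence-interval coverage: even though the population here is skewed (exponential), the usual 95% interval for the mean covers the true value close to 95% of the time.

```python
import numpy as np

# Repeatedly draw samples from a skewed population and check how often the
# nominal 95% confidence interval for the mean covers the true mean.
rng = np.random.default_rng(2)
true_mean, n, reps = 1.0, 100, 5000
covered = 0
for _ in range(reps):
    sample = rng.exponential(scale=true_mean, size=n)
    mean = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)      # standard error of the mean
    lo, hi = mean - 1.96 * se, mean + 1.96 * se
    covered += (lo <= true_mean <= hi)
print(covered / reps)                         # close to 0.95
```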
Model Specification and Estimation
Economic theory and prior research guide the selection of relevant variables and functional form
Omitted variable bias arises when a relevant variable is excluded from the model, leading to biased and inconsistent estimates
Misspecification tests (RESET, Hausman) can detect omitted variables, incorrect functional form, or endogeneity issues
Multicollinearity occurs when independent variables are highly correlated, leading to imprecise estimates and difficulty interpreting individual coefficients
Variance Inflation Factor (VIF) measures the degree of multicollinearity for each independent variable
Heteroscedasticity refers to non-constant variance of the error term, which can be detected using tests (Breusch-Pagan, White) and addressed through robust standard errors or weighted least squares (see the diagnostic sketch at the end of this section)
Autocorrelation in time series data can be detected using tests (Durbin-Watson) and addressed through generalized least squares or autoregressive models
Maximum Likelihood Estimation (MLE) is an alternative to OLS that estimates parameters by maximizing the likelihood function, often used in non-linear models
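The diagnostics above are available in standard software; the sketch below, assuming Python with statsmodels and simulated data with correlated regressors and heteroscedastic errors, computes VIFs, runs a Breusch-Pagan test, and reports heteroscedasticity-robust (HC1) standard errors.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated regressors, with x2 correlated with x1 (some multicollinearity)
rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=np.exp(0.5 * x1))  # heteroscedastic errors

X = sm.add_constant(np.column_stack([x1, x2]))
results = sm.OLS(y, X).fit()

# Variance Inflation Factors for the two regressors (column 0 is the constant)
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print(vifs)

# Breusch-Pagan test for heteroscedasticity (small p-value -> reject constant variance)
bp_stat, bp_pvalue, _, _ = het_breuschpagan(results.resid, X)
print(bp_pvalue)

# Heteroscedasticity-robust (White/HC1) standard errors
robust = results.get_robustcov_results(cov_type="HC1")
print(robust.bse)
```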
Hypothesis Testing and Inference
Null hypothesis (H0) represents the default position of no effect or no difference, while the alternative hypothesis (Ha) represents the research claim
Test statistic (t-statistic, F-statistic) measures the deviation of the sample estimate from the null hypothesis value, standardized by the standard error
P-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true
Smaller p-values provide stronger evidence against the null hypothesis
Significance level (α) is the threshold for rejecting the null hypothesis, typically set at 0.01, 0.05, or 0.10
One-tailed tests are used when the alternative hypothesis specifies a direction (greater than or less than), while two-tailed tests are used when the alternative is non-directional (not equal to)
Joint hypothesis tests (F-tests) evaluate the significance of multiple coefficients simultaneously (a worked sketch follows this list)
Wald tests evaluate restrictions on subsets of coefficients using only the unrestricted model's estimates
Likelihood ratio tests compare the likelihood of the data under the restricted (null) and unrestricted (alternative) models
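As a worked example, assuming Python with numpy, scipy, and statsmodels and simulated data, the sketch below computes a t-statistic and two-tailed p-value by hand from a coefficient estimate and its standard error, then runs a joint F-test that both slope coefficients are zero.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated data with two regressors, the second having no true effect
rng = np.random.default_rng(4)
n = 250
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.7 * x1 + 0.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
results = sm.OLS(y, X).fit()

# t-statistic for H0: beta_1 = 0, built by hand from the estimate and its standard error
t_stat = results.params[1] / results.bse[1]
df = int(results.df_resid)
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value
print(t_stat, p_two_sided)

# Joint F-test for H0: beta_1 = 0 and beta_2 = 0, written as a restriction matrix
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(results.f_test(R))
```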
Interpreting Results
Coefficient estimates represent the change in the dependent variable associated with a one-unit change in the independent variable, holding other factors constant
For log-transformed variables, interpretation follows the functional form: in a log-log model the coefficients are elasticities, while in a log-linear model a coefficient gives the approximate percentage change in the dependent variable from a one-unit change in the regressor
Statistical significance indicates the reliability of the estimated relationship, but does not necessarily imply economic or practical significance
Confidence intervals provide a range of plausible values for the population parameter, with narrower intervals indicating greater precision
Marginal effects measure the change in the dependent variable for a small change in an independent variable, holding other factors at their means
Standardized coefficients (beta coefficients) allow for comparing the relative importance of independent variables measured on different scales
Goodness of fit measures (R², adjusted R²) indicate the proportion of variation in the dependent variable explained by the model, but do not guarantee causality or model validity
Out-of-sample predictions can assess the model's performance on new data and guard against overfitting
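One simple way to carry out such a check, sketched below under the assumption of Python with statsmodels and simulated data, is to fit the model on a training split and compute the root mean squared prediction error on held-out observations.

```python
import numpy as np
import statsmodels.api as sm

# Hold-out check: fit on a training split, evaluate prediction error on unseen data
rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

train, test = slice(0, 300), slice(300, None)
X_train = sm.add_constant(x[train])
results = sm.OLS(y[train], X_train).fit()

# Predict for the held-out observations and compute out-of-sample RMSE
X_test = sm.add_constant(x[test])
pred = results.predict(X_test)
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
print(rmse)   # should be close to the error standard deviation (about 1.0)
```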
Common Pitfalls and Limitations
Endogeneity arises when an independent variable is correlated with the error term, leading to biased and inconsistent estimates
Sources include omitted variables, measurement error, and simultaneity (reverse causality); a simulation of omitted variable bias is sketched at the end of this section
Sample selection bias occurs when the sample is not representative of the population of interest, often due to non-random sampling or self-selection
Outliers and influential observations can disproportionately affect coefficient estimates and should be carefully examined and potentially addressed (robust regression, Cook's distance)
Ecological fallacy involves drawing conclusions about individuals based on aggregate data, which may not hold at the individual level (Simpson's paradox)
Extrapolation beyond the range of the data can lead to unreliable predictions, as the estimated relationships may not hold outside the sample
Causal inference requires careful research design and strong assumptions (randomization, exogeneity, exclusion restrictions) that may not be met in observational studies
Model uncertainty arises when multiple models fit the data equally well, and can be addressed through model averaging or Bayesian methods
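The omitted-variable form of endogeneity is easy to see in a simulation; the sketch below, assuming Python with statsmodels and a made-up schooling-and-ability example, shows the schooling coefficient biased upward when the confounder is left out. The same problem motivates the instrumental variables approach mentioned earlier.

```python
import numpy as np
import statsmodels.api as sm

# Simulate omitted variable bias: ability affects both schooling and wages,
# so dropping it makes schooling endogenous and biases its OLS coefficient.
rng = np.random.default_rng(6)
n = 2000
ability = rng.normal(size=n)
schooling = 0.6 * ability + rng.normal(size=n)
wage = 1.0 + 0.4 * schooling + 0.5 * ability + rng.normal(size=n)

# Correct specification: includes ability, recovers the true effect (about 0.4)
full = sm.OLS(wage, sm.add_constant(np.column_stack([schooling, ability]))).fit()
# Misspecified model: omits ability, so the schooling coefficient is biased upward
short = sm.OLS(wage, sm.add_constant(schooling)).fit()

print(full.params[1])    # close to 0.4
print(short.params[1])   # noticeably larger than 0.4
```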
Real-World Applications
Labor economics uses econometric models to study wage determination, labor supply and demand, and the effects of policies (minimum wage, unemployment insurance)
Environmental economics employs econometric techniques to estimate the value of non-market goods (air quality, biodiversity) and evaluate environmental policies (carbon taxes, cap-and-trade)
Health economics applies econometric methods to analyze healthcare demand, provider behavior, and the impact of interventions (insurance expansions, drug pricing)
Development economics uses econometrics to assess the effectiveness of poverty alleviation programs (conditional cash transfers, microfinance) and drivers of economic growth
Public economics relies on econometric analysis to study the effects of taxation, government spending, and redistribution on individual and firm behavior
Financial economics employs econometric models to evaluate asset pricing, risk management, and the impact of monetary policy on financial markets
Industrial organization uses econometrics to examine market structure, firm conduct, and the effects of mergers and antitrust policies on competition and consumer welfare