Econometrics applies statistical methods to economic data, helping researchers understand relationships between variables and make causal inferences. This field bridges economic theory and real-world observations, using techniques like regression analysis and instrumental variables estimation.

In this chapter, we explore key concepts in econometrics, including linear regression models, panel data analysis, and time series methods. We'll also discuss common challenges like endogeneity and heteroscedasticity, along with strategies to address them in empirical research.

Foundations of econometrics

Causal inference in econometrics

  • Causal inference aims to establish cause-and-effect relationships between variables
  • Relies on identifying exogenous variation in the explanatory variable of interest
  • Randomized controlled trials (RCTs) are the gold standard for causal inference
    • Randomly assign treatment and control groups to isolate the causal effect
  • Observational data can be used for causal inference under certain assumptions
    • Requires controlling for confounding factors and addressing selection bias
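
The contrast between randomized and observational data can be illustrated with a small simulation. This is a minimal sketch on made-up data: the variable names, the true effect of 2.0, and the confounder called `ability` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0                      # hypothetical causal effect of treatment

ability = rng.normal(size=n)           # unobserved confounder (illustrative)

# Observational setting: people with higher ability select into treatment
treated_obs = (ability + rng.normal(size=n) > 0).astype(float)
y_obs = true_effect * treated_obs + 3.0 * ability + rng.normal(size=n)
naive_obs = y_obs[treated_obs == 1].mean() - y_obs[treated_obs == 0].mean()

# RCT setting: treatment is assigned by a coin flip, independent of ability
treated_rct = rng.integers(0, 2, size=n).astype(float)
y_rct = true_effect * treated_rct + 3.0 * ability + rng.normal(size=n)
diff_rct = y_rct[treated_rct == 1].mean() - y_rct[treated_rct == 0].mean()

print(f"naive observational difference: {naive_obs:.2f}")
print(f"randomized difference:          {diff_rct:.2f}")
```

In this setup the difference in means under random assignment lands close to the true effect, while the naive observational comparison is inflated by selection on the confounder.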

Econometric modeling process

  • Formulate an economic research question or hypothesis
  • Specify an econometric model based on economic theory and available data
  • Collect and prepare the relevant data for analysis
    • Clean and transform variables as needed
  • Estimate the model using appropriate econometric techniques
  • Evaluate the model's performance and interpret the results
  • Refine the model if necessary and draw policy implications

Types of econometric data

  • Cross-sectional data: Observations from different entities at a single point in time
    • Example: Household survey data on income and consumption
  • Time series data: Observations from a single entity over multiple time periods
    • Example: Quarterly GDP data for a country
  • Panel data: Combination of cross-sectional and time series data
    • Observations from multiple entities over multiple time periods
    • Example: Firm-level data on sales and employment over several years
  • Pooled cross-sectional data: Cross-sectional data from multiple time periods
    • Does not track the same entities over time

Linear regression models

Simple linear regression

  • Models the relationship between two variables: one dependent and one independent
  • Equation: $Y = \beta_0 + \beta_1 X + \varepsilon$
    • $Y$: Dependent variable
    • $X$: Independent variable
    • $\beta_0$: Intercept
    • $\beta_1$: Slope coefficient
    • $\varepsilon$: Error term
  • Estimates the line of best fit that minimizes the sum of squared residuals
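
For the two-variable case, the line that minimizes the sum of squared residuals has a closed form: the slope is the sample covariance of X and Y divided by the sample variance of X, and the intercept makes the line pass through the means. A minimal sketch, using five made-up data points:

```python
import numpy as np

# Hypothetical data, chosen only to illustrate the arithmetic
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope: sum of cross-deviations over sum of squared deviations of x
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: forces the fitted line through (mean of x, mean of y)
beta0 = y.mean() - beta1 * x.mean()

residuals = y - (beta0 + beta1 * x)
print(beta0, beta1)
```

With an intercept in the model, the residuals sum to zero by construction, which is a quick sanity check on any hand-rolled implementation.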

Multiple linear regression

  • Extends simple linear regression to include multiple independent variables
  • Equation: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
    • $X_1, X_2, \dots, X_k$: Independent variables
    • $\beta_1, \beta_2, \dots, \beta_k$: Slope coefficients
  • Allows for controlling for multiple factors that affect the dependent variable

Assumptions of linear regression

  • Linearity: The model is linear in its parameters
  • Independence: Observations are independently sampled
  • Exogeneity: The error term has zero mean conditional on the independent variables
  • Homoscedasticity: The variance of the error term is constant across all levels of the independent variables
  • Normality: The error term is normally distributed (needed for exact small-sample t- and F-tests, not for unbiasedness)
  • No perfect multicollinearity: No independent variable is an exact linear combination of the others

Ordinary least squares estimation

  • OLS is a method for estimating the parameters of a linear regression model
  • Minimizes the sum of squared residuals to find the best-fitting line
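  • Under the classical (Gauss-Markov) assumptions, OLS is the best linear unbiased estimator (BLUE): it is unbiased and has the smallest variance among linear unbiased estimators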
  • Estimates can be obtained using matrix algebra or numerical optimization
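
The matrix-algebra route solves the normal equations $(X'X)\hat{\beta} = X'y$. The sketch below does this on simulated data and checks the answer against NumPy's least-squares solver; the data-generating coefficients (1.0, 2.0, -0.5) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)   # hypothetical model

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# Solve the normal equations (X'X) beta = X'y without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate from a numerically more stable least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

Solving the linear system is generally preferable to computing the inverse of X'X directly, and `lstsq` is the safer choice when regressors are nearly collinear.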

Model specification and evaluation

Functional form specification

  • Choosing the appropriate functional form for the relationship between variables
  • Common forms: Linear, logarithmic, quadratic, interaction terms
  • Misspecification can lead to biased and inconsistent estimates
  • Can use graphical analysis or statistical tests (e.g., RESET test) to assess functional form

Goodness of fit measures

  • R-squared: Proportion of variance in the dependent variable explained by the model
    • Ranges from 0 to 1 for a model with an intercept; higher values indicate a closer in-sample fit
  • Adjusted R-squared: Accounts for the number of independent variables in the model
    • Penalizes the addition of irrelevant variables
  • Root Mean Squared Error (RMSE): The square root of the average squared residual, expressed in the units of the dependent variable
    • Lower values indicate better fit
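
These three measures follow directly from the residuals. A minimal sketch, assuming hypothetical actual and fitted values and one regressor; note that some texts divide the sum of squared residuals by n - k - 1 rather than n when computing RMSE.

```python
import numpy as np

def fit_measures(y, y_hat, k):
    """Return R-squared, adjusted R-squared, and RMSE for k regressors."""
    n = len(y)
    ssr = np.sum((y - y_hat) ** 2)           # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    rmse = np.sqrt(ssr / n)                  # convention: divide by n
    return r2, adj_r2, rmse

# Made-up values for illustration
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
r2, adj_r2, rmse = fit_measures(y, y_hat, k=1)
print(r2, adj_r2, rmse)
```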

Hypothesis testing in regression

  • Testing the significance of individual coefficients using t-tests
    • Null hypothesis: $\beta_i = 0$
    • Alternative hypothesis: $\beta_i \neq 0$
  • Testing the joint significance of multiple coefficients using F-tests
    • Null hypothesis: $\beta_1 = \beta_2 = \dots = \beta_k = 0$
    • Alternative hypothesis: At least one $\beta_i \neq 0$
  • P-values measure the strength of evidence against the null hypothesis, while confidence intervals convey the precision of the estimates
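
The t- and F-statistics can be computed by hand from the OLS output. This sketch assumes homoscedastic errors and simulated data in which `x2` is deliberately irrelevant; the 1.96 multiplier is the large-sample normal approximation for a 95% interval.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # irrelevant regressor: true coefficient is 0
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # hypothetical model

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1                       # number of slope coefficients
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Classical (homoscedastic) covariance matrix of the coefficient estimates
sigma2 = resid @ resid / (n - k - 1)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se                  # each tests H0: beta_i = 0

# Overall F-test of H0: all slopes are zero, written in terms of R-squared
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Approximate 95% confidence intervals
ci_lower = beta_hat - 1.96 * se
ci_upper = beta_hat + 1.96 * se
print(t_stats, f_stat)
```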

Model selection criteria

  • Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)
    • Trade-off between model fit and complexity
    • Lower values indicate better models
  • Adjusted R-squared: Balances the improvement in fit with the number of variables
  • Stepwise regression: Iteratively adding or removing variables based on statistical significance
  • Economic theory and practical considerations should guide model selection
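
For a linear model with Gaussian errors, AIC and BIC can be written in terms of the sum of squared residuals, up to an additive constant that is the same for every model fitted to the same data. The sketch below uses that simplified form; conventions for counting parameters differ across software, so the absolute values are an assumption of this example and only differences between models are meaningful.

```python
import numpy as np

def aic_bic(y, X):
    """AIC and BIC (up to a common constant) for an OLS fit of y on X."""
    n, k = X.shape                        # k counts every column, intercept included
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ beta) ** 2)
    aic = n * np.log(ssr / n) + 2 * k
    bic = n * np.log(ssr / n) + k * np.log(n)
    return aic, bic

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # hypothetical model

X_full = np.column_stack([np.ones(n), x1, x2])
X_short = np.column_stack([np.ones(n), x1])          # omits the relevant x2

aic_full, bic_full = aic_bic(y, X_full)
aic_short, bic_short = aic_bic(y, X_short)
print(aic_full, aic_short, bic_full, bic_short)
```

Here both criteria favor the full model, because dropping a relevant regressor raises the residual variance by far more than the complexity penalty saves.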

Violations of classical assumptions

Multicollinearity

  • High correlation among independent variables
  • Causes imprecise estimates and inflated standard errors
  • Detected using correlation matrices or variance inflation factors (VIF)
  • Remedies: Remove redundant variables, combine variables, or use ridge regression
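
The VIF for regressor j is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing that regressor on all the others. A minimal sketch on simulated data, in which `x2` is constructed to be nearly a copy of `x1`:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (no constant column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(others, target, rcond=None)[0]
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(4)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # almost collinear with x1 (by construction)
x3 = rng.normal(size=n)              # unrelated to the other two

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

A common rule of thumb treats a VIF above 10 as a sign of problematic collinearity, though the threshold is a convention rather than a formal test.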

Heteroscedasticity

  • Non-constant variance of the error term across levels of the independent variables
  • OLS coefficients remain unbiased but are no longer efficient, and the usual standard errors are incorrect, so inference is invalid
  • Detected using residual plots or statistical tests (e.g., Breusch-Pagan test)
  • Remedies: Weighted least squares, heteroscedasticity-robust standard errors
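
Heteroscedasticity-robust standard errors replace the classical covariance formula with a "sandwich" estimator. The sketch below computes the HC1 variant by hand on simulated data whose error spread grows with the size of `x`; the data-generating process is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)   # error sd is proportional to |x|

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classical standard errors (assume constant error variance)
se_classical = np.sqrt(np.diag(resid @ resid / (n - k) * XtX_inv))

# HC1 sandwich: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1, scaled by n / (n - k)
meat = (X * resid[:, None] ** 2).T @ X
cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(cov_hc1))
print(se_classical, se_robust)
```

In this design the classical formula understates the uncertainty of the slope, so the robust standard error is noticeably larger.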

Autocorrelation

  • Correlation between error terms across observations
  • Common in time series data
  • OLS estimates are inefficient and the usual standard errors are incorrect, so inference is invalid
  • Detected using residual plots or statistical tests (e.g., Durbin-Watson test)
  • Remedies: Generalized least squares, autoregressive models
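
The Durbin-Watson statistic is the sum of squared changes in consecutive residuals divided by the sum of squared residuals. It is close to 2 when there is no first-order autocorrelation and falls toward 0 under positive autocorrelation. A minimal sketch, assuming simulated AR(1) errors with coefficient 0.8:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic for a series of regression residuals."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def ols_residuals(y, x):
    """Residuals from regressing y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ beta

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)
eps = rng.normal(size=n)

# AR(1) errors: u_t = 0.8 * u_{t-1} + eps_t (hypothetical persistence)
u_ar = np.zeros(n)
for t in range(1, n):
    u_ar[t] = 0.8 * u_ar[t - 1] + eps[t]

dw_ar = durbin_watson(ols_residuals(1.0 + 0.5 * x + u_ar, x))
dw_iid = durbin_watson(ols_residuals(1.0 + 0.5 * x + eps, x))
print(dw_ar, dw_iid)
```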

Consequences and remedies

  • Violations of classical assumptions can lead to biased, inconsistent, or inefficient estimates
  • Incorrect standard errors and invalid hypothesis tests
  • Remedies depend on the specific violation and the nature of the data
    • Transforming variables, using alternative estimators, or adjusting standard errors
  • Importance of diagnostic tests and sensitivity analysis to assess the robustness of results

Instrumental variables estimation

Endogeneity problem

  • Endogeneity arises when an independent variable is correlated with the error term
  • Causes biased and inconsistent OLS estimates
  • Sources of endogeneity: Omitted variables, measurement error, simultaneous causality
  • Example: Estimating the effect of education on earnings
    • Ability is an omitted variable that affects both education and earnings

Two-stage least squares

  • 2SLS is an instrumental variables estimator that addresses endogeneity
  • First stage: Regress the endogenous variable on the instrument and other exogenous variables
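  • Second stage: Regress the dependent variable on the predicted values from the first stage and the same exogenous variables
    • Standard errors from a manually run second stage are incorrect; dedicated 2SLS routines adjust them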
  • Instrument must be relevant (correlated with the endogenous variable) and exogenous (uncorrelated with the error term)
  • Example: Using compulsory schooling laws as an instrument for education
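
The two stages can be sketched with plain least squares on simulated data. Everything here is an assumption made for illustration: `z` plays the role of the instrument, `ability` is the omitted variable, and the true coefficient on `x` is 0.5. The sketch reports point estimates only, since standard errors from a hand-run second stage are not valid.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(7)
n = 20_000
z = rng.normal(size=n)                       # instrument (hypothetical)
ability = rng.normal(size=n)                 # omitted variable
x = 0.8 * z + ability + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 0.5 * x + ability + rng.normal(size=n)

const = np.ones(n)

# Naive OLS: biased because x is correlated with the omitted ability term
beta_ols = ols(np.column_stack([const, x]), y)

# First stage: regress the endogenous variable on the instrument
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)

# Second stage: regress the outcome on the first-stage fitted values
beta_2sls = ols(np.column_stack([const, x_hat]), y)
print(beta_ols[1], beta_2sls[1])
```

In this simulation OLS overstates the effect, while the 2SLS slope is close to the assumed value of 0.5.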

Validity of instruments

  • Instrument relevance: Tested using the F-statistic from the first-stage regression
    • Rule of thumb: F-statistic should be greater than 10
  • Instrument exogeneity: Cannot be directly tested, relies on theoretical arguments
  • Overidentifying restrictions: When there are more instruments than endogenous variables
    • Sargan-Hansen test can be used to assess the validity of extra instruments
  • Weak instruments can lead to biased estimates and invalid inference

Panel data models

Fixed effects vs random effects

  • Fixed effects model: Accounts for unobserved time-invariant heterogeneity
    • Uses only within-group variation, which eliminates bias from time-invariant omitted variables
    • Cannot estimate the effects of time-invariant variables
  • Random effects model: Assumes unobserved heterogeneity is uncorrelated with the independent variables
    • More efficient than fixed effects, but potentially inconsistent if the assumption is violated
  • Hausman test can be used to choose between fixed and random effects
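
The fixed effects (within) estimator can be computed by subtracting each entity's time average from its observations and running OLS on the demeaned data. A minimal sketch on a simulated balanced panel, assuming an entity effect `alpha` that is correlated with the regressor and a true slope of 1.5:

```python
import numpy as np

rng = np.random.default_rng(9)
n_units, n_periods = 500, 5

alpha = rng.normal(size=(n_units, 1))                  # unobserved entity effect
x = alpha + rng.normal(size=(n_units, n_periods))      # regressor correlated with alpha
y = 1.5 * x + alpha + rng.normal(size=(n_units, n_periods))

# Pooled OLS slope: ignores the entity effect, so it absorbs part of alpha
xf, yf = x.ravel(), y.ravel()
beta_pooled = np.cov(xf, yf)[0, 1] / np.var(xf, ddof=1)

# Within transformation: demean each entity's observations over time
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
beta_fe = np.sum(x_within * y_within) / np.sum(x_within ** 2)
print(beta_pooled, beta_fe)
```

Demeaning wipes out anything constant within an entity, which is also why the coefficients on time-invariant variables cannot be estimated this way.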

Pooled OLS estimation

  • Pooled OLS ignores the panel structure of the data and treats observations as independent
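  • Consistent and efficient if there is no unobserved heterogeneity
  • Biased and inconsistent when the unobserved heterogeneity is correlated with the independent variables (the fixed effects case); still consistent but inefficient, with standard errors that need to be clustered by entity, when it is uncorrelated (the random effects case)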
  • Rarely appropriate for panel data, but can be used as a benchmark

Dynamic panel data models

  • Include lagged dependent variables as regressors
  • Capture persistence and adjustment processes
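  • OLS estimates are biased and inconsistent because the lagged dependent variable is correlated with the unobserved entity effect in the error term; the fixed effects estimator is also biased when the number of time periods is small (Nickell bias)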
  • Arellano-Bond and Blundell-Bond estimators use lagged levels and differences as instruments
    • Generalized Method of Moments (GMM) estimation
  • Require careful specification and testing of moment conditions

Limited dependent variable models

Binary choice models

  • Dependent variable is a binary outcome (0 or 1)
  • Examples: Labor force participation, product purchase decisions
  • Linear probability model (LPM): Applies OLS to the binary outcome
    • Pros: Simple to estimate and interpret
    • Cons: Predicted probabilities can be outside the [0, 1] range, constant marginal effects
  • Logit and probit models: Use a nonlinear link function to map the linear predictor to probabilities
    • Ensure predicted probabilities are between 0 and 1
    • Marginal effects vary with the level of the independent variables
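
The difference between the LPM and a logit can be seen on simulated data. The sketch below fits the logit by Newton-Raphson iterations on the log-likelihood; the true coefficients (-0.5, 1.0) and the fixed iteration count are assumptions made for the example, and production code would use a library routine with a convergence check.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Logit coefficients via Newton-Raphson (fixed number of iterations)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # X' W X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(10)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)

# Linear probability model: OLS on the binary outcome
beta_lpm = np.linalg.lstsq(X, y, rcond=None)[0]
lpm_pred = X @ beta_lpm

# Logit: predicted probabilities and average marginal effect of x
beta_logit = fit_logit(X, y)
p_hat = 1.0 / (1.0 + np.exp(-X @ beta_logit))
avg_marginal_effect = np.mean(p_hat * (1.0 - p_hat)) * beta_logit[1]
print(beta_lpm, beta_logit, avg_marginal_effect)
```

The LPM produces some fitted values below zero in this sample, whereas the logit probabilities stay strictly inside the unit interval.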

Multinomial choice models

  • Dependent variable has more than two unordered categories
  • Examples: Mode of transportation, occupational choice
  • Multinomial logit and probit models extend binary choice models
  • Independence of Irrelevant Alternatives (IIA) assumption in multinomial logit
    • Odds of choosing one alternative over another are independent of the presence of other alternatives
  • Nested logit and mixed logit models can relax the IIA assumption

Tobit and Heckman selection models

  • Tobit model: Dependent variable is censored or is a corner solution outcome
    • Examples: Household expenditure on a durable good, hours worked
    • The likelihood combines a probit-type term for the censored observations with a normal regression term for the uncensored ones, using a single set of coefficients
  • Heckman selection model: Accounts for sample selection bias
    • Two-step estimation: Selection equation and outcome equation
    • Inverse Mills ratio captures the effect of selection on the outcome
    • Example: Estimating the wage offer function when wages are observed only for labor market participants

Time series econometrics

Stationarity and unit roots

  • Stationary time series: Statistical properties do not change over time
    • Constant mean and variance, with autocovariances that depend only on the lag between observations
  • Non-stationary series: Presence of trends, cycles, or structural breaks
  • Unit root: A stochastic trend in the series
    • Dickey-Fuller and Phillips-Perron tests for unit roots
  • Differencing removes a stochastic trend (unit root), while detrending removes a deterministic trend
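
The basic Dickey-Fuller regression (with a constant, no lags) regresses the change in the series on its lagged level and looks at the t-statistic on that level. The sketch below applies it to a simulated random walk and to its first difference. The statistic does not follow the usual t distribution; the asymptotic 5% critical value for this specification is roughly -2.86, and library implementations with lag selection are preferable in practice.

```python
import numpy as np

def df_tstat(series):
    """t-statistic on the lagged level in: diff(y)_t = a + gamma * y_{t-1} + e_t."""
    dy = np.diff(series)
    X = np.column_stack([np.ones(len(dy)), series[:-1]])
    coef = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_gamma

rng = np.random.default_rng(11)
n = 2000
random_walk = np.cumsum(rng.normal(size=n))   # simulated unit-root series

stat_level = df_tstat(random_walk)            # unit root: typically not very negative
stat_diff = df_tstat(np.diff(random_walk))    # differenced series: strongly negative
print(stat_level, stat_diff)
```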

Cointegration and error correction

  • Cointegration: Long-run equilibrium relationship between non-stationary series
    • Residuals from the cointegrating regression are stationary
    • Engle-Granger and Johansen tests for cointegration
  • Error correction model (ECM): Captures short-run dynamics and adjustment to the long-run equilibrium
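    • Includes the lagged residual from the cointegrating regression (the error correction term)
    • The coefficient on this term represents the speed of adjustment to deviations from the long-run relationship and should be negative for deviations to be corrected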
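
The Engle-Granger two-step approach can be sketched as follows: estimate the long-run relationship in levels, then regress the change in the dependent variable on the change in the regressor and the lagged residual. The simulated series, the long-run slope of 2.0, and the AR(1) coefficient of 0.5 for the equilibrium error are assumptions of this example, which imply an adjustment coefficient near -0.5.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2000
x = np.cumsum(rng.normal(size=n))        # non-stationary regressor (random walk)

# Stationary deviation from equilibrium: u_t = 0.5 * u_{t-1} + shock_t
shocks = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + shocks[t]

y = 1.0 + 2.0 * x + u                    # cointegrated with x by construction

# Step 1: cointegrating regression in levels
X_lr = np.column_stack([np.ones(n), x])
beta_lr = np.linalg.lstsq(X_lr, y, rcond=None)[0]
ect = y - X_lr @ beta_lr                 # error correction term (residual)

# Step 2: short-run dynamics with the lagged error correction term
dy, dx = np.diff(y), np.diff(x)
X_ecm = np.column_stack([np.ones(n - 1), dx, ect[:-1]])
gamma = np.linalg.lstsq(X_ecm, dy, rcond=None)[0]
adjustment_speed = gamma[2]
print(beta_lr[1], adjustment_speed)
```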

Vector autoregressive models

  • VAR models: Multivariate time series models where each variable is a function of its own lags and the lags of other variables
  • Treat all variables as endogenous
  • Impulse response functions: Trace out the response of each variable to shocks in other variables
  • Forecast error variance decomposition: Measures the contribution of each variable to the forecast error variance of other variables
  • Granger causality tests: Assess the predictive power of one variable for another
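
A VAR(1) can be estimated equation by equation with OLS, regressing each variable on a constant and the first lag of every variable. The sketch below simulates a two-variable system with an assumed coefficient matrix `A_true` and recovers it; the impulse responses shown are the simple non-orthogonalized ones (powers of the estimated matrix), not the structural responses that require an identification scheme.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 5000
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.3]])          # hypothetical VAR(1) coefficients

# Simulate Y_t = A Y_{t-1} + e_t
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A_true @ Y[t - 1] + rng.normal(size=2)

# Regress Y_t on a constant and Y_{t-1}; each column of B is one equation
Z = np.column_stack([np.ones(n - 1), Y[:-1]])
B = np.linalg.lstsq(Z, Y[1:], rcond=None)[0]
A_hat = B[1:].T                          # transpose: rows are equations

# Response at horizon h to a one-unit shock is the h-th power of A_hat
irf = [np.linalg.matrix_power(A_hat, h) for h in range(5)]
print(A_hat)
```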

Forecasting with time series models

  • In-sample forecasting: Using the estimated model to predict the dependent variable within the sample period
  • Out-of-sample forecasting: Using the estimated model to predict future values of the dependent variable
  • Rolling window and recursive forecasting schemes
  • Forecast evaluation: Comparing the accuracy of different models
    • Mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE)
    • Diebold-Mariano test for comparing forecast accuracy
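
The accuracy measures above are simple functions of the forecast errors. A minimal sketch comparing two sets of forecasts; the actual values and both forecast series are made up for illustration.

```python
import numpy as np

def forecast_metrics(actual, forecast):
    """Return MSE, MAE, and RMSE for a set of forecasts."""
    errors = np.asarray(actual) - np.asarray(forecast)
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    return {"mse": mse, "mae": mae, "rmse": np.sqrt(mse)}

# Hypothetical out-of-sample values and two competing forecasts
actual = [100.0, 102.0, 105.0, 107.0]
forecast_a = [101.0, 101.0, 106.0, 110.0]
forecast_b = [100.0, 103.0, 104.0, 108.0]

metrics_a = forecast_metrics(actual, forecast_a)
metrics_b = forecast_metrics(actual, forecast_b)
print(metrics_a, metrics_b)
```

Because MSE and RMSE square the errors, they penalize the single large miss in the first forecast more heavily than MAE does.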

Econometric software and applications

  • Stata: User-friendly interface, wide range of built-in commands, and excellent documentation
  • R: Open-source, flexible, and powerful programming language for statistical computing
    • Packages like plm, lmtest, sandwich, and vars for econometric analysis
  • Python: General-purpose programming language with growing popularity in economics and data science
    • Libraries like numpy, pandas, statsmodels, and linearmodels for econometric analysis
  • EViews: Specialized software for time series analysis and forecasting

Empirical examples and case studies

  • Labor economics: Estimating the returns to education, wage determinants, and labor supply elasticities
  • Public economics: Evaluating the impact of taxes and subsidies on individual behavior and welfare
  • Environmental economics: Measuring the effects of pollution on health outcomes and property values
  • Development economics: Assessing the impact of microfinance programs on household income and consumption
  • Finance: Modeling asset prices, volatility, and risk premia

Interpreting and reporting results

  • Presenting the estimated coefficients, standard errors, and significance levels
  • Interpreting the economic and practical significance of the estimates
  • Reporting diagnostic tests and robustness checks
  • Discussing the limitations and potential extensions of the analysis
  • Visualizing the results using tables, graphs, and charts
  • Communicating the findings to both technical and non-technical audiences