Intro to Econometrics Unit 9 – Instrumental Variables & Two-Stage LS

Instrumental variables and two-stage least squares are powerful tools for addressing endogeneity in econometric models. These methods help economists estimate causal effects when explanatory variables are correlated with error terms, which can arise from omitted variables, measurement error, or simultaneous causality. By using valid instruments that are correlated with endogenous variables but uncorrelated with error terms, researchers can isolate exogenous variation and obtain consistent estimates. The two-stage least squares approach implements this strategy, first regressing endogenous variables on instruments, then using predicted values in the main regression.

Key Concepts

  • Instrumental variables (IV) address endogeneity issues in regression models when explanatory variables are correlated with the error term
  • Endogeneity arises from omitted variables, measurement error, or simultaneous causality, and it makes OLS estimates biased and inconsistent
  • Valid instruments are correlated with the endogenous explanatory variable but uncorrelated with the error term
    • Instruments should be relevant (sufficiently correlated with the endogenous variable) and exogenous (uncorrelated with the error term, so they affect the dependent variable only through the endogenous variable)
  • Two-stage least squares (2SLS) is a common method for implementing IV estimation
    • First stage regresses the endogenous variable on the instrument(s) and other exogenous variables
    • Second stage uses the predicted values from the first stage in place of the endogenous variable
  • IV and 2SLS aim to obtain consistent estimates of the causal effect of the explanatory variable on the dependent variable
  • Weak instruments (low correlation with the endogenous variable) can lead to biased IV estimates and large standard errors
  • Overidentification occurs when there are more instruments than endogenous variables, which makes it possible to test the overidentifying restrictions (a partial check on instrument validity)

Problem of Endogeneity

  • Endogeneity violates the zero conditional mean assumption $E[u \mid X] = 0$, which OLS requires for unbiased and consistent estimates
  • Omitted variable bias occurs when a variable that affects the dependent variable is left out of the model and is correlated with an included explanatory variable
    • Example: estimating the effect of education on earnings without controlling for ability
  • Classical measurement error in the explanatory variable leads to attenuation bias (OLS estimates biased toward zero)
    • Example: using self-reported income instead of actual income
  • Simultaneous causality or reverse causality arises when the dependent variable also affects the explanatory variable
    • Example: estimating the effect of police on crime while crime levels influence police allocation
  • Endogeneity causes the explanatory variable to be correlated with the error term, leading to biased and inconsistent estimates of the causal effect
  • IV methods aim to isolate the exogenous variation in the endogenous explanatory variable to obtain consistent estimates
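
The attenuation bias from measurement error can be illustrated with a small simulation. This is a minimal sketch under assumed values: the true slope of 2.0, the unit variances, and the helper name `ols_slope` are all chosen for illustration and do not come from any particular dataset. With equal variances for the true regressor and the noise, the OLS slope on the noisy regressor should shrink toward roughly half the true value.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 20_000
true_beta = 2.0          # assumed causal effect in this toy model

# The regressor we would like to observe, and the noisy version we actually observe.
x_true = [rng.gauss(0, 1) for _ in range(n)]
x_noisy = [x + rng.gauss(0, 1) for x in x_true]

# The outcome depends on the true regressor, not on the noisy measurement.
y = [1.0 + true_beta * x + rng.gauss(0, 1) for x in x_true]

slope_true = ols_slope(x_true, y)    # should be close to 2.0
slope_noisy = ols_slope(x_noisy, y)  # should be close to 2.0 * 1 / (1 + 1) = 1.0

print(f"OLS with true regressor:  {slope_true:.3f}")
print(f"OLS with noisy regressor: {slope_noisy:.3f}")
```

The shrinkage factor here is Var(true regressor) / (Var(true regressor) + Var(noise)), which is why the noisy slope lands near half of the true slope in this setup.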

Instrumental Variables (IV) Explained

  • Instrumental variables (IV) are used to address endogeneity by finding a source of exogenous variation in the endogenous explanatory variable
  • An instrument $Z$ is a variable that is correlated with the endogenous explanatory variable $X$ but uncorrelated with the error term $u$
    • $Cov(Z, X) \neq 0$ (relevance condition)
    • $Cov(Z, u) = 0$ (exogeneity condition)
  • The instrument affects the dependent variable $Y$ only through its effect on the endogenous explanatory variable $X$
    • Example: using distance to college as an instrument for education when estimating the effect of education on earnings
  • IV estimation isolates the exogenous variation in $X$ that is uncorrelated with the error term to obtain a consistent estimate of the causal effect
  • The IV estimator is given by $\beta_{IV} = \frac{Cov(Z, Y)}{Cov(Z, X)}$, which is consistent under the relevance and exogeneity conditions
  • Multiple instruments can be used for a single endogenous variable to improve efficiency and allow for overidentification tests
  • The reduced form equation regresses the dependent variable directly on the instrument(s) and other exogenous variables
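
The covariance-ratio form of the IV estimator can be checked on simulated data. The sketch below assumes a toy model in which an unobserved `ability` variable drives both `x` and `y`, while the instrument `z` is drawn independently of it. All coefficients and variable names are illustrative assumptions. It also shows that dividing the reduced-form slope by the first-stage slope gives the same number as the covariance ratio.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

rng = random.Random(1)
n = 20_000
true_beta = 1.0  # assumed causal effect of x on y

ability = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]        # instrument, independent of ability
x = [0.5 * zi + 0.8 * ai + rng.gauss(0, 1) for zi, ai in zip(z, ability)]
y = [true_beta * xi + ai + rng.gauss(0, 1) for xi, ai in zip(x, ability)]

beta_ols = cov(x, y) / cov(x, x)  # picks up the confounder, so it overshoots
beta_iv = cov(z, y) / cov(z, x)   # uses only the variation in x that comes from z

# Equivalent view: reduced-form slope divided by first-stage slope.
reduced_form = cov(z, y) / cov(z, z)
first_stage = cov(z, x) / cov(z, z)

print(f"OLS estimate: {beta_ols:.3f}")
print(f"IV estimate:  {beta_iv:.3f}")
print(f"Reduced form / first stage: {reduced_form / first_stage:.3f}")
```

In this setup OLS should settle well above the true value of 1.0 because `ability` raises both `x` and `y`, while the IV estimate should be close to 1.0, with a visibly larger sampling error.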

Criteria for Valid Instruments

  • Relevance: the instrument must be correlated with the endogenous explanatory variable
    • Weak instruments (low correlation) can lead to biased IV estimates and large standard errors
    • The first-stage F-statistic on the excluded instruments measures their strength; a common rule of thumb treats F > 10 as evidence that the instruments are not weak
  • Exogeneity: the instrument must be uncorrelated with the error term in the structural equation
    • In particular, the instrument should not be correlated with omitted variables that affect the dependent variable
    • Overidentifying restrictions tests (e.g., the Sargan or Hansen J-test) can assess instrument validity only when there are more instruments than endogenous variables
  • Exclusion restriction: the instrument should affect the dependent variable only through the endogenous explanatory variable, with no direct effect of its own
    • This condition is not directly testable and relies on theoretical justification
  • Monotonicity: the instrument should move the endogenous variable in the same direction (never positive for some individuals and negative for others), so that there are no defiers
    • This assumption is required for the interpretation of the local average treatment effect (LATE) in the presence of heterogeneous treatment effects
  • External validity: the IV estimate may not be generalizable to the entire population if the effect of the endogenous variable varies across individuals
    • The IV estimate represents the LATE for the subpopulation affected by the instrument (compliers)

Two-Stage Least Squares (2SLS) Method

  • Two-stage least squares (2SLS) is a common method for implementing IV estimation; it handles one or more instruments alongside exogenous control variables
  • The first stage regresses the endogenous explanatory variable $X$ on the instrument(s) $Z$ and other exogenous variables $W$:
    • $X = \delta_0 + \delta_1 Z + \delta_2 W + v$
    • This stage isolates the exogenous variation in $X$ that is uncorrelated with the error term in the structural equation
  • The second stage regresses the dependent variable $Y$ on the predicted values of $X$ from the first stage ($\hat{X}$) and other exogenous variables $W$:
    • $Y = \beta_0 + \beta_1 \hat{X} + \beta_2 W + u$
    • The predicted values $\hat{X}$ are a linear combination of $Z$ and $W$, so they are uncorrelated with the error term $u$ whenever the instruments are exogenous
  • The 2SLS estimator is consistent and asymptotically normal under the relevance and exogeneity conditions
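  • Standard errors from a manually run second stage are incorrect because they are based on residuals computed with the predicted values rather than the actual endogenous variable
    • Dedicated 2SLS routines compute the residuals with the actual endogenous variable; heteroskedasticity-robust (Huber-White) standard errors or bootstrapping can be applied on top of that correction
  • 2SLS extends to multiple endogenous variables as long as there are at least as many instruments as endogenous variables; three-stage least squares goes a step further and estimates a system of equations jointly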
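
The two stages can be written out by hand for the simplest case. This is a rough sketch, assuming one instrument, one endogenous regressor, no additional exogenous variables, and homoskedastic errors. The function names and the simulated coefficients are hypothetical. It also contrasts the naive second-stage standard error with the one based on residuals from the actual regressor.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z, one endogenous regressor x, no other controls."""
    n = len(y)
    # Stage 1: project x on the instrument and keep the fitted values.
    d0, d1 = simple_ols(z, x)
    x_hat = [d0 + d1 * zi for zi in z]
    # Stage 2: regress y on the fitted values.
    b0, b1 = simple_ols(x_hat, y)
    mean_hat = sum(x_hat) / n
    s_hat = sum((v - mean_hat) ** 2 for v in x_hat)
    # Correct residuals plug the ACTUAL x into the estimated equation.
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    # Naive residuals (what a hand-run second stage reports) use x_hat instead.
    rss_naive = sum((yi - b0 - b1 * xh) ** 2 for xh, yi in zip(x_hat, y))
    se = (rss / (n - 2) / s_hat) ** 0.5
    se_naive = (rss_naive / (n - 2) / s_hat) ** 0.5
    return b1, se, se_naive

rng = random.Random(4)
n = 10_000
ability = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]        # instrument
x = [0.5 * zi + 0.8 * ai + rng.gauss(0, 1) for zi, ai in zip(z, ability)]
y = [1.0 + 1.5 * xi + ai + rng.gauss(0, 1) for xi, ai in zip(x, ability)]

beta_2sls, se_correct, se_naive = two_stage_least_squares(z, x, y)
print(f"2SLS slope: {beta_2sls:.3f} (assumed true value 1.5)")
print(f"Correct SE: {se_correct:.4f}   Naive second-stage SE: {se_naive:.4f}")
```

In this particular simulation the naive standard error comes out larger than the correct one, but in general it can be off in either direction, which is why the residuals should be computed with the actual regressor.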

Implementing IV and 2SLS

  • Identify the endogenous explanatory variable(s) and potential instruments based on theoretical considerations and institutional knowledge
  • Check the relevance condition by regressing the endogenous variable on the instrument(s) and testing for significance (first-stage F-statistic)
    • If the instruments are weak, consider finding stronger instruments or using alternative methods (e.g., limited information maximum likelihood)
  • Assess the exogeneity condition using overidentification tests if there are more instruments than endogenous variables
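    • Sargan or Hansen J-test for overidentifying restrictions
    • Failure to reject the null hypothesis is consistent with instrument validity but does not prove it, since the test presumes that enough instruments are valid to identify the model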
  • Estimate the first-stage regression and obtain the predicted values of the endogenous variable
  • Estimate the second-stage regression using the predicted values from the first stage in place of the endogenous variable
  • Interpret the IV estimates as the local average treatment effect (LATE) for the subpopulation affected by the instrument (compliers)
    • The LATE may differ from the average treatment effect (ATE) if the effect of the endogenous variable varies across individuals
  • Report the first-stage F-statistic, overidentification test results, and adjusted standard errors in addition to the IV estimates
  • Conduct robustness checks using alternative instruments, subsamples, or estimation methods to assess the sensitivity of the results
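
The LATE interpretation can be made concrete with a binary instrument and a binary treatment, where the IV estimate reduces to the Wald ratio (difference in mean outcomes divided by difference in treatment rates). The population below is entirely hypothetical: the shares and treatment effects of compliers, always-takers, and never-takers are assumptions chosen so that the complier effect (2.0) differs from the population average effect (2.5). There are no defiers, so monotonicity holds by construction.

```python
import random

rng = random.Random(3)
n = 40_000

# Hypothetical population: (share, treatment effect) for each compliance type.
TYPES = {
    "complier": (0.50, 2.0),
    "always_taker": (0.25, 5.0),
    "never_taker": (0.25, 1.0),
}
population_ate = sum(share * effect for share, effect in TYPES.values())  # 2.5

names = list(TYPES)
weights = [TYPES[name][0] for name in names]

z, d, y = [], [], []
for _ in range(n):
    kind = rng.choices(names, weights=weights)[0]
    zi = rng.random() < 0.5  # randomly assigned binary instrument
    # Always-takers take treatment regardless; compliers only when z is on.
    di = kind == "always_taker" or (kind == "complier" and zi)
    yi = TYPES[kind][1] * di + rng.gauss(0, 1)
    z.append(zi)
    d.append(di)
    y.append(yi)

def mean_by_z(values, z, level):
    """Mean of values among observations whose instrument equals level."""
    picked = [v for v, zi in zip(values, z) if zi == level]
    return sum(picked) / len(picked)

reduced_form = mean_by_z(y, z, True) - mean_by_z(y, z, False)
first_stage = mean_by_z(d, z, True) - mean_by_z(d, z, False)
wald = reduced_form / first_stage

print(f"First stage (share of compliers): {first_stage:.3f}")
print(f"Wald / IV estimate:               {wald:.3f}")
print(f"Population ATE (by assumption):   {population_ate:.3f}")
```

The Wald estimate should land near the complier effect of 2.0 rather than the ATE of 2.5, since only compliers change treatment status in response to the instrument.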

Limitations and Challenges

  • Finding valid instruments that satisfy the relevance and exogeneity conditions can be difficult in practice
    • Instruments that are theoretically justified may be only weakly correlated with the endogenous variable, leading to biased estimates
    • Instruments that are strongly correlated with the endogenous variable may have direct effects on the dependent variable, violating the exclusion restriction
  • Weak instruments can lead to biased IV estimates, large standard errors, and incorrect inference
    • The finite-sample bias of 2SLS is toward the OLS estimate and shrinks roughly in inverse proportion to the first-stage F-statistic
    • Weak instrument robust inference methods (e.g., Anderson-Rubin test) can be used to construct confidence intervals
  • The LATE interpretation of the IV estimate may not be generalizable to the entire population if the effect of the endogenous variable varies across individuals
    • The IV estimate represents the effect for the subpopulation affected by the instrument (compliers) which may differ from the average treatment effect (ATE)
  • IV estimation is less efficient than OLS, especially when the instruments are weak or the sample size is small
    • IV standard errors are larger than OLS standard errors, so OLS is preferable when the explanatory variable is in fact exogenous
  • Measurement error in the instrument weakens the first stage and can worsen weak-instrument problems
    • If the measurement error is unrelated to the structural error term, the instrument remains exogenous; the cost is a lower first-stage correlation and less precise estimates
  • IV estimation relies on strong assumptions (relevance, exogeneity, exclusion restriction, monotonicity); relevance can be checked in the first stage, but the others are not directly testable and may be violated in practice
    • Sensitivity analysis using alternative instruments and estimation methods can help assess the robustness of the results
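
The weak-instrument problem can be seen in a small Monte Carlo experiment. The sketch below is illustrative only: the first-stage coefficients (1.0 for the strong case, 0.05 for the weak case), the sample size of 100, and the 1,000 replications are assumptions. It reports the median and interquartile range of the IV estimates, since the weak-instrument distribution has very heavy tails and means are unreliable.

```python
import random
import statistics

def iv_estimate(z, x, y):
    """Ratio IV estimator Cov(z, y) / Cov(z, x) for a single instrument."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx

def simulate_iv(pi, n, reps, seed, true_beta=1.0):
    """Return IV estimates from `reps` simulated samples with first-stage slope pi."""
    rng = random.Random(seed)
    estimates = []
    for _rep in range(reps):
        z, x, y = [], [], []
        for _obs in range(n):
            confounder = rng.gauss(0, 1)  # unobserved, drives both x and y
            zi = rng.gauss(0, 1)
            xi = pi * zi + confounder + rng.gauss(0, 1)
            yi = true_beta * xi + confounder + rng.gauss(0, 1)
            z.append(zi)
            x.append(xi)
            y.append(yi)
        estimates.append(iv_estimate(z, x, y))
    return estimates

def summarize(estimates):
    """Median and interquartile range of a list of estimates."""
    q1, q2, q3 = statistics.quantiles(estimates, n=4)
    return q2, q3 - q1

median_strong, iqr_strong = summarize(simulate_iv(pi=1.0, n=100, reps=1_000, seed=5))
median_weak, iqr_weak = summarize(simulate_iv(pi=0.05, n=100, reps=1_000, seed=5))

print(f"Strong instrument: median {median_strong:.3f}, IQR {iqr_strong:.3f}")
print(f"Weak instrument:   median {median_weak:.3f}, IQR {iqr_weak:.3f}")
```

With the strong instrument the estimates should cluster tightly around the assumed true value of 1.0. With the weak instrument the spread should be far wider and the median should drift away from 1.0 in the direction of the OLS bias.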

Real-World Applications

  • Estimating the returns to education using compulsory schooling laws or distance to college as instruments for educational attainment
    • Addresses the endogeneity of education due to omitted variables (e.g., ability) or measurement error
  • Evaluating the effect of military service on earnings using the Vietnam War draft lottery as an instrument for veteran status
    • Exploits the random assignment of draft eligibility based on birth dates to identify the causal effect of military service
  • Assessing the impact of air pollution on health outcomes using wind direction or traffic congestion as instruments for pollutant concentrations
    • Addresses the endogeneity of pollution due to omitted variables (e.g., industrial activity) or measurement error
  • Estimating the effect of immigration on native wages using historical settlement patterns or policy changes as instruments for immigrant inflows
    • Deals with the endogeneity of immigration due to self-selection or reverse causality (e.g., immigrants may be attracted to areas with higher wages)
  • Analyzing the impact of financial development on economic growth using legal origins or geographic characteristics as instruments for financial institutions
    • Addresses the endogeneity of finance due to omitted variables (e.g., institutional quality) or reverse causality (e.g., growth may lead to financial development)
  • Evaluating the effect of health insurance on healthcare utilization using Medicaid eligibility rules or employer-provided coverage as instruments for insurance status
    • Deals with the endogeneity of insurance due to self-selection or omitted variables (e.g., health status)


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
