Econometrics applies statistical methods to economic data, helping researchers understand relationships between variables and make causal inferences. This field bridges economic theory and real-world observations, using techniques like regression analysis and instrumental variables estimation.
In this chapter, we explore key concepts in econometrics, including linear regression models, panel data analysis, and time series methods. We'll also discuss common challenges like endogeneity and heteroscedasticity, along with strategies to address them in empirical research.
Foundations of econometrics
Causal inference in econometrics
Causal inference aims to establish cause-and-effect relationships between variables
Relies on identifying exogenous variation in the explanatory variable of interest
Randomized controlled trials (RCTs) are the gold standard for causal inference
Randomly assign treatment and control groups to isolate the causal effect
Observational data can be used for causal inference under certain assumptions
Requires controlling for confounding factors and addressing selection bias
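The contrast between randomized and observational comparisons can be illustrated with a small simulation. This is a minimal sketch on made-up data: the variable names, the true effect of 2.0, and the confounder called `ability` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0                      # hypothetical causal effect of treatment

ability = rng.normal(size=n)           # unobserved confounder (illustrative)

# Observational setting: units with higher ability select into treatment
d_obs = (ability + rng.normal(size=n) > 0).astype(float)
y_obs = 1.0 + true_effect * d_obs + 1.5 * ability + rng.normal(size=n)
diff_obs = y_obs[d_obs == 1].mean() - y_obs[d_obs == 0].mean()

# Randomized setting: treatment assigned by coin flip, independent of ability
d_rct = rng.integers(0, 2, size=n).astype(float)
y_rct = 1.0 + true_effect * d_rct + 1.5 * ability + rng.normal(size=n)
diff_rct = y_rct[d_rct == 1].mean() - y_rct[d_rct == 0].mean()

print(f"observational difference in means: {diff_obs:.2f}")
print(f"randomized difference in means:    {diff_rct:.2f}")
```

In this setup the randomized difference in means should land close to the true effect, while the observational difference overstates it because treated units have higher ability on average.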
Econometric modeling process
Formulate an economic research question or hypothesis
Specify an econometric model based on economic theory and available data
Collect and prepare the relevant data for analysis
Clean and transform variables as needed
Estimate the model using appropriate econometric techniques
Evaluate the model's performance and interpret the results
Refine the model if necessary and draw policy implications
Types of econometric data
Cross-sectional data: Observations from different entities at a single point in time
Example: Household survey data on income and consumption
Time series data: Observations from a single entity over multiple time periods
Example: Quarterly GDP data for a country
Panel data: Combination of cross-sectional and time series data
Observations from multiple entities over multiple time periods
Example: Firm-level data on sales and employment over several years
Pooled cross-sectional data: Cross-sectional data from multiple time periods
Does not track the same entities over time
Linear regression models
Simple linear regression
Models the relationship between two variables: one dependent and one independent
Equation: Y=β0+β1X+ε
Y: Dependent variable
X: Independent variable
β0: Intercept
β1: Slope coefficient
ε: Error term
Estimates the line of best fit that minimizes the sum of squared residuals
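For the simple regression case, the least squares slope and intercept have a closed form: the slope is the sample covariance of X and Y divided by the sample variance of X. The sketch below applies it to five made-up data points and checks the result against `np.polyfit`.

```python
import numpy as np

# Made-up data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares estimates
x_dev = x - x.mean()
slope = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
intercept = y.mean() - slope * x.mean()

# Cross-check with NumPy's degree-1 polynomial fit
slope_np, intercept_np = np.polyfit(x, y, deg=1)

print(slope, intercept)
```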
Multiple linear regression
Extends simple linear regression to include multiple independent variables
Equation: Y=β0+β1X1+β2X2+...+βkXk+ε
X1,X2,...,Xk: Independent variables
β1,β2,...,βk: Slope coefficients
Allows for controlling for multiple factors that affect the dependent variable
Assumptions of linear regression
Linearity: The relationship between the dependent and independent variables is linear
Independence: Observations are independently sampled
Homoscedasticity: The variance of the error term is constant across all levels of the independent variables
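Normality: The error term is normally distributed (needed for exact small-sample inference, not for unbiasedness)
No perfect multicollinearity: No independent variable is an exact linear combination of the others (high but imperfect correlation does not violate the assumption, but it inflates standard errors)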
Ordinary least squares estimation
OLS is a method for estimating the parameters of a linear regression model
Minimizes the sum of squared residuals to find the best-fitting line
Under the classical (Gauss-Markov) assumptions, OLS is the best linear unbiased estimator: unbiased, with the smallest variance among linear unbiased estimators
Estimates can be obtained using matrix algebra or numerical optimization
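The matrix-algebra route solves the normal equations (X'X)b = X'y. The following is a minimal sketch on simulated data; the coefficient values are arbitrary assumptions, and the result is compared with NumPy's least squares solver.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical intercept and slopes

X = np.column_stack([np.ones(n), x1, x2])  # first column is the constant
y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations (X'X) b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimates from a numerically more stable least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# OLS residuals are orthogonal to every column of X
residuals = y - X @ beta_hat
print(beta_hat)
```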
Model specification and evaluation
Functional form specification
Choosing the appropriate functional form for the relationship between variables
Common forms: Linear, logarithmic, quadratic, interaction terms
Misspecification can lead to biased and inconsistent estimates
Can use graphical analysis or statistical tests (e.g., RESET test) to assess functional form
Goodness of fit measures
R-squared: Proportion of variance in the dependent variable explained by the model
Ranges from 0 to 1 in a model with an intercept; higher values indicate a closer fit, but R-squared never falls when a variable is added
Adjusted R-squared: Accounts for the number of independent variables in the model
Penalizes the addition of irrelevant variables
Root Mean Squared Error (RMSE): Measures the typical size of the prediction error, in the units of the dependent variable
Lower values indicate better fit
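These three measures can be computed directly from actual and fitted values. The helper below is a sketch: `fit_measures` is a hypothetical name, `k` is the number of independent variables excluding the intercept, and RMSE here divides by n (some software divides by the residual degrees of freedom instead).

```python
import numpy as np

def fit_measures(y, y_hat, k):
    """Return (R-squared, adjusted R-squared, RMSE) for fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    ssr = np.sum((y - y_hat) ** 2)        # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    rmse = np.sqrt(ssr / n)
    return r2, adj_r2, rmse

# Made-up actual and fitted values
r2, adj_r2, rmse = fit_measures([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5], k=1)
print(r2, adj_r2, rmse)
```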
Hypothesis testing in regression
Testing the significance of individual coefficients using t-tests
Null hypothesis: βi=0
Alternative hypothesis: βi≠0
Testing the joint significance of multiple coefficients using F-tests
Null hypothesis: β1=β2=...=βk=0
Alternative hypothesis: At least one βi≠0
P-values measure the strength of evidence against the null hypothesis; confidence intervals show the precision of the estimates
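A t-test on an individual coefficient can be sketched by hand from the OLS formulas. The example below uses simulated data in which the coefficient on `x2` is truly zero; the data-generating values are assumptions, and the p-values and intervals use a normal approximation rather than the exact t distribution.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 0.5 + 1.0 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 has no true effect

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
dof = n - X.shape[1]                       # residual degrees of freedom
sigma2 = resid @ resid / dof               # estimated error variance
cov = sigma2 * np.linalg.inv(X.T @ X)      # classical covariance matrix
se = np.sqrt(np.diag(cov))
t_stats = beta_hat / se                    # test of H0: beta_i = 0

# Two-sided p-values, normal approximation (reasonable for large n)
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

# Approximate 95% confidence intervals
ci_lower = beta_hat - 1.96 * se
ci_upper = beta_hat + 1.96 * se
print(t_stats, p_values)
```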
Model selection criteria
Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)
Trade-off between model fit and complexity
Lower values indicate better models
Adjusted R-squared: Balances the improvement in fit with the number of variables
Stepwise regression: Iteratively adding or removing variables based on statistical significance
Economic theory and practical considerations should guide model selection
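For reference, the two information criteria are usually written as follows, where the notation is an assumption of this sketch: L̂ is the maximized likelihood, k the number of estimated parameters, and n the number of observations.

```latex
\mathrm{AIC} = -2 \ln \hat{L} + 2k,
\qquad
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```

Because ln n exceeds 2 once n is at least 8, BIC penalizes additional parameters more heavily than AIC in all but the smallest samples.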
Violations of classical assumptions
Multicollinearity
High correlation among independent variables
Causes imprecise estimates and inflated standard errors
Detected using correlation matrices or variance inflation factors (VIF)
Remedies: Remove redundant variables, combine variables, or use ridge regression
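The variance inflation factor for regressor j is 1 / (1 − R²j), where R²j comes from regressing that variable on all the others. Below is a sketch with a hypothetical helper `vif` and simulated regressors in which `x2` is nearly a copy of `x1`.

```python
import numpy as np

def vif(X):
    """VIF for each column of X, an (n, k) array of regressors without a constant."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
n = 2_000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # almost collinear with x1
x3 = rng.normal(size=n)              # unrelated to the others
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

A common rule of thumb treats values above 10 as a sign of problematic collinearity, though the threshold is a convention rather than a formal test.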
Heteroscedasticity
Non-constant variance of the error term across levels of the independent variables
OLS coefficients remain unbiased but are no longer efficient, and conventional standard errors are invalid, so t-tests and F-tests can mislead
Detected using residual plots or statistical tests (e.g., Breusch-Pagan test)
Remedies: Weighted least squares, heteroscedasticity-robust standard errors
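Heteroscedasticity-robust standard errors replace the classical covariance matrix with a sandwich formula. The sketch below computes the simplest variant (often called HC0) by hand on simulated data whose error spread grows with |x|; the data-generating process is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
x = rng.normal(size=n)
eps = np.abs(x) * rng.normal(size=n)   # error standard deviation grows with |x|
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Conventional standard errors (assume constant error variance)
sigma2 = resid @ resid / (n - X.shape[1])
se_conventional = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 sandwich: (X'X)^-1 [sum of e_i^2 x_i x_i'] (X'X)^-1
meat = (X * resid[:, None] ** 2).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_conventional, se_robust)
```

The coefficient estimates are identical under both approaches; only the standard errors change. In this design the conventional slope standard error is noticeably too small.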
Autocorrelation
Correlation between error terms across observations
Common in time series data
OLS coefficients remain unbiased when the regressors are strictly exogenous, but they are inefficient and conventional standard errors are invalid
Detected using residual plots or statistical tests (e.g., Durbin-Watson test)
Remedies: Generalized least squares, autoregressive models
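The Durbin-Watson statistic is the sum of squared changes in the residuals divided by the sum of squared residuals. It is roughly 2(1 − ρ), where ρ is the first-order autocorrelation, so values near 2 suggest no first-order autocorrelation. A sketch on two simulated series, with an AR coefficient of 0.8 chosen for illustration:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic for a sequence of residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
n = 2_000

white_noise = rng.normal(size=n)       # no autocorrelation

ar1 = np.zeros(n)                      # positively autocorrelated series
shocks = rng.normal(size=n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]

dw_white = durbin_watson(white_noise)
dw_ar1 = durbin_watson(ar1)
print(dw_white, dw_ar1)
```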
Consequences and remedies
Violations of classical assumptions can lead to biased, inconsistent, or inefficient estimates
Incorrect standard errors and invalid hypothesis tests
Remedies depend on the specific violation and the nature of the data
Transforming variables, using alternative estimators, or adjusting standard errors
Importance of diagnostic tests and sensitivity analysis to assess the robustness of results
Instrumental variables estimation
Endogeneity problem
Endogeneity arises when an independent variable is correlated with the error term
Causes biased and inconsistent OLS estimates
Sources of endogeneity: Omitted variables, measurement error, simultaneous causality
Example: Estimating the effect of education on earnings
Ability is an omitted variable that affects both education and earnings
Two-stage least squares
2SLS is an instrumental variables estimator that addresses endogeneity
First stage: Regress the endogenous variable on the instrument and other exogenous variables
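Second stage: Regress the dependent variable on the predicted values from the first stage and the other exogenous variables (standard errors from a manually run second stage are incorrect and must be adjusted)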
Instrument must be relevant (correlated with the endogenous variable) and exogenous (uncorrelated with the error term)
Example: Using compulsory schooling laws as an instrument for education
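The two stages can be sketched by hand on simulated data that mimics the education example. Everything about the data-generating process is an assumption: `ability` is unobserved, `z` is a hypothetical instrument, and the true return to education is set to 2.0. The sketch reports point estimates only, since standard errors from a manual second stage are not valid.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(11)
n = 50_000
ability = rng.normal(size=n)                 # unobserved, drives endogeneity
z = rng.normal(size=n)                       # instrument (illustrative)
educ = 0.8 * z + ability + rng.normal(size=n)
earnings = 1.0 + 2.0 * educ + 2.0 * ability + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, educ])
Z = np.column_stack([const, z])

# Naive OLS: biased because educ is correlated with omitted ability
b_ols = ols(X, earnings)

# First stage: regress the endogenous variable on the instrument
pi_hat = ols(Z, educ)
educ_hat = Z @ pi_hat

# First-stage F for a single instrument equals the squared t-statistic
fs_resid = educ - educ_hat
fs_sigma2 = fs_resid @ fs_resid / (n - 2)
fs_cov = fs_sigma2 * np.linalg.inv(Z.T @ Z)
first_stage_F = pi_hat[1] ** 2 / fs_cov[1, 1]

# Second stage: regress the outcome on the first-stage fitted values
b_2sls = ols(np.column_stack([const, educ_hat]), earnings)

print(b_ols[1], b_2sls[1], first_stage_F)
```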
Validity of instruments
Instrument relevance: Tested using the F-statistic from the first-stage regression
Rule of thumb: F-statistic should be greater than 10
Instrument exogeneity: Cannot be directly tested, relies on theoretical arguments
Overidentifying restrictions: When there are more instruments than endogenous variables
Sargan-Hansen test can be used to assess the validity of extra instruments
Weak instruments can lead to biased estimates and invalid inference
Panel data models
Fixed effects vs random effects
Fixed effects model: Accounts for unobserved time-invariant heterogeneity
Uses only within-entity variation over time, which removes bias from time-invariant omitted variables
Cannot estimate the effects of time-invariant variables
Random effects model: Assumes unobserved heterogeneity is uncorrelated with the independent variables
More efficient than fixed effects, but potentially inconsistent if the assumption is violated
Hausman test can be used to choose between fixed and random effects
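The fixed effects (within) estimator can be sketched by subtracting each entity's time average from its data before running OLS. The balanced panel below is simulated, with an entity effect `alpha` that is correlated with `x` by construction and a true slope of 1.5; all of these values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 500, 5                                  # entities and time periods
alpha = rng.normal(size=(N, 1))                # unobserved entity effect
x = alpha + rng.normal(size=(N, T))            # regressor correlated with alpha
y = 1.5 * x + alpha + rng.normal(size=(N, T))

# Pooled OLS slope (with intercept), ignoring the panel structure
x_flat, y_flat = x.ravel(), y.ravel()
x_c = x_flat - x_flat.mean()
b_pooled = x_c @ (y_flat - y_flat.mean()) / (x_c @ x_c)

# Within transformation: demean each entity's observations over time
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_fe = np.sum(x_w * y_w) / np.sum(x_w ** 2)

print(b_pooled, b_fe)
```

Demeaning wipes out `alpha`, so the within estimate should sit near 1.5 while the pooled estimate absorbs part of the entity effect.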
Pooled OLS estimation
Pooled OLS ignores the panel structure of the data and treats observations as independent
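Consistent and efficient if there is no unobserved heterogeneity
Biased and inconsistent when unobserved effects are correlated with the regressors (the fixed effects case); with uncorrelated random effects it stays consistent but is inefficient, and its standard errors must account for within-entity correlation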
Rarely appropriate for panel data, but can be used as a benchmark
Dynamic panel data models
Include lagged dependent variables as regressors
Capture persistence and adjustment processes
OLS estimates are biased and inconsistent due to correlation between the lagged dependent variable and the error term
Arellano-Bond and Blundell-Bond estimators use lagged levels and differences as instruments
Generalized Method of Moments (GMM) estimation
Require careful specification and testing of moment conditions
Limited dependent variable models
Binary choice models
Dependent variable is a binary outcome (0 or 1)
Examples: Labor force participation, product purchase decisions
Linear probability model (LPM): Applies OLS to the binary outcome
Pros: Simple to estimate and interpret
Cons: Predicted probabilities can be outside the [0, 1] range, constant marginal effects
Logit and probit models: Use a nonlinear link function to map the linear predictor to probabilities
Ensure predicted probabilities are between 0 and 1
Marginal effects vary with the level of the independent variables
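In a logit model with one regressor, the marginal effect of x on the probability is β1·p·(1 − p), so it is largest where p is near 0.5 and shrinks in the tails. A short sketch with hypothetical coefficients:

```python
import numpy as np

def logistic(z):
    """Logistic function mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = 0.0, 1.0                      # hypothetical logit coefficients
x_grid = np.array([-2.0, 0.0, 2.0])

p = logistic(b0 + b1 * x_grid)         # predicted probabilities, always in (0, 1)
marginal_effect = b1 * p * (1.0 - p)   # dP/dx at each value of x

print(p, marginal_effect)
```

A linear probability model with the same data would report one constant marginal effect regardless of x.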
Multinomial choice models
Dependent variable has more than two unordered categories
Examples: Mode of transportation, occupational choice
Multinomial logit and probit models extend binary choice models
Independence of Irrelevant Alternatives (IIA) assumption in multinomial logit
Odds of choosing one alternative over another are independent of the presence of other alternatives
Nested logit and mixed logit models can relax the IIA assumption
Tobit and Heckman selection models
Tobit model: Dependent variable is censored or corner solution outcome
Examples: Household expenditure on a durable good, hours worked
Models a latent outcome with a linear regression and treats observations at the limit (e.g., zero) as censored; estimated by maximum likelihood
Heckman selection model: Accounts for sample selection bias
Two-step estimation: Selection equation and outcome equation
Inverse Mills ratio captures the effect of selection on the outcome
Example: Estimating the wage offer function for labor market participants
Time series econometrics
Stationarity and unit roots
Stationary time series: Statistical properties do not change over time
Constant mean, variance, and autocovariance
Non-stationary series: Presence of trends, cycles, or structural breaks
Unit root: A stochastic trend in the series
Dickey-Fuller and Phillips-Perron tests for unit roots
Differencing or detrending can make a series stationary
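The regression behind the basic Dickey-Fuller test can be sketched as follows. The helper `df_tstat` is a hypothetical name; it regresses the change in the series on a constant and the lagged level and returns the t-statistic on the lagged level. That statistic does not follow the usual t distribution under the null of a unit root, so it must be compared with Dickey-Fuller critical values (about −2.86 at the 5% level for the constant-only case in large samples).

```python
import numpy as np

def df_tstat(y):
    """t-statistic on gamma in: dy_t = a + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef = np.linalg.solve(X.T @ X, X.T @ dy)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se

rng = np.random.default_rng(9)
random_walk = np.cumsum(rng.normal(size=2_000))   # has a unit root by construction

t_levels = df_tstat(random_walk)             # levels: unit root present
t_diff = df_tstat(np.diff(random_walk))      # first differences: stationary

print(t_levels, t_diff)
```

The statistic for the differenced series is strongly negative, consistent with differencing having removed the stochastic trend.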
Cointegration and error correction
Cointegration: Long-run equilibrium relationship between non-stationary series
Residuals from the cointegrating regression are stationary
Engle-Granger and Johansen tests for cointegration
Error correction model (ECM): Captures short-run dynamics and adjustment to the long-run equilibrium
Includes the lagged residual from the cointegrating regression (the error correction term)
The coefficient on this term represents the speed of adjustment to deviations from the long-run relationship
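A two-variable version of this setup can be written out as follows. The symbols are assumptions of this sketch: y and x are the cointegrated series, û is the residual from the cointegrating regression, and λ is the adjustment coefficient.

```latex
\begin{aligned}
y_t &= \beta_0 + \beta_1 x_t + u_t
  && \text{(cointegrating regression)} \\
\Delta y_t &= \alpha + \gamma\, \Delta x_t + \lambda\, \hat{u}_{t-1} + \varepsilon_t,
  \qquad \hat{u}_{t-1} = y_{t-1} - \hat{\beta}_0 - \hat{\beta}_1 x_{t-1}
  && \text{(error correction model)}
\end{aligned}
```

Error correction requires λ < 0: when y was above its long-run level last period (û positive), the change in y this period is pushed downward.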
Vector autoregressive models
VAR models: Multivariate time series models where each variable is a function of its own lags and the lags of other variables
Treat all variables as endogenous
Impulse response functions: Trace out the response of each variable to shocks in other variables
Forecast error variance decomposition: Measures the contribution of each variable to the forecast error variance of other variables
Granger causality tests: Assess the predictive power of one variable for another
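A VAR(1) can be estimated equation by equation with OLS, regressing each variable on a constant and the first lag of every variable. The sketch below simulates a two-variable system with an assumed coefficient matrix `A_true` and recovers it; the horizon-2 responses shown are to non-orthogonalized unit shocks, which is a simplification.

```python
import numpy as np

rng = np.random.default_rng(2)
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.3]])        # hypothetical lag coefficient matrix
n = 5_000

y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = A_true @ y[t - 1] + rng.normal(size=2)

# Model: y_t = c + A y_{t-1} + u_t, estimated by OLS for both equations at once
X = np.column_stack([np.ones(n - 1), y[:-1]])
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)   # shape (3, 2): one column per equation
c_hat = B[0]
A_hat = B[1:].T                                  # row i holds equation i's lag coefficients

# Response of each variable two periods after a unit shock (non-orthogonalized)
irf_h2 = np.linalg.matrix_power(A_hat, 2)

print(A_hat)
```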
Forecasting with time series models
In-sample forecasting: Using the estimated model to predict the dependent variable within the sample period
Out-of-sample forecasting: Using the estimated model to predict future values of the dependent variable
Rolling window and recursive forecasting schemes
Forecast evaluation: Comparing the accuracy of different models
Mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE)
Diebold-Mariano test for comparing forecast accuracy
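A recursive (expanding window) out-of-sample exercise can be sketched as follows. The series is a simulated AR(1) with an assumed coefficient of 0.8, and two one-step-ahead forecasts are compared: the historical mean and a fitted AR(1) without intercept. Function names are illustrative.

```python
import numpy as np

def mse(actual, forecast):
    return float(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2))

def mae(actual, forecast):
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))

def rmse(actual, forecast):
    return float(np.sqrt(mse(actual, forecast)))

rng = np.random.default_rng(4)
n = 600
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()

start = 300                       # first out-of-sample period
f_mean, f_ar1 = [], []
for t in range(start, n):
    history = y[:t]               # only data available before period t
    f_mean.append(history.mean())
    # Re-estimate the AR(1) slope on the expanding sample
    rho = history[:-1] @ history[1:] / (history[:-1] @ history[:-1])
    f_ar1.append(rho * history[-1])

actual = y[start:]
print("RMSE mean forecast: ", rmse(actual, f_mean))
print("RMSE AR(1) forecast:", rmse(actual, f_ar1))
```

A rolling scheme would differ only in using a fixed-length slice such as `y[t - window:t]` for `history`.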
Econometric software and applications
Popular econometric software packages
Stata: User-friendly interface, wide range of built-in commands, and excellent documentation
R: Open-source, flexible, and powerful programming language for statistical computing
Packages like plm, lmtest, sandwich, and vars for econometric analysis
Python: General-purpose programming language with growing popularity in economics and data science
Libraries like numpy, pandas, statsmodels, and linearmodels for econometric analysis
EViews: Specialized software for time series analysis and forecasting
Empirical examples and case studies
Labor economics: Estimating the returns to education, wage determinants, and labor supply elasticities
Public economics: Evaluating the impact of taxes and subsidies on individual behavior and welfare
Environmental economics: Measuring the effects of pollution on health outcomes and property values
Development economics: Assessing the impact of microfinance programs on household income and consumption
Finance: Modeling asset prices, volatility, and risk premia
Interpreting and reporting results
Presenting the estimated coefficients, standard errors, and significance levels
Interpreting the economic and practical significance of the estimates
Reporting diagnostic tests and robustness checks
Discussing the limitations and potential extensions of the analysis
Visualizing the results using tables, graphs, and charts
Communicating the findings to both technical and non-technical audiences