Instrumental Variables Estimation - Health Economics

Health economists frequently face the challenge of estimating causal relationships in the absence of controlled experiments. For example, a long-standing issue in economics and in other disciplines is unraveling the observed relationship between education and health. Countless studies have documented a positive correlation between these outcomes, but fewer have successfully addressed the causal impact of education on health. In principle, randomized controlled trials (RCTs) could be used, but it is difficult to experimentally manipulate levels of education. Instrumental variables (IV) methods can be used when the real world provides some quasi-experimental variation in education. In this article, the use and the limitations of the IV approach are discussed. The authors illustrate how the IV approach works, review its relationship with the experimental approach, identify the properties of good natural experiments, and discuss the statistical properties of the IV estimator when the natural experiment is less than ideal.

The Instrumental Variables Estimator

An Intuitive Explanation For The Univariate Model

Consider the statistical properties of the linear IV estimator. For the sake of simplicity, the univariate case is presented, and the constant is suppressed by assuming that all variables are expressed as deviations from their respective sample means. Suppose that the effect of a broadly defined ‘treatment,’ x, on an outcome y is to be estimated. Data on y and x are collected for a random sample of n observations; y_i and x_i denote the values of these variables for the ith observation. The treatment affects the outcome according to a linear regression of the form

y_i = β x_i + u_i

where β is an unknown parameter to be estimated and u_i is an unobserved error term, interpreted as all causes of y_i other than x_i. Here, β is interpreted as the causal effect of x on y, and x and u are possibly correlated. The variables u and x will be correlated if there are variables unobserved to the researcher which cause both x and y (‘omitted variables’ in econometrics, or ‘unobserved confounders’ in some other disciplines) or if y ‘reverse’ causes x. The researcher may attempt to address omitted variables by using standard multivariate regression specifications and adding more independent variables to the model, but commonly, as in the education and health example above, even very rich datasets will exclude information on countless personality, cognitive, background, and contextual variables that may affect both the outcome and the intensity of treatment. Moreover, controlling for additional variables does not help resolve the ‘reverse’ causation problem. Methods other than IV are sometimes available (such as regression discontinuity designs, or certain longitudinal data approaches), but attention here is limited to IV.
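The omitted-variables problem described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the confounder `c`, the loadings 0.8 and 1.5, the true effect `beta = 1.0`, and the function names are hypothetical choices made for illustration, not values taken from any study. It generates data in which one variable causes both x and y, then compares the slope from regressing y on x alone with the slope obtained when the confounder is observed and included as a control.

```python
import numpy as np


def simulate(n=100_000, beta=1.0, seed=0):
    """Simulate y_i = beta * x_i + u_i where a confounder c drives both x and u.

    All coefficients are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)              # confounder, normally unobserved
    x = 0.8 * c + rng.normal(size=n)    # treatment depends on the confounder
    u = 1.5 * c + rng.normal(size=n)    # error term also contains the confounder
    y = beta * x + u
    # Express every variable as a deviation from its sample mean,
    # so that the constant can be suppressed as in the text.
    return x - x.mean(), y - y.mean(), c - c.mean()


def ols_slope(x, y):
    """Univariate least-squares slope for demeaned data."""
    return float(x @ y / (x @ x))


x, y, c = simulate()

# Regression of y on x alone: x is correlated with u through c.
naive = ols_slope(x, y)

# Regression of y on x and c: only feasible if c is actually observed.
coef, *_ = np.linalg.lstsq(np.column_stack([x, c]), y, rcond=None)
controlled = float(coef[0])

print(f"true beta:            {1.0:.3f}")
print(f"slope, x only:        {naive:.3f}")
print(f"slope, x and c:       {controlled:.3f}")
```

In this setup the slope from the regression on x alone lies well above the true effect, whereas the regression that includes `c` recovers a value close to it. In applied work the confounder is typically not available, which is the situation the text goes on to address.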

When a regressor is correlated with the error term u, it is said to be endogenous; if not, it is said to be exogenous. If ordinary least squares (OLS) is used to estimate the parameters of this equation, then the OLS estimator of β, denoted β̂, will be biased and inconsistent if x is endogenous. It can be shown that