I have a quick question. If I want to estimate a logit/probit model in the second stage, can I still follow the two-step procedure: run OLS in the first stage (endogenous variable = exogenous variables + instruments), then replace the endogenous variable with its fitted value when I run the logit/probit estimation in the second stage?

I just saw that this post (2SLS but second stage Probit) addresses the question above, and the answer seems to be yes. Does anyone have references that I can cite on this issue?

fccog

2 Answers

The reference for this should be Newey (1987), "Efficient estimation of limited dependent variable models with endogenous explanatory variables", Journal of Econometrics, Vol. 36(3), pp. 231–250. This is the estimator implemented by the `ivprobit` command in Stata, for instance, where you can have an OLS first stage and a probit second stage.

Andy
  • Thank you Andy, so according to your previous post, I can manually run OLS in the 1st stage, get the fitted values for the endogenous variable, manually run logit model with those fitted values as independent variable in the 2nd stage, after that, apply bootstrapping to get the correct s.e., am I misunderstanding anything here? – fccog Mar 14 '15 at 20:03
  • No, you understood correctly; this way it works. If you try it the other way around, you will not get consistent estimates, because 2SLS relies on properties of the expectation and linear projection operators that do not carry through non-linear first stages (e.g. probit/logit). A linear first stage with a probit/logit second stage should be fine, though there are more efficient estimation procedures. Have a look at Stata's `ivprobit` documentation for references to those. – Andy Mar 14 '15 at 20:19
  • I am just commenting here, because I would like to know what can I do if I only have a *binary* instrument for my first stage? I know this is a long shot (and I even opened my own question about it) but I just thought maybe you know! – canIchangethis Feb 17 '21 at 21:14
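
The manual procedure discussed in these comments (OLS first stage, logit second stage on the fitted values, bootstrap for the standard errors) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the function names `fit_logit`, `two_step` and `bootstrap_se` are hypothetical, and the logit is fitted with a plain Newton-Raphson loop in NumPy rather than a library routine. The bootstrap resamples observations and repeats both stages on each resample.

```python
import numpy as np

def fit_logit(X, y, n_iter=50):
    """Logit maximum likelihood via Newton-Raphson (illustrative helper)."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ b, -30, 30)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        b += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b

def two_step(y, d, x, z):
    """OLS first stage, then logit with the fitted value in place of d."""
    const = np.ones(len(y))
    W = np.column_stack([const, x, z])            # exogenous control + instrument
    d_hat = W @ np.linalg.lstsq(W, d, rcond=None)[0]
    return fit_logit(np.column_stack([const, x, d_hat]), y)

def bootstrap_se(y, d, x, z, n_boot=200, seed=0):
    """Bootstrap standard errors, re-estimating BOTH stages on each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)                 # resample rows with replacement
        draws.append(two_step(y[i], d[i], x[i], z[i]))
    return np.std(draws, axis=0, ddof=1)

# Simulated data (an assumption for illustration, not from the thread).
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                            # exogenous control
z = rng.normal(size=n)                            # instrument
u = rng.normal(size=n)                            # unobserved confounder
d = 0.5 * x + 1.0 * z + u + rng.normal(size=n)    # endogenous regressor
latent = -0.5 + 0.5 * x + 1.0 * d - 1.0 * u + rng.logistic(size=n)
y = (latent > 0).astype(float)

coef = two_step(y, d, x, z)                       # order: const, x, d_hat
se = bootstrap_se(y, d, x, z)
print("coefficients:", coef)
print("bootstrap s.e.:", se)
```

The standard errors printed by an off-the-shelf second-stage logit would ignore the fact that the fitted values are themselves estimated, which is why the sketch bootstraps the whole two-step procedure.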
When googling this problem myself, I found the highly-cited article

Terza, J.V., Basu, A. and Rathouz, P.J., 2008. Two-stage residual inclusion estimation: addressing endogeneity in health econometric modeling. Journal of Health Economics, 27(3), pp.531-543.

which proposes a method called two-stage residual inclusion (2SRI) for the generalized linear model case. The method is very simple: fit the first-stage model to obtain the residuals, then include both the residuals and the endogenous variable in the second-stage model.

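More formally, let $y_2$ be the endogenous variable, $x_1$ to $x_8$ the exogenous control variables, and $z_1$ and $z_2$ two instruments for $y_2$. In the first stage, $y_2$ is explained by the linear regression

$$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_8 x_8 + \beta_9 z_1 + \beta_{10} z_2 + \epsilon_2,$$

with the $\beta$ as coefficients and $\epsilon_2$ as the error term. This equation splits $y_2$ into an exogenous component $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_8 x_8 + \beta_9 z_1 + \beta_{10} z_2$ and an omitted-variable component $\epsilon_2$. The 2SRI method includes both the endogenous variable $y_2$ and the first-stage residual $\hat\epsilon_2$, as an estimate of the omitted variable, in the second-stage model:

$$y_1 = \operatorname{logit}^{-1}(\gamma_0 + \gamma_1 x_1 + \gamma_2 x_2 + \dots + \gamma_8 x_8 + \gamma_9 y_2 + \gamma_{10} \hat\epsilon_2) + \epsilon_1,$$

where $y_1$ is the dichotomous outcome, the $\gamma$ are the second-stage coefficients, and $\operatorname{logit}^{-1}$ is the logistic function. Implementing this in statistical software is straightforward. (Getting the standard errors of the estimates, however, is not.)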
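
As a rough illustration of the two stages just described, here is a minimal NumPy sketch on simulated data. The assumptions are mine, not from the paper: two controls instead of eight to keep it short, a hand-rolled Newton-Raphson logit instead of a library routine, and hypothetical function names (`fit_logit`, `two_stage_residual_inclusion`).

```python
import numpy as np

def fit_logit(X, y, n_iter=50):
    """Logit maximum likelihood via Newton-Raphson (illustrative helper)."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ b, -30, 30)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        b += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b

def two_stage_residual_inclusion(y1, y2, X, Z):
    """2SRI: OLS first stage, then logit on controls, y2 and the residual."""
    const = np.ones(len(y1))
    W = np.column_stack([const, X, Z])            # controls + instruments
    beta = np.linalg.lstsq(W, y2, rcond=None)[0]  # first-stage OLS
    resid = y2 - W @ beta                         # first-stage residual
    V = np.column_stack([const, X, y2, resid])    # y2 itself stays in the model
    return fit_logit(V, y1)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))                       # exogenous controls
Z = rng.normal(size=(n, 2))                       # instruments
u = rng.normal(size=n)                            # unobserved confounder
y2 = (X @ np.array([0.5, -0.5]) + Z @ np.array([1.0, 1.0])
      + u + rng.normal(size=n))                   # endogenous variable
latent = 0.5 * X[:, 0] + 1.0 * y2 - 1.0 * u + rng.logistic(size=n)
y1 = (latent > 0).astype(float)                   # dichotomous outcome

gamma = two_stage_residual_inclusion(y1, y2, X, Z)
print("const, x1, x2, y2, residual:", gamma)
```

The point estimates come directly from the second-stage fit. For the standard errors, one common approach is to bootstrap both stages together, since the residual entering the second stage is itself estimated.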

As another reference,

Burgess, S. and Thompson, S.G., 2012. Improving bias and coverage in instrumental variable analysis with weak instruments for continuous and binary outcomes. Statistics in Medicine, 31(15), pp.1582-1600.

report, based on a simulation study, that 2SRI performs better than 2SLS.

Tom Pape