
I am working on a project where I investigate wage growth due to migration. I correct for the endogeneity of the decision to migrate (only those who are most likely to gain from migration will migrate) by first using a probit model to predict the probability of migration from various characteristics. In a second step I then use the predicted probabilities as a proxy for migration (in effect, an instrumental variables regression).

My problem is that I get unreasonably high estimates: wages are predicted to increase by up to 200%. My predicted probabilities are very low (3% on average, 25% at the 99th percentile), which is reasonable because only about 5% of the sample migrate. My concern is that the coefficient I estimate corresponds to raising the probability of migrating from 0 to 1, and relative to the range of predicted probabilities in my sample, a change from 0 to 1 is very extreme. Could this be causing the huge estimates? Am I interpreting this correctly? Or should I rather look at the strength of my instruments, etc.?

Keksainis
  • I guess you are using linear regression in your second stage. In that case you are running what is termed a "forbidden" regression. The reason is that linear projections and expectations do not carry through a non-linear first stage (like probit). – Andy Nov 19 '14 at 21:25
  • Thanks! Do you have any suggestions for what I could use instead in the second stage? – Keksainis Nov 19 '14 at 21:31
  • Probably it's better not to change the second stage but the approach used for the first stage. I answered below and hopefully the references in there will be of use to you. – Andy Nov 19 '14 at 21:56
  • I know this is a very old question and answer. I just wanted to add something about the "forbidden regression", about which there seems to be an enormous amount of confusion. I went through some of Wooldridge's own comments on Statalist (and his books) and, in contrast to what Andy comments (please don't shoot me), it seems that the forbidden regression is plugging the fitted values of a first stage into a non-linear second stage. I base this on two threads, which I link in the comment below: – Tom Apr 02 '21 at 11:15
  • Here Wooldridge explains that another poster falls into the forbidden regression trap: https://www.statalist.org/forums/forum/general-stata-discussion/general/1308457-endogeneity-issue-negative-binomial. In this post Wooldridge explains that even using an ordinal probit in the first stage does not pose any issue: https://www.statalist.org/forums/forum/general-stata-discussion/general/1381281-iv-estimation-for-ordinal-variable?_=1617356656297. Please also note that, as I read it, Wooldridge (2010) says that you can still use 2SLS, but should not mimic it by plugging in fitted values! – Tom Apr 02 '21 at 11:18

1 Answer


If you are interested in an approximation of the average partial effect, you could just use a linear probability model in the first stage, i.e. do your instrumental variables estimation via 2SLS in the usual way. Because of the non-linearities involved this is not the most efficient approach, but it can give a good initial idea of the effect under study. For a more in-depth treatment of this argument see Wooldridge (2010), "Econometric Analysis of Cross-Section and Panel Data", section 15.7.3 from page 594 onward. On pages 265-268 he explains the forbidden regression and its problems.
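To make the 2SLS suggestion concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the simulated data, the single instrument `z`, the single control `x`, the true effect of 0.3, and the helper names `simulate`, `ols` and `tsls`. It only computes point estimates; for real work you would want a package that also reports valid 2SLS standard errors.

```python
import numpy as np

def simulate(n=50_000, effect=0.3, seed=0):
    """Hypothetical data: binary treatment d, one instrument z, one control x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    u = rng.normal(size=n)                     # unobserved factor in the wage equation
    v = 0.8 * u + 0.6 * rng.normal(size=n)     # selection error, correlated with u
    d = (-1.5 + 1.0 * z + 0.3 * x + v > 0).astype(float)   # endogenous "migration"
    y = 1.0 + effect * d + 0.5 * x + u + 0.5 * rng.normal(size=n)
    return y, d, x, z

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def tsls(y, d, x, z):
    """2SLS with a linear first stage; returns the coefficient on d."""
    const = np.ones(len(y))
    X = np.column_stack([const, x, d])   # structural regressors
    Z = np.column_stack([const, x, z])   # exogenous regressors plus the instrument
    X_hat = Z @ ols(Z, X)                # linear first-stage fitted values
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    return beta[2]

y, d, x, z = simulate()
naive = ols(np.column_stack([np.ones(len(y)), x, d]), y)[2]
print(f"OLS: {naive:.3f}   2SLS: {tsls(y, d, x, z):.3f}   (true effect: 0.3)")
```

In this simulated setup the naive OLS coefficient is pushed upward by selection, while the 2SLS estimate should land near the true effect, up to sampling noise.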

Another procedure that you might be interested in was used by Adams et al. (2009). They use a three-step procedure where they have a probit "first stage" and an OLS second stage without falling for the forbidden regression problem. Their general approach is:

  1. use probit to regress the endogenous variable on the instrument(s) and exogenous variables
  2. run an OLS first stage that regresses the endogenous variable on the predicted probabilities from the previous step and the exogenous variables (but not the instruments themselves)
  3. do the second stage as usual

This procedure will yield consistent estimates and is generally more efficient than doing 2SLS with a linear probability model in the first stage.
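The three steps can be sketched as follows. This is my own illustration under stated assumptions, not code from Adams et al. (2009): the data are simulated, the variable names and the helpers `simulate`, `fit_probit`, `ols` and `three_step` are hypothetical, and the probit is fitted with a hand-written Fisher scoring loop so that only NumPy is needed.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))

def norm_pdf(t):
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)

def simulate(n=40_000, effect=0.3, seed=0):
    """Hypothetical data: binary treatment d, one instrument z, one control x."""
    rng = np.random.default_rng(seed)
    x, z, u = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
    v = 0.8 * u + 0.6 * rng.normal(size=n)     # selection error, correlated with u
    d = (-1.5 + 1.0 * z + 0.3 * x + v > 0).astype(float)
    y = 1.0 + effect * d + 0.5 * x + u + 0.5 * rng.normal(size=n)
    return y, d, x, z

def fit_probit(W, d, max_iter=50, tol=1e-8):
    """Probit maximum likelihood via Fisher scoring."""
    gamma = np.zeros(W.shape[1])
    for _ in range(max_iter):
        index = W @ gamma
        p = np.clip(norm_cdf(index), 1e-10, 1 - 1e-10)
        dens = norm_pdf(index)
        weight = dens / (p * (1 - p))
        score = W.T @ (weight * (d - p))
        info = (W * (weight * dens)[:, None]).T @ W   # expected information
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def three_step(y, d, x, z):
    const = np.ones(len(y))
    # Step 1: probit of d on the instrument and the exogenous variables.
    W = np.column_stack([const, x, z])
    p_hat = norm_cdf(W @ fit_probit(W, d))
    # Step 2: OLS first stage of d on p_hat and the exogenous variables (z left out).
    F = np.column_stack([const, x, p_hat])
    d_hat = F @ ols(F, d)
    # Step 3: second stage with d_hat in place of d.
    S = np.column_stack([const, x, d_hat])
    return ols(S, y)[2]

y, d, x, z = simulate()
print(f"three-step estimate: {three_step(y, d, x, z):.3f}   (true effect: 0.3)")
```

Steps 2 and 3 together give the same point estimate as 2SLS with `p_hat` as the instrument. The standard errors from a hand-run second-stage OLS are not valid, though, so in practice it is safer to pass `p_hat` as the instrument to a proper IV routine.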

Andy