I have the following structural model, in which one of the regressors is partially explained by another.

$$ y_1= x_1+x_2+x_3+e \tag{1} $$

$$ x_1= x_2 + u \tag{2} $$

The questions are:

a) Can I run equation (1) alone without getting the coefficients wrong?

b) If, as I suspect, I need to account for the endogeneity of the regressors, which approach would you suggest: SEM, SUR, or ivreg?

Jeremy Miles
Luis
  • Are you familiar with mediation? This kind of looks like a mediation model: x2 predicts x1, which in turn predicts y1. And x3 is a covariate? – Mark White May 19 '17 at 20:32
  • Thank you for your answer. I had never heard of mediation. Yes, x3 is another covariate. In my particular case I am estimating hours worked in first job = having a second job + wage in first job + education of the person. The second equation aims to model the effect of wage on the probability of having a second job. I am going to learn about mediation models and see if I can understand the problem better. – Luis May 19 '17 at 21:00

1 Answer

a) Yes, if $u$ and $e$ are independent, you can just work with equation (1). I recommend drawing the model as a directed graph to see this more clearly. In Judea Pearl's framework, you get bias when there are either 1) omitted variables or 2) unblocked "backdoor paths" between an explanatory variable and the response. If $u$ appeared on the right-hand side of $e$'s equation, then $u$ would simultaneously cause both $x_1$ and $y$ without being accounted for by the model. To see this actually happen, first run the code below as is, and you'll see no bias. Then uncomment the line that links $u$ and $e$ (so that it overrides the independent draw), and you'll see bias.

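set.seed(1)  # fix the random draws so the run is reproducible
N <- 10000

# Exogenous regressors and the error of equation (2)
x2 <- rnorm(N)
x3 <- rnorm(N)
u <- rnorm(N)

# Error of equation (1): independent of u by default
epsilon <- rnorm(N)
# Alternative: error correlated with u, which biases OLS on equation (1)
# epsilon <- .6 * u + rnorm(N)

x1 <- x2 + u                       # equation (2)
y <- x1 + x2 + x3 + epsilon        # equation (1), all true coefficients equal 1

# OLS on equation (1) alone
summary(lm(y ~ x1 + x2 + x3))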
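
To see what size of bias to expect from the commented-out line, here is a short worked substitution. It assumes the error is generated as $\epsilon = 0.6\,u + v$, where $v$ is my own notation for the fresh `rnorm(N)` draw, independent of everything else. Using equation (2), $u = x_1 - x_2$, so

$$
\begin{aligned}
y &= x_1 + x_2 + x_3 + 0.6\,u + v \\
  &= x_1 + x_2 + x_3 + 0.6\,(x_1 - x_2) + v \\
  &= 1.6\,x_1 + 0.4\,x_2 + x_3 + v
\end{aligned}
$$

Because $v$ is independent of $x_1$, $x_2$, and $x_3$, the OLS estimates should settle near 1.6, 0.4, and 1 rather than the structural values 1, 1, and 1. Only the coefficient on $x_3$ is unaffected.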

b) If you have a theoretical model for how $u$ and $e$ are related, you could use a likelihood-based method to estimate the parameters (which I suppose you could implement using SEM software), but having such a model on hand is unlikely. Almost as unlikely is finding an instrument: a variable that predicts $x_1$ but is unrelated to $e$ (an assumption that is not empirically testable). If you do find one, you can use IV methods such as 2SLS.
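
As a rough illustration of the 2SLS idea, the sketch below extends the simulation with a hypothetical instrument `z`. This is an assumption for the example only: the original model has no such variable, and here equation (2) is changed to $x_1 = x_2 + z + u$ so that `z` predicts $x_1$ while staying independent of the error. The two stages are run by hand with `lm()` to keep the mechanics visible.

```r
set.seed(1)
N <- 10000

z  <- rnorm(N)                   # hypothetical instrument (not in the original model)
x2 <- rnorm(N)
x3 <- rnorm(N)
u  <- rnorm(N)
epsilon <- 0.6 * u + rnorm(N)    # error correlated with u, so x1 is endogenous

x1 <- x2 + z + u                 # assumed variant of equation (2) that includes z
y  <- x1 + x2 + x3 + epsilon     # equation (1), all true coefficients equal 1

# Plain OLS on equation (1): the x1 coefficient is biased away from 1
ols <- lm(y ~ x1 + x2 + x3)

# Stage 1: regress the endogenous regressor on the instrument and the exogenous regressors
stage1 <- lm(x1 ~ z + x2 + x3)
x1_hat <- fitted(stage1)

# Stage 2: replace x1 with its fitted values from stage 1
stage2 <- lm(y ~ x1_hat + x2 + x3)

print(coef(ols))
print(coef(stage2))
```

The manual second stage gives consistent point estimates, but the standard errors that `summary(stage2)` reports are not valid, because they are computed from the wrong residuals. A dedicated routine such as `ivreg()` from the AER package handles that correction.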

Ben Ogorek