
I have come across an excellent answer by @Gung to a previous post. However, I still have a doubt. Using @Gung's example, suppose there are three main variables: the dependent variable Y and two main predictors, X1 and X2. But let X1 and X2 be mutually exclusive dummies, i.e., if X1 is 1 then X2 is 0, and vice versa (and, to avoid the dummy variable trap, let there also be cases in which X1 and X2 are both 0).

In this case, to study the effect of X1 on Y, is "controlling" for X2 similar to "ignoring" it?

Moreover, to analyze the effect of X1 and X2 on Y, is this model

Y = a + b * X1 + c * X2

interchangeable with a two-equation system like this?

(1) Y = v + w * X1
(2) Y = h + k * X2

madu

1 Answer


No, in your example "controlling" for $x_2$ is not similar to "ignoring" it. $x_1$ and $x_2$ are (negatively) correlated. The fewer observations there are in which both $x_1$ and $x_2$ are zero, relative to the number in which one of them equals one, the stronger the correlation. As long as regressors are correlated, excluding one of them produces all the trouble known as omitted variable bias. That was illustrated in the post you are referring to.
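A quick simulation makes the bias concrete. Here is a minimal R sketch; the sample size, the group shares and the true coefficients (2 and 5) are illustrative assumptions of mine, not anything from the original post:

```r
set.seed(1)
n  <- 1000
g  <- sample(c("A", "B", "neither"), n, replace = TRUE)
x1 <- as.numeric(g == "A")             # dummy for group A
x2 <- as.numeric(g == "B")             # dummy for group B; never 1 when x1 is 1
y  <- 1 + 2 * x1 + 5 * x2 + rnorm(n)   # true model with coefficients 2 and 5

cor(x1, x2)            # about -0.5: clearly negative, not zero

coef(lm(y ~ x1 + x2))  # close to the true values (1, 2, 5)
coef(lm(y ~ x1))       # x1 coefficient badly biased (drifts toward -0.5)
coef(lm(y ~ x2))       # x2 coefficient biased downward (toward 4)
```

Note that the two single-regressor fits are exactly the two-equation system from the question, and their coefficients do not agree with the full model's.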

(Excluding $x_2$ from the model ("ignoring" $x_2$) would do limited harm if $x_1$ and $x_2$ were orthogonal -- see Problem 93.4.1 here.)
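For contrast, a sketch of the orthogonal case (again with made-up numbers): when the regressors are generated independently, dropping one of them leaves the other's coefficient essentially unbiased.

```r
set.seed(2)
n  <- 1000
z1 <- rnorm(n); z2 <- rnorm(n)          # independent, hence nearly uncorrelated
w  <- 1 + 2 * z1 + 5 * z2 + rnorm(n)    # same illustrative coefficients

coef(lm(w ~ z1 + z2))  # close to (1, 2, 5)
coef(lm(w ~ z1))       # z1 coefficient still close to 2: limited harm
```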

Hence also the answer to your second question: the model with both $x_1$ and $x_2$ and two separate models with just $x_1$ and just $x_2$ cannot be used interchangeably, as the simulations above illustrate.

Richard Hardy
  • I am a bit confused; I might be missing something. (1) The answer of @Gung deals with omitted variable bias, but if two regressors are highly correlated, would it not be better to exclude one of them? At the end of the day, both of them provide the same information but give rise to collinearity. (2) As regards the link to Problem 93.4.1, does it not conclude the opposite of what you suggest: "it is always better to include variables in a regression model that are orthogonal"? (3) Are two mutually exclusive dummy variables not orthogonal by definition? – madu Mar 07 '15 at 11:01
  • There are some nuances. (1) If the true model is $y=\beta_0+\beta_1 x_1+\beta_2 x_2+e$, then ignoring $x_2$ will produce a biased estimate of $\beta_1$. Thus you will not only miss $x_2$, which is relevant, but also get a biased result with regard to $x_1$. (2) Yes, that is the conclusion, and it is meant to answer your original question. That problem is about orthogonal regressors, where the impact of omitting a variable is much less harmful than in the general case of correlated regressors -- as I said, *limited harm* above. (3) No. Try an example in `Excel`, `R` or your favourite software. – Richard Hardy Mar 07 '15 at 11:08
  • @RichardHardy I might still be missing something. Take $x_1 = (1,0)$ and $x_2 = (0,1)$. Then $x_1^\top x_2 = 0$. Is this not orthogonality? – madu Mar 07 '15 at 13:35
  • This sample is too short to be a good example; still, the two variables are perfectly negatively correlated, i.e. $corr(x_1,x_2)=-1$. You can easily check that (see the R snippet below). The correlation is **not** $\mathbb{E}(xy)$ but rather $\frac{\mathbb{E}((x-\mu_x)(y-\mu_y))}{\sigma_x \sigma_y}$, which involves centering by the means. On the other hand, in your original post you presumably have a sample size larger than 2 and specify mutually exclusive dummies that are negatively correlated. – Richard Hardy Mar 07 '15 at 13:43
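The numerical check suggested in this exchange is a one-liner in R. A small sketch, using the two-observation vectors from the comments plus a longer sample with some both-zero cases:

```r
x1 <- c(1, 0); x2 <- c(0, 1)
sum(x1 * x2)   # 0: the dot product vanishes, as madu observes
cor(x1, x2)    # -1: yet the dummies are perfectly negatively correlated

x1 <- c(1, 1, 0, 0, 0, 0); x2 <- c(0, 0, 1, 1, 0, 0)
sum(x1 * x2)   # still 0
cor(x1, x2)    # -0.5: negative, because correlation centers by the means
```

A zero dot product of the raw dummies does not imply zero correlation: correlation is computed on mean-centered variables, and centering destroys the orthogonality unless the means are themselves zero.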