I need a little clarification on this very basic issue. I'm using multiple linear regression to predict a continuous outcome from a continuous predictor and a categorical variable with 3 levels. The fit tells me that when the continuous variable increases, my dependent variable also increases. Fair enough. On the other hand, it also tells me that being in either of the k-1 non-baseline categories decreases my dependent variable relative to the baseline. Yet when I check the mean of the dependent variable in the three categories, I notice that it is actually higher in those two categories than in the baseline. What should I conclude from this? It seems a little counterintuitive to me.
For a similar example of counterintuitive behaviour with a categorical predictor and a continuous predictor, you can see https://stats.stackexchange.com/questions/185047/is-this-simpsons-paradox-on-the-titanic-data-set – Pere May 31 '17 at 21:40
1 Answer
You fit 3 straight lines with the same slope and 3 different intercepts. Let $$\hat{Y_1} = \hat\beta_{01} +\hat\beta_1 X \quad \text{(reference category)}$$ $$\hat{Y_2} = \hat\beta_{02} +\hat\beta_1 X $$ $$\hat{Y_3} = \hat\beta_{03} +\hat\beta_1 X $$ "Being in either of the k-1 categories decreases my dependent variable with respect to the baseline" means $\hat\beta_{02} < \hat\beta_{01}$ and $\hat\beta_{03} < \hat\beta_{01}$.
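Assuming the usual dummy (treatment) coding, software does not print the three intercepts directly; it prints one intercept, the slope, and one coefficient per non-reference category. Writing $D_2$ and $D_3$ for the 0/1 indicators of categories 2 and 3 and $\hat\gamma_2, \hat\gamma_3$ for their reported coefficients (notation introduced here only for illustration), the three lines above are the same model written as

$$\hat Y = \hat\beta_{01} + \hat\gamma_2 D_2 + \hat\gamma_3 D_3 + \hat\beta_1 X, \qquad \hat\beta_{02} = \hat\beta_{01} + \hat\gamma_2, \qquad \hat\beta_{03} = \hat\beta_{01} + \hat\gamma_3$$

so negative reported coefficients $\hat\gamma_2, \hat\gamma_3 < 0$ say exactly that $\hat\beta_{02}$ and $\hat\beta_{03}$ are below $\hat\beta_{01}$.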
"If I check the mean value of the dependent variable in the three categories, nevertheless, I notice that, as a matter of fact it is higher in the two categories with respect to the baseline" means $\bar{Y_1} < \bar{Y_2}$ and $\bar{Y_1} < \bar{Y_3}$, where $$\bar{Y_1} = \hat\beta_{01} +\hat\beta_1 \bar{X_1} \quad \text{(reference category)}$$ $$\bar{Y_2} = \hat\beta_{02} +\hat\beta_1 \bar{X_2} $$ $$\bar{Y_3} = \hat\beta_{03} +\hat\beta_1 \bar{X_3} $$
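These three equations hold exactly for a least-squares fit with a separate intercept per category, not just approximately. A short derivation, writing $G_j$ for the set of observations in category $j$ and $n_j$ for their number (notation introduced here): setting the derivative of the sum of squared residuals with respect to $\hat\beta_{0j}$ to zero says the residuals in category $j$ sum to zero, and dividing by $n_j$ gives the group-mean identity.

$$\sum_{i \in G_j} \left(Y_i - \hat\beta_{0j} - \hat\beta_1 X_i\right) = 0 \quad\Longrightarrow\quad \frac{1}{n_j}\sum_{i \in G_j} Y_i = \hat\beta_{0j} + \hat\beta_1 \frac{1}{n_j}\sum_{i \in G_j} X_i \quad\Longrightarrow\quad \bar{Y_j} = \hat\beta_{0j} + \hat\beta_1 \bar{X_j}$$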
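So although $\hat\beta_{02} < \hat\beta_{01}$ and $\hat\beta_{03} < \hat\beta_{01}$, subtracting the reference equation from the others gives $$\bar{Y_j} - \bar{Y_1} = (\hat\beta_{0j} - \hat\beta_{01}) + \hat\beta_1 (\bar{X_j} - \bar{X_1}), \quad j = 2, 3.$$ The first term is negative, but because $\hat\beta_1 > 0$ (as you noticed), the second term is positive whenever $\bar{X_j} > \bar{X_1}$. If $\bar{X_2}$ and $\bar{X_3}$ are sufficiently larger than $\bar{X_1}$, the second term outweighs the first and you get $\bar{Y_1} < \bar{Y_2}$ and $\bar{Y_1} < \bar{Y_3}$. So check $\bar{X_1}, \bar{X_2}, \bar{X_3}$ and you will find the answer.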
(Here $\bar{X_j}$ and $\bar{Y_j}$ denote the sample means of $X$ and $Y$ within category $j$.)
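A small simulation can make this concrete. The data below are hypothetical, chosen so that category 1 has small values of X and a higher intercept while categories 2 and 3 have larger X and lower intercepts. The sketch fits the dummy-coded model with numpy's least squares and splits each difference in group means of Y into an intercept shift plus the slope times the difference in group means of X.

```python
import numpy as np

# Hypothetical simulated data (not the asker's data). Category 1 is the
# reference; categories 2 and 3 have lower intercepts but larger X values.
rng = np.random.default_rng(0)
n_per_group = 200
group = np.repeat([1, 2, 3], n_per_group)
x_center = np.array([0.0, 5.0, 8.0])[group - 1]    # mean of X per category
true_icpt = np.array([10.0, 7.0, 6.0])[group - 1]  # true intercept per category
true_slope = 2.0                                   # common positive slope
x = x_center + rng.normal(0.0, 1.0, group.size)
y = true_icpt + true_slope * x + rng.normal(0.0, 1.0, group.size)

# Dummy (treatment) coding: columns are intercept, D2, D3, X
d2 = (group == 2).astype(float)
d3 = (group == 3).astype(float)
design = np.column_stack([np.ones_like(x), d2, d3, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b01, gamma2, gamma3, b1 = coef

# Sample means of X and Y within each category
xbar = {g: x[group == g].mean() for g in (1, 2, 3)}
ybar = {g: y[group == g].mean() for g in (1, 2, 3)}

print("coefficients (intercept, D2, D3, X):", np.round(coef, 2))
print("mean of Y by category:", {g: round(float(v), 2) for g, v in ybar.items()})

# ybar_j - ybar_1 = (beta_0j - beta_01) + beta_1 * (xbar_j - xbar_1)
for g, gamma in ((2, gamma2), (3, gamma3)):
    slope_part = b1 * (xbar[g] - xbar[1])
    print(f"category {g}: intercept shift {gamma:.2f} + slope part "
          f"{slope_part:.2f} = {gamma + slope_part:.2f} "
          f"(observed difference {ybar[g] - ybar[1]:.2f})")
```

Setting all entries of `x_center` to the same value should make the group means of Y roughly line up with the signs of the D2 and D3 coefficients again, since the slope part then shrinks to near zero.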

Thank you, excellent explanation. Indeed, I still get a higher outcome despite the negative coefficients for the k-1 categories. – La Machine Infernale Jun 01 '17 at 10:13
I also noticed that when I remove the continuous predictor and keep only the categorical one, the negative coefficients disappear. Should I conclude that the way the continuous predictor and the dependent variable interact is specific to each of my three groups? – La Machine Infernale Jun 01 '17 at 10:19
No, it is not an interaction. From your data, it seems you can draw the following conclusion: "When the continuous variable increases, the dependent variable increases. The k-1 categories have lower values of the dependent variable than the baseline category (generally called the reference category) when the continuous variable is controlled for." – user158565 Jun 01 '17 at 16:44
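The observation about dropping the continuous predictor can be sketched the same way. Without X, the category coefficients are just the raw differences in group means of Y; with X in the model, they are differences at the same value of X. The data are hypothetical (categories 2 and 3 again have lower intercepts but larger X), and the `fit` helper is illustrative, not part of any library.

```python
import numpy as np

# Hypothetical simulated data: categories 2 and 3 have lower intercepts
# but larger X values than the reference category 1; the slope is common.
rng = np.random.default_rng(1)
group = np.repeat([1, 2, 3], 200)
x = np.array([0.0, 5.0, 8.0])[group - 1] + rng.normal(0.0, 1.0, group.size)
y = np.array([10.0, 7.0, 6.0])[group - 1] + 2.0 * x + rng.normal(0.0, 1.0, group.size)

d2 = (group == 2).astype(float)
d3 = (group == 3).astype(float)
ones = np.ones_like(x)

def fit(*columns):
    """Ordinary least squares of y on the given columns; returns coefficients."""
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return coef

# Categorical predictor only: D2/D3 coefficients equal raw differences in group means
cat_only = fit(ones, d2, d3)
# Categorical + continuous: D2/D3 coefficients are differences at equal X
with_x = fit(ones, d2, d3, x)

print("D2, D3 without X:", np.round(cat_only[1:], 2))
print("D2, D3 with X:   ", np.round(with_x[1:3], 2))
```

In this sketch the sign change comes only from the categories differing in their X values; the data were generated with one common slope. Checking for an interaction would instead mean adding separate slopes, for example the product columns `d2 * x` and `d3 * x`, and seeing whether they improve the fit.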