I want to estimate a logit model in which all independent variables are binary except one, which is categorical, so after dummy coding the whole data set consists of dummy variables.

First of all, I am unsure whether I should include an intercept in my model.

Secondly, if both options are possible, how does it affect the interpretation of the coefficients?

And does it affect whether I need to choose a reference group or not?

  • Also: https://stats.stackexchange.com/questions/260209/the-difference-between-with-or-without-intercept-model-in-logistic-regression/260214#260214 – kjetil b halvorsen Feb 20 '21 at 14:13

1 Answer

You definitely need the intercept. Say you have two binary $X$'s and a binary $Y$. Without the intercept, your model is

$\log(p/(1-p)) = \beta_1 X_1 + \beta_2 X_2$.

When both $X$'s are at their reference levels, you get

$\log(p/(1-p)) = \beta_1 \cdot 0 + \beta_2 \cdot 0 = 0$.

This implies that $p = 0.5$ when both of your independent variables are at their reference levels, no matter what values $\beta_1$ and $\beta_2$ take. So unless you have a super strong rationale for forcing a 0.5 probability of "success" for your dependent variable when your independent variables are at their reference levels, you should include the intercept in your model.
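To make the point concrete, here is a minimal sketch in plain Python. The coefficient values are made up for illustration, and the helper names (`inv_logit`, `prob_no_intercept`, `prob_with_intercept`) are hypothetical, not part of any library.

```python
import math

def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def prob_no_intercept(x1, x2, b1, b2):
    # Model without intercept: log(p/(1-p)) = b1*x1 + b2*x2
    return inv_logit(b1 * x1 + b2 * x2)

def prob_with_intercept(x1, x2, b0, b1, b2):
    # Model with intercept: log(p/(1-p)) = b0 + b1*x1 + b2*x2
    return inv_logit(b0 + b1 * x1 + b2 * x2)

# Arbitrary (made-up) coefficient pairs: at X1 = X2 = 0 they never matter.
p_ref_no_int = [prob_no_intercept(0, 0, b1, b2)
                for b1, b2 in [(-2.0, 3.0), (0.7, -1.2), (5.0, 5.0)]]

# With an intercept, the reference-level probability is inv_logit(b0),
# so the data are free to determine it.
p_ref_with_int = prob_with_intercept(0, 0, b0=-1.5, b1=0.7, b2=-1.2)

print(p_ref_no_int)
print(p_ref_with_int)
```

Every entry of `p_ref_no_int` equals 0.5, whereas `p_ref_with_int` depends only on the (here arbitrary) value of `b0`.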

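(On the other hand, if you do not "leave one out" when coding a categorical $X$ variable, that is, if you include a dummy for every level, then you need to exclude the intercept to avoid perfect multicollinearity. In this case the problem noted above vanishes, because exactly one of those dummies always equals 1 and its coefficient plays the role of the intercept for that level. Note that this works for only one variable at a time: if two variables are both coded with a dummy for every level, their dummies each sum to 1 and remain collinear even without an intercept, so all but one variable still needs a reference level.)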
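As a rough illustration of how the two codings relate, consider a single categorical predictor with three levels. The counts below are invented, and the sketch relies on the fact that with one categorical predictor the model is saturated, so the maximum likelihood fit reproduces each level's observed proportion.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical data: (successes, trials) for each level of one categorical X.
counts = {"A": (20, 100), "B": (50, 100), "C": (80, 100)}

# Coding 1: no intercept, one dummy per level.
# Each coefficient is simply that level's log-odds of success.
full_coding = {level: logit(s / n) for level, (s, n) in counts.items()}

# Coding 2: intercept plus dummies for B and C, with A as the reference level.
# The intercept is A's log-odds; the other coefficients are log-odds ratios vs. A.
intercept = full_coding["A"]
contrasts = {level: full_coding[level] - intercept
             for level in counts if level != "A"}

# Both codings imply the same fitted probabilities.
fitted_full = {level: inv_logit(b) for level, b in full_coding.items()}
fitted_ref = {"A": inv_logit(intercept)}
fitted_ref.update({level: inv_logit(intercept + d)
                   for level, d in contrasts.items()})

print(full_coding)
print(intercept, contrasts)
```

The two parameterizations describe the same model; only the interpretation changes. Without an intercept each coefficient is a level's own log-odds, while with an intercept each coefficient is a difference in log-odds relative to the reference level.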

BigBendRegion