
I am running a logistic regression in order to determine the error rate of an outcome given some covariates. Two of my covariates are indicator flags for the location. When I include an intercept, one of the location flags is dropped, which I understand. What I do not understand is that my $R^2$ also drops, from around 0.82 to around 0.06. My parameter estimates do not change at all apart from the remaining location flag, and my intercept takes the value of the coefficient on the location flag that was removed.

Essentially,

$$ \operatorname{logit}\big(\Pr(Y_i = 1)\big) = \mathbf{\beta X} + \gamma_1 i_1 + \gamma_2 i_2 $$ has an $R^2$ of around 0.82, while $$ \operatorname{logit}\big(\Pr(Y_i = 1)\big) = \beta_0 + \mathbf{\beta X} + \gamma_1 i_1 $$ has an $R^2$ of around 0.06.

  • possible duplicate of [Fewer variables have higher R-squared value in logistic regression](http://stats.stackexchange.com/questions/35019/fewer-variables-have-higher-r-squared-value-in-logistic-regression) – Xi'an Apr 30 '15 at 14:11
  • Do you have any missing values in any of the variables? Are both $\gamma$ flags 2-level categorical variables? – gung - Reinstate Monica Apr 30 '15 at 15:24

1 Answer


Keep in mind that there is no real $R^2$ for logistic regression. There are various pseudo-$R^2$s, and their mileage varies.

For your first model (no intercept), the baseline model for the pseudo-$R^2$ is $\text{logit} = 0$, i.e., $\Pr(Y_i = 1) = 0.5$. For nearly any data, this is an awful model, so it is no wonder that adding anything shows a big improvement.

For your second model, the baseline is $\text{logit} = \text{const}$, not necessarily 0, so $\Pr(Y_i = 1)$ is a constant, not necessarily 0.5.

If your actual proportion of ones is, say, 0.1, and $X_i$ has a moderate amount of explanatory power, then your second model will show a modest $R^2$, while the first model only "works" to the extent that it can use the covariates as a crutch in place of the intercept to pull the predicted probabilities towards 0.1.
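To see how much the choice of baseline alone matters, here is a minimal numpy sketch (my own illustration, not from the answer) using McFadden's pseudo-$R^2$, $1 - \ell_{\text{model}}/\ell_{\text{baseline}}$. It compares an intercept-only model against the $\text{logit}=0$ baseline when the true proportion of ones is about 0.1 — merely fitting a constant already "explains" a large share:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = (rng.random(n) < 0.1).astype(float)  # outcome with ~10% ones

def bernoulli_loglik(y, p):
    """Log-likelihood of observations y under a constant probability p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Baseline used when the model has no intercept: logit = 0, i.e. p = 0.5
ll_zero = bernoulli_loglik(y, 0.5)

# Baseline used when the model has an intercept: p = sample proportion
ll_const = bernoulli_loglik(y, y.mean())

# McFadden pseudo-R^2 of the intercept-only fit, measured against logit = 0
r2 = 1 - ll_const / ll_zero
print(r2)  # large, even though no covariate was used
```

The same covariate fit therefore looks far more impressive against the $\text{logit}=0$ baseline than against the intercept baseline, which is the asker's 0.82 vs 0.06 discrepancy in miniature.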

StasK