
I wasn't sure whether to ask this here or on Stack Overflow, but I'll give it a shot here. I ran a logistic regression using the statsmodels library in Python. Two things went wrong (which I think are related): every independent variable got a p-value equal to one, and I got the message "Complete Separation: The results show that there is complete separation. In this case the Maximum Likelihood Estimator does not exist and the parameters are not identified."

So my question is: what does this result mean? This is what I've done (in case anyone wants to check it):

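import statsmodels.api as sm

# model_df is a pandas DataFrame holding the response and the predictors
y = model_df[['var_y']]
X = model_df[['var1', 'var2', 'var3', 'var4']]  # a list of columns needs double brackets

# Note: sm.Logit does not add an intercept automatically
logit_model = sm.Logit(y, X)
result = logit_model.fit()
print(result.summary())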
kjetil b halvorsen
trder
    Is it possible to show the summary? Complete separation is as Łukasz answered below. What I am not sure of is why your p-values are equal to one. The summary might clear things up a bit. – StupidWolf May 07 '20 at 12:14

1 Answer


Complete separation by a single variable, say $X_1$, means that you can predict the value of the dependent variable, say $Y$, from $X_1$ alone, and do so with 100% accuracy.

For example, you can have $Y=1$ when $X_1>20$ and $Y=0$ otherwise. Or if $X_1$ is categorical, you can have $Y=1$ for some categories of $X_1$ and $Y=0$ for other categories.
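As a minimal sketch of the numeric case, the check below looks for a cut-off that reproduces $Y$ exactly from one variable. The function name `separating_threshold` and the toy data are made up for illustration, and the sketch only covers the direction "$Y=1$ for larger $X_1$" (negate `x` to check the other direction).

```python
import numpy as np

def separating_threshold(x, y):
    """Return a cut-off t such that (x > t) reproduces y exactly, or None.

    Hypothetical helper: it only checks the case where every y == 1
    observation has a larger x than every y == 0 observation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).astype(bool)
    x1, x0 = x[y], x[~y]
    if x1.size == 0 or x0.size == 0:
        return None  # only one class present, nothing to separate
    if x0.max() < x1.min():
        # any value strictly between the two groups works; take the midpoint
        return (x0.max() + x1.min()) / 2
    return None

# Toy data built so that Y = 1 exactly when X1 > 20
x1_values = np.array([5, 12, 18, 20, 21, 30, 44])
y_values = (x1_values > 20).astype(int)
threshold = separating_threshold(x1_values, y_values)

# Flipping one label breaks the separation
y_noisy = y_values.copy()
y_noisy[0] = 1
threshold_noisy = separating_threshold(x1_values, y_noisy)
```

Running a check like this on each predictor in turn is a quick way to spot a single variable that separates the outcome. Separation by a combination of variables would not be caught by it.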

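p-values equal to 1 for all variables are a typical symptom of this: under separation the coefficient estimates diverge and their standard errors become enormous, so the Wald statistics are close to zero. It suggests that you have complete separation involving all of your $X$s.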
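To illustrate why the maximum likelihood estimator does not exist in this situation, here is a small sketch using only NumPy. The data, the cut-off of 20.5, and the function `log_likelihood` are assumptions chosen for illustration. The point is that, on separated data, making the slope steeper always increases the log-likelihood, so no finite coefficient maximizes it.

```python
import numpy as np

# Toy data with complete separation: Y = 1 exactly when X1 > 20
x = np.array([5.0, 12.0, 18.0, 20.0, 21.0, 30.0, 44.0])
y = (x > 20).astype(float)

def log_likelihood(scale, threshold=20.5):
    """Logistic log-likelihood for the linear predictor scale * (x - threshold)."""
    eta = scale * (x - threshold)
    # sum of y*eta - log(1 + exp(eta)), computed stably with logaddexp
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Steeper and steeper slopes around the separating cut-off
scales = (0.1, 1.0, 10.0, 100.0)
values = [log_likelihood(s) for s in scales]
for s, v in zip(scales, values):
    print(f"scale={s:>6}: log-likelihood={v:.6f}")
```

The log-likelihood keeps climbing toward its upper bound of 0 as the slope grows, which is why an optimizer either fails to converge or stops at very large coefficients with very large standard errors.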

Łukasz Deryło