56

I am trying to understand why logistic regression in these two libraries gives different results.

I am using the dataset from the UCLA IDRE tutorial, predicting admit based on gre, gpa and rank. rank is treated as a categorical variable, so it is first converted to dummy variables with rank_1 dropped. An intercept column is also added.

```py
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
y, X = dmatrices('admit ~ gre + gpa + C(rank)', df, return_type = 'dataframe')
X.head()
>  Intercept  C(rank)[T.2]  C(rank)[T.3]  C(rank)[T.4]  gre   gpa
0          1             0             1             0  380  3.61
1          1             0             1             0  660  3.67
2          1             0             0             0  800  4.00
3          1             0             0             1  640  3.19
4          1             0             0             1  520  2.93

# Output from scikit-learn
model = LogisticRegression(fit_intercept = False)
mdl = model.fit(X, y)
model.coef_
> array([[-1.35417783, -0.71628751, -1.26038726, -1.49762706,  0.00169198,
     0.13992661]]) 
# corresponding to predictors [Intercept, rank_2, rank_3, rank_4, gre, gpa]

# Output from statsmodels
logit = sm.Logit(y, X)
logit.fit().params
> Optimization terminated successfully.
     Current function value: 0.573147
     Iterations 6
Intercept      -3.989979
C(rank)[T.2]   -0.675443
C(rank)[T.3]   -1.340204
C(rank)[T.4]   -1.551464
gre             0.002264
gpa             0.804038
dtype: float64
```

The output from statsmodels matches the one shown on the IDRE website, but I am not sure why scikit-learn produces a different set of coefficients. Does it minimize a different loss function? Is there any documentation that describes the implementation?

hurrikale
  • The results with the constant term left out won't reproduce the scikit-learn results either (I checked). It must be the regularization. –  Mar 01 '20 at 17:23

3 Answers

55

Your clue to figuring this out should be that the parameter estimates from the scikit-learn fit are uniformly smaller in magnitude than their statsmodels counterparts. This might lead you to believe that scikit-learn applies some kind of parameter regularization, and you can confirm that by reading the scikit-learn documentation.
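Concretely (a rough sketch of the two objectives; the notation here is mine, not the libraries'): with the defaults `penalty='l2'` and `C=1.0`, scikit-learn's `LogisticRegression` minimizes a penalized loss of the form

$$\min_w \; \tfrac{1}{2} w^\top w + C \sum_{i=1}^{n} \log\!\big(1 + e^{-\tilde y_i x_i^\top w}\big), \qquad \tilde y_i \in \{-1, +1\},$$

whereas `sm.Logit` simply maximizes the unpenalized log-likelihood

$$\max_w \; \sum_{i=1}^{n} \big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\big], \qquad p_i = \frac{1}{1 + e^{-x_i^\top w}}.$$

Note also that with `fit_intercept=False` and the intercept supplied as a column of `X`, the intercept coefficient gets penalized along with everything else, which is part of why it shrinks so much.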

There is no way to switch off regularization in scikit-learn, but you can make it ineffective by setting the tuning parameter C to a large number. Here is how that works in your case:

```py
# module imports
from patsy import dmatrices
import pandas as pd
from sklearn.linear_model import LogisticRegression
import statsmodels.discrete.discrete_model as sm

# read in the data & create matrices
df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
y, X = dmatrices('admit ~ gre + gpa + C(rank)', df, return_type = 'dataframe')

# sklearn output
model = LogisticRegression(fit_intercept = False, C = 1e9)
mdl = model.fit(X, y)
model.coef_

# sm
logit = sm.Logit(y, X)
logit.fit().params
```

UPDATE: As correctly pointed out in the comments below, you can now switch off the regularization in scikit-learn by setting `penalty='none'` (see the docs).
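
For example, a minimal sketch reusing the question's `X` and `y` (note that the spelling depends on your scikit-learn version: older releases use the string `penalty='none'`, newer ones use `penalty=None`; `solver` and `max_iter` are set explicitly here just to be safe):

```py
from sklearn.linear_model import LogisticRegression

# No penalty at all; the intercept is already a column of X, hence fit_intercept=False.
model = LogisticRegression(penalty='none', solver='lbfgs',
                           fit_intercept=False, max_iter=1000)
model.fit(X, y.values.ravel())   # ravel() avoids the column-vector warning
model.coef_                      # should closely match statsmodels' MLE estimates
```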

tchakravarty
  • Thank you very much for the explanation! With this regularized result, I was trying to duplicate the result using the `glmnet` package in R, but could not get the same coefficients. [glmnet](https://cran.r-project.org/web/packages/glmnet/vignettes/glmnet_beta.html#log) has a slightly different cost function compared to [sklearn](http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression), but even if I set `alpha=0` in `glmnet` (meaning only use the l2-penalty) and set `1/(N*lambda)=C`, I still do not get the same result. – hurrikale Mar 26 '16 at 13:04
  • My intuition is that if I divide both terms of the cost function in `glmnet` by `lambda` and set the new constant in front of the log-likelihood, which is `1/(N*lambda)`, equal to that in `sklearn`, the two cost functions become identical, or am I missing something? – hurrikale Mar 26 '16 at 13:06
  • @hurrikale Ask a new question and link it here, and I will take a look. – tchakravarty Mar 26 '16 at 13:11
  • Thanks! I posted the question [here](http://stats.stackexchange.com/questions/203816/logistic-regression-scikit-learn-vs-glmnet). – hurrikale Mar 26 '16 at 13:55
  • 3
    I think the best way to switch off the regularization in scikit-learn is by setting `penalty='none'`. – Nzbuu Jul 17 '19 at 15:30
  • As Nzbuu points out, the newest version of sklearn finally does allow to completely switch off regularization. – Ben Reiniger Aug 29 '19 at 20:40
  • The above-mentioned dataset can now be found at – banderlog013 Apr 28 '20 at 11:03
  • Does the addition of `penalty='none'` mean that the documentation [here](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) is outdated, since it only mentions the "setting C to a very high value" way of switching off regularization? – Quinn Culver May 04 '21 at 19:28
10

What tripped me up:

  • disable sklearn regularization: `LogisticRegression(C=1e9)`

  • add the statsmodels intercept: `sm.Logit(y, sm.add_constant(X))` OR disable the sklearn intercept: `LogisticRegression(C=1e9, fit_intercept=False)`

  • sklearn returns probabilities for each class, so `model_sklearn.predict_proba(X)[:, 1] == model_statsmodel.predict(X)`

  • use of the predict function: `model_sklearn.predict(X) == (model_statsmodel.predict(X) > 0.5).astype(int)`

I'm now seeing the same results in both libraries.
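
Putting those four points together, here is a minimal sketch (assuming `X` and `y` are the design matrix and response from the question's `dmatrices` call, so `X` already contains the `Intercept` column):

```py
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Same design matrix for both fits: suppress sklearn's own intercept (it is already
# a column of X) and make the L2 penalty negligible with a huge C.
model_sklearn = LogisticRegression(C=1e9, fit_intercept=False, max_iter=1000)
model_sklearn.fit(X, y.values.ravel())
model_statsmodels = sm.Logit(y, X).fit()

# Coefficients should now agree to several decimal places
print(model_sklearn.coef_.ravel())
print(model_statsmodels.params.values)

# predict_proba gives one column per class; column 1 corresponds to statsmodels' predict
p_sklearn = model_sklearn.predict_proba(X)[:, 1]
p_statsmodels = model_statsmodels.predict(X)
print(np.allclose(p_sklearn, p_statsmodels, atol=1e-4))

# sklearn's predict applies a 0.5 threshold; replicate that for statsmodels
print(np.array_equal(model_sklearn.predict(X),
                     (p_statsmodels > 0.5).astype(int)))
```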

citynorman
1

Another difference is that you've set `fit_intercept=False`, which effectively makes it a different model. You can see that statsmodels includes the intercept. Not having an intercept surely changes the expected weights on the features. Try the following and see how it compares:

```py
model = LogisticRegression(C=1e9)
```

  • 3
    It is the exact opposite actually - statsmodels does *not* include the intercept by default. See the SO threads [Coefficients for Logistic Regression scikit-learn vs statsmodels](https://stackoverflow.com/questions/62005911/coefficients-for-logistic-regression-scikit-learn-vs-statsmodels) and [scikit-learn & statsmodels - which R-squared is correct?](https://stackoverflow.com/questions/54614157/scikit-learn-statsmodels-which-r-squared-is-correct), as well as the answer below. – desertnaut May 26 '20 at 12:44
  • @desertnaut you're right, statsmodels doesn't include the intercept by default. Here the design matrix `X` returned by `dmatrices` includes a constant column of 1's (see output of `X.head()`). Then even though both the scikit-learn and statsmodels estimators are fit with no explicit instruction for an intercept (the former through `fit_intercept=False`, the latter by default), both models effectively have an intercept, which can be seen by inspecting the outputs carefully. – rmwenzel Jan 04 '21 at 02:29