
The case is as follows:

Suppose that

import numpy as np

X = np.array([1, 1, 1])
y = np.array([1, 0, 1])

Then I perform a logistic regression with no intercept to check out the fitted coefficient:

from sklearn.linear_model import LogisticRegression

def fit_predict(X, y, fit_intercept=True):
  model = LogisticRegression(fit_intercept=fit_intercept)
  model.fit(X.reshape(-1, 1), y)
  print(f'model coefficients: {model.coef_}')

fit_predict(X, y, fit_intercept=False)

# output: [[0.2865409]]

I am pretty confused by this output. According to my algebra (directly solving the first-order condition of the log-likelihood), the coefficient should be $\operatorname{logit}(2/3) \approx 0.6931471805599452$.

Is this because my math is wrong, or because something else is going on that I don't know about?

The algebra is as follows, starting with the following equation:

$$\sum_i \left(y_i - \operatorname{sigmoid}(\beta x_i)\right) x_i = 0$$

If we plug the values in (all $x_i = 1$), then $$2 = 3 \cdot \operatorname{sigmoid}(\beta).$$

I conclude that $\beta = \operatorname{logit}(2/3)$.
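
For reference, here is a quick numerical check of that value (this snippet assumes `scipy` is available; `expit` is the sigmoid function):

from scipy.special import expit, logit

beta = logit(2 / 3)
print(beta)             # ≈ 0.6931, the value derived above
print(3 * expit(beta))  # ≈ 2.0, so the equation 2 = 3 * sigmoid(beta) is satisfied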

Thanks in advance.

Syllabear

2 Answers


As Demetri suggested, we need to add penalty='none' for the code to give the expected result.

The revised code is as follows:

from sklearn.linear_model import LogisticRegression

def fit_predict(X, y, fit_intercept=True):
  model = LogisticRegression(fit_intercept=fit_intercept, penalty='none')
  model.fit(X.reshape(-1, 1), y)
  print(f'model coefficients: {model.coef_}')
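
Calling it as in the question should now recover (up to solver tolerance) the value derived above:

fit_predict(X, y, fit_intercept=False)

# expected output: approximately [[0.6931]], i.e. logit(2/3)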
Syllabear
  • Weird API that they went with the string `none`. – Matthew Drury Jul 02 '21 at 22:58
  • @MatthewDrury That is not the only weird thing. The penalty parameter (they call it `C`) is the inverse of the regularization strength, so $C = 1/\lambda$. I imagine they did this to make grid searches for the optimal strength a little less cumbersome. – Demetri Pananos Jul 03 '21 at 00:35

I will add my own answer to this question in order to shed some light on why a penalty is added by default. I'm also posting for posterity, as you are not the first person to get caught by this and you won't be the last.

Back in 2019, Zachary Lipton also discovered that sklearn applies the penalty by default, and this sparked a very intense debate on Twitter and elsewhere. The long and the short of that discussion is that sklearn sees itself as a machine learning library first, which in their eyes means that they favor other things over unbiasedness and estimates of effects. The most striking example of their philosophy (in my opinion) comes when Andreas Mueller plainly asks why someone would want an unbiased implementation of logistic regression. Inference simply isn't on their radar.

Hence, LogisticRegression is not the de jure Logistic Regression. It is a penalized variant thereof by default (and the default penalty doesn't even make any sense). There is another sharp point. Had you learned about penalized logistic regression à la ridge regression or the LASSO, you would be surprised to learn that sklearn parameterizes the penalty as the inverse of the regularization strength. Hence, setting $\lambda=2$ in LASSO or Ridge would correspond to C=0.5 in LogisticRegression.
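
To make that parameterization concrete, here is a minimal sketch (the variable name `lam` is mine):

from sklearn.linear_model import LogisticRegression

lam = 2.0  # regularization strength in the ridge/LASSO convention
# sklearn wants the inverse of the strength, so lambda = 2 corresponds to C = 0.5
model = LogisticRegression(penalty='l2', C=1 / lam)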

Let me sum up by making this completely unambiguous.

If you are intent on estimating the effects of some covariates on a binary outcome, and you insist on using Python: Do Not Use Sklearn. Use Statsmodels.

If, however, you insist on using sklearn, remember that you need to set penalty='none' in the model instantiation step. Otherwise, your estimates will be biased towards the null (by design).
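
For the data in the question, a minimal statsmodels sketch would look like this (assuming statsmodels is installed; `Logit` fits by plain maximum likelihood and, unlike sklearn, does not add an intercept unless you add a constant column yourself):

import numpy as np
import statsmodels.api as sm

X = np.array([1, 1, 1])
y = np.array([1, 0, 1])

result = sm.Logit(y, X.reshape(-1, 1)).fit()
print(result.params)  # approximately [0.6931], i.e. logit(2/3)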

Demetri Pananos
  • Their preference for unbiased results here also uses the ML definition for 'bias'. That is different from the definition of 'bias' in (at least social science) statistical training. Hilariously this means that, under the typical (ss?) stats definition, the result is biased towards 0. – russellpierce Jul 03 '21 at 11:05
  • You're right, thanks for the correction. Sloppy reading on my part. Got burnt once too often on that misunderstanding and turned into a raw nerve. – russellpierce Jul 04 '21 at 10:40
  • Hi, is the MLE in logistic regression unbiased? Or is the asymptotic distribution just centered at the true coefficients? – user257566 Jul 04 '21 at 20:11
  • @user257566 It's been a while since I've taken a math stats course, but if I remember correctly the MLE may be biased but it is consistent (meaning as $n\to \infty$ we can estimate the parameter to arbitrary precision). – Demetri Pananos Jul 05 '21 at 01:45
  • `sklearn` and `statsmodels` are not the only implementations, and for instance it's easy to get more flexibility by implementing your own generative model [in `pytorch` for instance](https://laurentperrinet.github.io/sciblog/posts/2020-04-08-fitting-a-psychometric-curve-using-pytorch.html). – meduz Jul 11 '21 at 08:36