
I have been going through the sklearn documentation, but I am not able to understand the purpose of these functions in the context of logistic regression. For decision_function, it says that it is the distance between the hyperplane and the test instance. How is this particular information useful? And how does this relate to the predict and predict_proba methods?

Sameed

1 Answer


Recall that the functional form of logistic regression is

$$ f(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}} $$

This is the probability that $x$ belongs to the positive class, and it is what is returned by predict_proba (as the second column of the output; the first column is $1 - f(x)$, the probability of the other class).
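For concreteness, here is a minimal sketch checking this against a fitted model (the toy dataset and the names `X`, `y`, `lr` are mine, purely illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative toy data; any binary classification dataset would do.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
lr = LogisticRegression().fit(X, y)

# Rebuild f(x) = 1 / (1 + exp(-(beta_0 + beta . x))) from the fitted model.
f = 1.0 / (1.0 + np.exp(-(lr.intercept_[0] + X @ lr.coef_.ravel())))

# The second column of predict_proba is P(y = 1 | x) and should agree with
# f up to floating-point detail (see the comment thread below).
print(np.allclose(f, lr.predict_proba(X)[:, 1]))  # True
```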

The term inside the exponential

$$ d(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k $$

is what is returned by decision_function. The "hyperplane" referred to in the documentation is

$$ \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k = 0 $$

This terminology is a holdover from support vector machines, which literally estimate a separating hyperplane. For logistic regression this hyperplane is a bit of an artificial construct: it is the plane of equal probability, where the model has determined both target classes are equally likely.
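To see this "plane of equal probability" concretely, here is a sketch (same illustrative setup as above) that constructs a point lying exactly on the hyperplane and checks that the model assigns it probability one half:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
lr = LogisticRegression().fit(X, y)

# Build a point x with beta_0 + beta . x = 0 by moving along the
# coefficient vector: x = -beta_0 * beta / ||beta||^2.
w = lr.coef_.ravel()
b = lr.intercept_[0]
x_on_plane = -b * w / np.dot(w, w)

print(lr.decision_function(x_on_plane[None, :]))  # ~0: on the hyperplane
print(lr.predict_proba(x_on_plane[None, :]))      # ~[[0.5, 0.5]]
```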

The predict function returns a class decision using the rule

$$ f(x) > 0.5 $$

Since the sigmoid is monotone increasing, this is the same rule as thresholding the decision function at zero: $d(x) > 0$.
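A sketch of this equivalence (again with illustrative toy data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
lr = LogisticRegression().fit(X, y)

pred = lr.predict(X)
via_proba = (lr.predict_proba(X)[:, 1] > 0.5).astype(int)   # f(x) > 0.5
via_decision = (lr.decision_function(X) > 0).astype(int)    # d(x) > 0

print(np.array_equal(pred, via_proba))     # True
print(np.array_equal(pred, via_decision))  # True
```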

At the risk of soapboxing, the predict function has very few legitimate uses, and I view its use as a sign of error when reviewing others' work. I would go so far as to call it a design error in sklearn itself (the predict_proba function should have been called predict, and predict should have been called predict_class, if it should exist at all).

Matthew Drury
  • Thanks for the answer @Matthew, but can you clarify this point a bit more "For logistic regression, this hyperplane is a bit of an artificial construct, it is the plane of equal probability, where the model has determined both target classes are equally likely." ? – Sameed Feb 24 '18 at 15:02
  • This explanation is interesting and helpful. I wish sklearn explained it better. What I don't understand is what is the use of knowing the value of x in the logistic function 1/(1+e^-x)? All I can think of is to possibly use a different sigmoid function like x/(1+|x|). Is there more? thanks! – ldmtwo Apr 20 '18 at 17:28
  • Basically the decision function should have been the sigmoid in the logistic regression. Correct? – 3nomis Oct 03 '19 at 19:31
  • 2
    I think the reason for @Matthew being on a soapbox is that using 0.5 as the threshold for prediction is naive. The first thing one should do is learn to use cross-validation, ROC curves and AUC to choose an appropriate threshold c, and using as the decision function f(x) > c. – hwrd Mar 04 '20 at 22:21
  • 2
    @Mathew Drury. I computed ```1/(1+np.exp(-lr.decision_function(X))```, but the result does not match exactly to ```lr.predict_proba```. They are close, but not the same. Do you know why? – Sarah Aug 12 '20 at 16:50
  • @Sarah Looks like sklearn does not use exactly that function to transform into probabilities under the hood. Instead, it uses a generalized numerically stable version of softmax: https://github.com/scikit-learn/scikit-learn/blob/e217b68fd00bb7c54b81a492ee6f9db6498517fa/sklearn/utils/extmath.py#L595 – Matthew Drury Aug 12 '20 at 18:44
  • @Matthew Drury. But I checked the binary logistic regression, and these two are not the same either. For binary logistic regression, there should not be a softmax function involved. – Sarah Aug 12 '20 at 20:33
  • The source code does use the softmax, even in the case of a binary logistic regression (well, as far as I can tell from reading the code). A small sketch reproducing the comparison follows this thread. – Matthew Drury Aug 13 '20 at 01:25
  • Apparently, decision_function returns what's called "logits" in the deep learning literature. – gkcn Mar 10 '21 at 18:56
  • @MatthewDrury thanks - are you aware of any examples / things to read/search for in order to support the "soapboxing" at the end of your post? Would be appreciated. – baxx May 30 '21 at 20:32
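A minimal sketch reproducing the comparison from the comments above, assuming a default binary LogisticRegression on illustrative toy data; with this setup, any gap between the naive sigmoid and predict_proba comes from the numerically stable transform sklearn applies internally, and should sit at floating-point noise level:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
lr = LogisticRegression().fit(X, y)

naive = 1.0 / (1.0 + np.exp(-lr.decision_function(X)))
proba = lr.predict_proba(X)[:, 1]

# The two can disagree in the last few bits, but no more than that here.
print(np.max(np.abs(naive - proba)))
```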