I have been going through the sklearn documentation, but I am not able to understand the purpose of these functions in the context of logistic regression. For `decision_function`, it says that it is the distance between the hyperplane and the test instance. How is this particular information useful? And how does this relate to the `predict` and `predict_proba` methods?

1 Answer
Recall that the functional form of logistic regression is
$$ f(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}} $$
This is what is returned by `predict_proba`.
The term inside the exponential
$$ d(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k $$
is what is returned by `decision_function`. The "hyperplane" referred to in the documentation is
$$ \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k = 0 $$
This terminology is a holdover from support vector machines, which literally estimate a separating hyperplane. For logistic regression this hyperplane is a bit of an artificial construct: it is the plane of equal probability, where the model has determined that both target classes are equally likely.
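For concreteness, here is a minimal sketch of the relationship between the two methods. The dataset and model settings are illustrative assumptions, not anything from the documentation:

```python
import numpy as np
from scipy.special import expit  # the logistic sigmoid, 1 / (1 + exp(-x))
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary classification problem, purely for illustration.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

d = clf.decision_function(X)    # d(x) = beta_0 + beta_1 x_1 + ... + beta_k x_k
p = clf.predict_proba(X)[:, 1]  # f(x) = estimated probability of the positive class

# The two should agree up to small numerical differences (see the comments
# below about how sklearn computes the probabilities internally).
print(np.allclose(expit(d), p))
```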
The `predict` function returns a class decision using the rule
$$ f(x) > 0.5 $$
At the risk of soapboxing, the `predict` function has very few legitimate uses, and I view its use as a sign of error when reviewing others' work. I would go so far as to call it a design error in sklearn itself (the `predict_proba` function should have been called `predict`, and `predict` should have been called `predict_class`, if anything at all).
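Continuing the sketch above (same illustrative `clf` and `X`), `predict` is equivalent to thresholding `predict_proba` at 0.5, and nothing stops you from choosing a different cutoff yourself:

```python
# predict() is just the positive-class probability thresholded at 0.5.
pred_default = clf.predict(X)
pred_manual = (clf.predict_proba(X)[:, 1] > 0.5).astype(int)
print(np.array_equal(pred_default, pred_manual))  # expected: True

# If 0.5 is not the right operating point for your problem, threshold the
# probabilities yourself. The 0.3 cutoff here is arbitrary; in practice it
# would come from, e.g., an ROC curve on held-out data.
pred_custom = (clf.predict_proba(X)[:, 1] > 0.3).astype(int)
print(pred_custom[:10])
```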

- Thanks for the answer @Matthew, but can you clarify this point a bit more: "For logistic regression, this hyperplane is a bit of an artificial construct, it is the plane of equal probability, where the model has determined both target classes are equally likely."? – Sameed Feb 24 '18 at 15:02
- This explanation is interesting and helpful. I wish sklearn explained it better. What I don't understand is: what is the use of knowing the value of x in the logistic function 1/(1+e^-x)? All I can think of is to possibly use a different sigmoid function like x/(1+|x|). Is there more? Thanks! – ldmtwo Apr 20 '18 at 17:28
- Basically the decision function should have been the sigmoid in the logistic regression. Correct? – 3nomis Oct 03 '19 at 19:31
- I think the reason for @Matthew being on a soapbox is that using 0.5 as the threshold for prediction is naive. The first thing one should do is learn to use cross-validation, ROC curves and AUC to choose an appropriate threshold c, and then use f(x) > c as the decision rule. – hwrd Mar 04 '20 at 22:21
- @Matthew Drury: I computed ```1/(1+np.exp(-lr.decision_function(X)))```, but the result does not match ```lr.predict_proba``` exactly. They are close, but not the same. Do you know why? – Sarah Aug 12 '20 at 16:50
- @Sarah Looks like sklearn does not use exactly that function to transform scores into probabilities under the hood. Instead, it uses a generalized, numerically stable version of softmax: https://github.com/scikit-learn/scikit-learn/blob/e217b68fd00bb7c54b81a492ee6f9db6498517fa/sklearn/utils/extmath.py#L595 – Matthew Drury Aug 12 '20 at 18:44
- @Matthew Drury: But I checked the binary logistic regression, and these two are not the same either. For binary logistic regression, there should not be a softmax function used. – Sarah Aug 12 '20 at 20:33
- The source code does use the softmax, even in the case of a binary logistic regression (well, as far as I can tell from reading the code). – Matthew Drury Aug 13 '20 at 01:25
- Apparently, decision_function returns what's called "logits" in the deep learning literature. – gkcn Mar 10 '21 at 18:56
- @MatthewDrury thanks - are you aware of any examples / things to read/search for in order to support the "soapboxing" at the end of your post? Would be appreciated. – baxx May 30 '21 at 20:32