
I am using a logistic regression estimator with scikit-learn. The estimator I trained predicts the same class every time. (This is a two-class classification problem.)

The data set consists of 2 classes that are a bit skewed (70% output 1 and 30% output 0). It has about 2000 samples, each with 8 features. When I fit the logistic regression I get only 1's on the output, so the confusion matrix looks like this:

Confusion Matrix:
[[  0  53]
 [  0 155]]
  • I tried playing with the regularization parameter (from 1e-6 to 1e8) and nothing changes.
  • The data set does not look linearly separable.
  • I would expect that even the worst logistic regression estimator would at least yield P(y=1)=0.7, P(y=0)=0.3. Note that when using an SVM with an RBF kernel I get much better results. Below is the confusion matrix of the SVM:

    Confusion Matrix:
    [[ 48   5]
     [ 15 140]]
    
  • Any idea why my logistic regression estimator is always predicting the same result?
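For reference, the fit-predict procedure described in the question looks roughly like the sketch below. The data here is synthetic (`make_classification` with a 30/70 class split), standing in for the asker's actual 2000-sample, 8-feature data set; the variable names are assumptions, not the asker's code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data: ~70% class 1, ~30% class 0.
X, y = make_classification(n_samples=2000, n_features=8,
                           weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print(confusion_matrix(y_test, clf.predict(X_test)))

# Inspecting the predicted probabilities (rather than the hard labels)
# shows what the model actually estimates for each case.
print(clf.predict_proba(X_test)[:5])
```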

user2910088
  • 31
  • 1
  • 4
  • Not enough information has been provided to know why, and this is an implementation problem, not a statistics problem. – Carl Apr 16 '17 at 10:36
  • 1
    We seem to get this question a lot. One thought is that if Python really does this, it implies that it's shoddy software & shouldn't be used except by people sophisticated enough to hack it & get around these sorts of things. – gung - Reinstate Monica Nov 21 '17 at 21:58

2 Answers


Logistic regression is not a classifier. It predicts probabilities of $1$'s. For example, the intercept-only model

$$ E(Y) = g^{-1}(\beta_0) $$

where $g^{-1}$ is the inverse of the logistic link function, would predict $\Pr(Y=1)=0.7$ for all cases. If you then used the decision rule "predict $1$ if the probability is greater than $0.5$", you would end up classifying everything as a one. The predicted probabilities would be correct, yet the proportions of zeros and ones in the confusion matrix would not match those in $Y$. The same is true for more complicated models. To say it once again: logistic regression does not attempt to make correct classifications; it estimates the conditional mean of $Y$.
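To make the intercept-only case concrete, here is a small sketch in plain NumPy, assuming the 70/30 class split from the question:

```python
import numpy as np

# Hypothetical outcomes matching the question's class proportions:
# 70% ones, 30% zeros.
y = np.array([1] * 700 + [0] * 300)

# The intercept-only model estimates beta_0 = logit(mean(y)), so the
# predicted probability is just the sample proportion of 1's.
p_hat = y.mean()                       # 0.7
beta_0 = np.log(p_hat / (1 - p_hat))   # logit of 0.7
prob = 1 / (1 + np.exp(-beta_0))       # inverse logistic link -> 0.7

# Applying the "greater than 0.5" rule then classifies every case as 1,
# even though the probability estimate itself is correct.
predictions = (np.full_like(y, prob, dtype=float) > 0.5).astype(int)
print(prob)               # 0.7
print(predictions.sum())  # 1000 -> all cases classified as class 1
```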

Tim
  • 108,699
  • 20
  • 212
  • 390
  • How do you know it's a null model? – SmallChess Apr 16 '17 at 09:24
  • @SmallChess That is what this kind of model is called. – Tim Apr 16 '17 at 10:22
  • 5
    So the problem is that the software is using logistic regression inappropriately. Logistic regression is used to estimate probabilities and you should use proper accuracy scores. For details see http://www.fharrell.com/2017/01/classification-vs-prediction.html – Frank Harrell Apr 16 '17 at 12:22
  • @Tim My question is more about why you think the OP had a null model. I don't see it stated anywhere in the question? I can only see some weird problems in the model, but why it has to be a null model?? – SmallChess Apr 16 '17 at 13:12
  • 2
    @SmallChess I don't think Tim was saying that OP had a null model, he was saying that a null model would give a probability of 0.7 for each case and thus the confusion matrix would look the way OP's did. – Peter Flom Apr 16 '17 at 13:25
  • Thanks for the answer. I see what you are saying, but it still feels like I am missing something. As I mentioned in the question, I used a simple fit-predict procedure with scikit-learn. As I understand it, fitting finds the coefficients that minimize the cost function, so I don't see why changing the decision rule would make a difference. I would expect that when fitting the estimator with a different rule, the coefficients would change accordingly to minimize the cost function. – user2910088 Apr 16 '17 at 16:30
  • 1
    @user2910088 coefficients wouldn't change. Logistic regression finds the best coefficients to predict the probabilities of success. It is not optimized to make best classifications. The decision rule is applied to the results of logistic regression. – Tim Apr 16 '17 at 16:52
  • 1
    @user2910088 This is a mistake in sklearn. `predict` for classification models should not be used. – Matthew Drury Nov 21 '17 at 22:06

Adding to Tim's answer: the confusion matrix you got indicates that your model did not assign any observation a predicted probability below 0.5, which is the default cutoff for the decision rule. If you can change that cutoff in the program, you may get a better-looking confusion matrix. Check the documentation for how to do this.
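In scikit-learn specifically, `predict` effectively hard-codes a 0.5 cutoff, but you can apply your own rule to the output of `predict_proba`. A sketch on synthetic data (the 0.7 threshold is an illustrative assumption, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 70/30 data standing in for the asker's set.
X, y = make_classification(n_samples=2000, n_features=8,
                           weights=[0.3, 0.7], random_state=0)
model = LogisticRegression().fit(X, y)

# Column 1 of predict_proba holds P(y=1); model.predict effectively
# thresholds this at 0.5.
proba = model.predict_proba(X)[:, 1]

# Choose a different cutoff, e.g. the prevalence of class 1, to trade
# sensitivity against specificity.
threshold = 0.7
custom_pred = (proba >= threshold).astype(int)
```

Raising the cutoff above 0.5 can only shrink the set of cases classified as 1, which is exactly the lever the default rule hides.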

Peter Flom
  • 94,055
  • 35
  • 143
  • 276