
On my 4-variable logistic regression model, an exhaustive grid search finds a much better solution than sklearn's LogisticRegression.

Grid search delivers:

accuracy: 55.11%

log reg coefficients: 0.58  -0.19   -0.03   0.20   

Sklearn LogisticRegression delivers:

accuracy: 53.16%

log reg coefficients: 0.0015  -0.0010  0.0016  0.0002

Am I using sklearn the wrong way or does sklearn really suck? :O

Below is my code...

import pandas
import sklearn.linear_model

df = pandas.read_csv("samples.tsv", sep="\t")

y = df.iloc[:, 0].to_numpy()
x = df.iloc[:, 1:].to_numpy()

logreg = sklearn.linear_model.LogisticRegression()
print("logreg:", logreg)

logreg.fit(x, y)
print("coefficients:", "\t".join(map(str, logreg.coef_[0])))
print("accuracy:", logreg.score(x, y))
– elemolotiv
  • What is "score"? Unless it's the likelihood function or something directly related to the likelihood function, logistic regression isn't optimizing the metric in the first place. – The Laconic Sep 23 '17 at 17:25
  • score = classification accuracy. In my problem the two classes are 50%-50% probable, so accuracy is a good performance measure. – elemolotiv Sep 23 '17 at 17:29
  • Logistic regression doesn't maximize classification accuracy. It doesn't even give you classifications, unless you make some additional assumption about what cutoff to use to assign observations to one class or another. – The Laconic Sep 23 '17 at 17:37
  • Check: https://stats.stackexchange.com/questions/127042/why-isnt-logistic-regression-called-logistic-classification/127044#127044 As others said, logistic regression does not maximize accuracy; it is not even a classifier. Compare both models under logistic loss: you should see that the LogisticRegression function does better (if it does not, that should worry you). – Tim Sep 23 '17 at 17:41
  • It's a fallacy that 50/50 classes imply a 0.5 threshold is optimal. Setting aside whether accuracy is relevant to the problem you are solving, you still need to tune a threshold. – Matthew Drury Sep 23 '17 at 18:29
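Tim's suggestion above (compare both models under logistic loss, not accuracy) can be sketched as follows. The real samples.tsv is not available here, so simulated data of the same shape stands in for it, and the grid-search model's intercept is assumed to be 0 since the post does not report one; only the grid-search coefficients are taken from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from scipy.special import expit  # logistic sigmoid

# Stand-in for samples.tsv: a 4-feature, roughly 50/50 binary problem.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))
w_true = np.array([0.3, -0.1, 0.05, 0.15])  # arbitrary ground-truth weights
y = (rng.random(1000) < expit(x @ w_true)).astype(int)

# Grid-search coefficients reported in the question (intercept assumed 0).
w_grid = np.array([0.58, -0.19, -0.03, 0.20])
p_grid = expit(x @ w_grid)

# sklearn's fit, evaluated on predicted probabilities rather than labels.
logreg = LogisticRegression().fit(x, y)
p_sklearn = logreg.predict_proba(x)[:, 1]

print("grid-search log loss:", log_loss(y, p_grid))
print("sklearn log loss:    ", log_loss(y, p_sklearn))
```

On a metric logistic regression actually optimizes (log loss), the fitted model should come out ahead of coefficients tuned for accuracy.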

0 Answers