
I am a beginner in ML, so I apologize in advance if this sounds silly.

I ran a logistic regression on a real data set, and I am having trouble measuring how well my model fits. In particular, I still don't understand how to apply the F1 score in my case.

After performing the error analysis on the cross validation set I got the following values:

Precision: 0.8642534
Recall: 0.8488889
Accuracy: 0.8222222
F1 score: 0.8565022

Are those good values? What do I compare them with? Is this a sign of a strong relationship between my predictor variables and the response variable?
The classes of the response variable are not skewed (I am predicting gender). I would much appreciate any help.
Thanks a lot

    I edited your tags. The question has nothing to do with machine learning. – Frank Harrell Dec 27 '14 at 14:46
    My question is taken from Prof Andrew Ng course on machine learning. – Larry Dec 28 '14 at 08:38
  • Logistic regression is a competitor of machine learning and is a precursor to some ML methods but is not ML when used the standard way. And I hope Prof Ng's notes discussed proper scoring rules. – Frank Harrell Dec 28 '14 at 13:21

2 Answers


Logistic regression is not a classifier. It is a probability estimator. Any classification that you do is completely outside the scope of logistic modeling. Some good ways to judge the quality of predictions from logistic regression include high-resolution nonparametric calibration plots, Brier score, and $c$-index (concordance probability; ROC area). The R rms package's lrm, calibrate, validate functions make these easy to do, and calibrate and validate correct for overfitting (bias/optimism) using the bootstrap (by default) or cross-validation.
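To make the two numeric measures named above concrete, here is a minimal plain-Python sketch of the Brier score and the $c$-index, not the R rms workflow (lrm/calibrate/validate) the answer actually recommends; the outcomes and probabilities below are invented for illustration.

```python
# Hedged sketch: Brier score and c-index computed by hand on made-up data.

def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.
    Lower is better; always predicting 0.5 scores 0.25."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def c_index(y_true, p_hat):
    """Concordance probability (ROC area): over all (positive, negative) pairs,
    the fraction where the positive case got the higher probability (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    concordant = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
                     for pp in pos for pn in neg)
    return concordant / (len(pos) * len(neg))

y = [1, 0, 1, 1, 0, 0]          # invented 0/1 outcomes
p = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]  # invented predicted probabilities

print(brier_score(y, p))
print(c_index(y, p))  # 1.0 here: every positive outranks every negative
```

Note that both measures operate on the predicted probabilities directly; no classification threshold is ever applied, which is the point of the answer.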

Frank Harrell
  • How about using Somers' D to judge the quality of predictions? – yuqian Jul 20 '16 at 18:32
  • That is a good pure discrimination measure – a simple translation of the $c$-index (concordance probability; AUROC). The Brier score and measures that are functions of the log-likelihood are the most sensitive ones. A smooth, overfitting-corrected nonparametric calibration curve can be important to construct. – Frank Harrell Jul 20 '16 at 22:46

Logistic regression is a binary classifier (edit: can be used as a binary classifier), and therefore you can use the standard metrics for classifiers. The metrics you computed are the standard ones, F1 being the most complete (though the least intuitive in terms of meaning).

Accuracy, for instance, is the percentage of points that have been correctly classified.
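As a sketch of how all four metrics fall out of the counts in a 2×2 confusion matrix (the tp/fp/fn/tn counts below are invented; only the precision/recall pair at the end comes from the question):

```python
# Hedged sketch: standard classification metrics as functions of the four
# confusion-matrix counts. The counts passed in below are made up.

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction classified correctly
    precision = tp / (tp + fp)                   # of predicted positives, how many are real
    recall = tp / (tp + fn)                      # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1

print(metrics(tp=80, fp=15, fn=10, tn=95))

# Sanity check with the numbers from the question: F1 really is the harmonic
# mean of the reported precision and recall.
p, r = 0.8642534, 0.8488889
print(2 * p * r / (p + r))  # ~0.8565022, the reported F1 score
```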

Your model does not seem bad at all. A random classifier (a monkey making random guesses) would get an accuracy of about 50%. Still, it depends on the dataset: on an easy dataset, many basic models would be able to get values similar to (or better than) yours.

You can compare your model to others to see whether it gets remarkable results or whether the classification task was simply easy.

To complete the evaluation, you can also plot a confusion matrix to see whether your model does a good job classifying one class but not as good a job detecting members of the other class. In a confusion matrix, you want most of the elements to fall on the diagonal (true positives and true negatives). The other two cells of a binary classifier's confusion matrix are the false negatives and false positives.
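A confusion matrix can be tabulated in a few lines; this is a minimal sketch on invented labels, reading off the four cells described above:

```python
# Hedged sketch: building a 2x2 confusion matrix from made-up binary labels.
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Returns {(actual, predicted): count} for binary 0/1 labels."""
    return Counter(zip(y_true, y_pred))

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # invented actual classes
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # invented model predictions

cm = confusion_matrix(y_true, y_pred)
tp, fn = cm[(1, 1)], cm[(1, 0)]   # actual positives: detected vs missed
fp, tn = cm[(0, 1)], cm[(0, 0)]   # actual negatives: false alarms vs correct
print(tp, fn, fp, tn)             # you want the diagonal (tp, tn) to dominate
```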

Besides, you should compute all of this on a test set: a subset of your data that your model has never seen (that is, one you didn't use to fit the coefficients of your regression).

alberto
    (-1) Logistic regression is *not* a binary classifier. See my answer here. http://stats.stackexchange.com/questions/127042/why-isnt-logistic-regression-called-logistic-classification/127044#127044 – Sycorax Dec 27 '14 at 15:04
  • Touché. Still, "it can be used as" and therefore the above metrics are valid. – alberto Dec 27 '14 at 18:14
    No. These are improper scoring rules. – Frank Harrell Dec 27 '14 at 18:36
  • @Frank, even for out-of-sample tests, especially when you want to compare it with a completely different model (trees, SVMs, ...)? – alberto Dec 28 '14 at 01:50
  • An improper accuracy scoring rule fails for both in-sample and out-of-sample application. – Frank Harrell Dec 28 '14 at 02:47