
In logistic regression, the binary cross-entropy (logistic loss function) is defined as $$\ell (\boldsymbol{y}, \boldsymbol{\hat{y}}) = - \sum_{i=1}^n \left[ y_i \log \hat{y}_i + (1-y_i) \log (1-\hat{y}_i) \right].$$
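The formula above can be computed directly; here is a minimal sketch (the function name and the `eps` clipping are my own additions, the latter just to avoid $\log 0$ for predictions of exactly 0 or 1):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy (logistic loss), summed over observations.

    y: array of 0/1 labels; y_hat: predicted probabilities P(y=1).
    """
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logs stay finite.
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Two confident, correct predictions: loss is -2*log(0.9) ≈ 0.2107.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```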

I wonder why researchers do not report cross-entropy values computed on a test set in their papers. It could serve as a measure of the goodness of the estimator.

I would like to report the cross-entropy, false-positive rate, false-negative rate, and F-score (the harmonic mean of precision and recall), all computed on a test set.

Is there anything logically problematic in my case?
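For concreteness, the threshold-based metrics I have in mind would be computed like this (a sketch with made-up labels and predictions; the 0.5 threshold is an assumption):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # test labels
p_hat  = np.array([0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9, 0.1])  # predicted P(y=1)
y_pred = (p_hat >= 0.5).astype(int)                           # hard decisions

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))

fpr = fp / (fp + tn)                            # false-positive rate
fnr = fn / (fn + tp)                            # false-negative rate
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_score = 2 * precision * recall / (precision + recall)

# Cross-entropy, by contrast, uses the probabilities directly -- no threshold.
ce = -np.sum(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))
```

Note that the first four metrics depend on the chosen threshold, while the cross-entropy does not.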

mert
  • Do they even report it for the training set? // This question is going to lead you to something called "(strictly) proper scoring rules". – Dave Jun 10 '21 at 19:44
  • They report it neither for the training set nor for the test set. I think it would be sensible to report, since cross-entropy on a test set plays a role analogous to MSE on a test set in linear regression, and reporting test MSE in linear regression is very common. (Or am I wrong?) – mert Jun 10 '21 at 19:59
  • The issue you're encountering is that, in a regression problem, most people optimize the loss function of interest, often MSE (or something equivalent like SSE, RMSE, or $-R^2$, all of which are equivalent as loss functions). In a classification problem, people often have an interest in the classification accuracy: how many times I call a dog picture a dog and a cat picture a cat. That requires a decision based on the probability output, often just rounding the probability to $0$ or $1$. Such an accuracy metric is an improper scoring rule and is less useful than many realize. – Dave Jun 10 '21 at 20:03
  • 1
    [A question of mine](https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email) might be worth reading. The answer in there has links to other good material on proper scoring rules and probabilistic predictions. Frank Harrell (Vanderbilt professor) has two good blog posts on this topic, too. [1](https://www.fharrell.com/post/classification/) [2](https://www.fharrell.com/post/class-damage/) – Dave Jun 10 '21 at 20:04
  • I agree with you: it is less useful. However, I have a situation in which two (sparse) models based on high-dimensional data have very close classification results, but one is better than the other in terms of cross-entropy. Thank you for your comments. – mert Jun 10 '21 at 20:07
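The situation described in the last comment can be made concrete: two classifiers can make identical thresholded decisions (hence identical accuracy, FPR, FNR, and F-score) while differing in cross-entropy, because cross-entropy rewards confident, well-calibrated probabilities. A small illustration with invented probabilities:

```python
import numpy as np

y   = np.array([1, 1, 0, 0])
p_a = np.array([0.9, 0.8, 0.1, 0.2])    # confident model
p_b = np.array([0.6, 0.55, 0.45, 0.4])  # hesitant model, same decisions

def ce(y, p):
    """Binary cross-entropy, summed over observations."""
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Both models classify every case the same way at the 0.5 threshold...
assert np.array_equal(p_a >= 0.5, p_b >= 0.5)
# ...but the confident model attains a much lower cross-entropy.
print(ce(y, p_a), ce(y, p_b))
```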

0 Answers