
For classification problems, the "real" loss function is often the 0-1 loss, but we usually choose a surrogate loss function that makes learning tractable. In particular, the training loss is typically differentiable so that it works with gradient descent and related optimizers.

My question is: when I am doing cross-validation to pick the hyperparameters, should I use the surrogate loss function used for training, or the real loss function?

How about testing?
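To make the question concrete, here is a minimal sketch using scikit-learn (the dataset, model, and parameter grid are illustrative assumptions, not part of the question). It runs the same hyperparameter search twice: once scored with the surrogate loss (log-loss) and once with the "real" 0-1 loss (accuracy), so the two selection criteria can be compared.

```python
# Hypothetical setup: compare hyperparameter selection under the
# surrogate loss (log-loss) vs. the "real" 0-1 loss (accuracy).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Cross-validate twice, scoring once with each loss.
cv_surrogate = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid,
    scoring="neg_log_loss", cv=5,
).fit(X, y)
cv_real = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid,
    scoring="accuracy", cv=5,
).fit(X, y)

print("best C under log-loss:", cv_surrogate.best_params_["C"])
print("best C under 0-1 loss:", cv_real.best_params_["C"])
```

The two searches may or may not pick the same `C`; the point is only that the `scoring` argument, not the training loss, determines which hyperparameters cross-validation selects.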

kjetil b halvorsen
Tom Bennett
  • Use the real loss function. Although some surrogate loss functions guarantee convergence to the best possible model parameters under correct model specification, the specification is almost never correct (unless it is a known physical phenomenon or a toy example) and the data are almost never enough for convergence. – Cagdas Ozgenc Sep 07 '20 at 06:16
  • Should I use the real loss function even for cross-validation? – Tom Bennett Sep 07 '20 at 06:19
  • Statisticians discourage 0-1 accuracy loss. https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email – Dave Mar 04 '21 at 12:02
  • What qualifies a loss function to be called "real"? If in reality you are interested not only in correct classification but also in information about how uncertain that classification is, 0-1 loss may not be so "real" in my view. (Also even given "true" class labels may in reality be somewhat uncertain.) – Christian Hennig Mar 04 '21 at 13:21

0 Answers