For classification problems, the "real" loss function is often the 0-1 loss, but we usually train on a surrogate loss (for example, hinge or logistic loss) that makes learning easier. The training loss is typically differentiable, so it can be optimized with gradient descent and similar methods.
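To make the distinction concrete, here is a minimal sketch of the 0-1 loss next to two common surrogates. It assumes labels in {-1, +1} and a real-valued classifier score; the function names are my own and purely illustrative.

```python
import math

def zero_one_loss(y, score):
    # "Real" loss: 0 if the sign of the score agrees with the label, else 1.
    # Piecewise constant in the score, so its gradient is zero almost everywhere.
    return 0.0 if y * score > 0 else 1.0

def hinge_loss(y, score):
    # Surrogate used by SVMs: convex upper bound on the 0-1 loss.
    return max(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    # Surrogate used by logistic regression: log(1 + exp(-margin)),
    # written in a numerically stable form.
    margin = y * score
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Compare the three losses for a positive example at a few scores.
for score in (-2.0, -0.5, 0.5, 2.0):
    print(score, zero_one_loss(1, score), hinge_loss(1, score), logistic_loss(1, score))
```

Both surrogates change smoothly (or at least piecewise linearly) with the score, which is what makes them usable with gradient-based training.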
My question: when I use cross-validation to pick hyperparameters, should I score each candidate with the surrogate loss used for training or with the real loss?
And which one should I report at test time?
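For concreteness, here is a rough sketch of the two options I am comparing. Everything in it is an assumption made up for illustration: the synthetic 1-D data, the tiny L2-regularized logistic regression, the grid of `lam` values, and all function names. It records both the surrogate (log) loss and the 0-1 loss on each validation fold, so a hyperparameter could be picked by either criterion.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(xs, ys, lam, steps=200, lr=0.1):
    # Gradient descent on the mean log loss plus an L2 penalty (lam / 2) * w^2.
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(steps):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errs, xs)) / n + lam * w
        grad_b = sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mean_log_loss(w, b, xs, ys):
    # Surrogate loss: the same loss that training minimizes (without the penalty).
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def mean_zero_one_loss(w, b, xs, ys):
    # "Real" loss: fraction of misclassified points at threshold 0.5.
    wrong = sum((sigmoid(w * x + b) >= 0.5) != (y == 1) for x, y in zip(xs, ys))
    return wrong / len(xs)

def cross_validate(xs, ys, lam, k=5):
    # Returns (mean validation log loss, mean validation 0-1 loss) over k folds.
    surrogate, real = 0.0, 0.0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        val = [i for i in range(len(xs)) if i % k == fold]
        w, b = fit([xs[i] for i in train], [ys[i] for i in train], lam)
        val_x, val_y = [xs[i] for i in val], [ys[i] for i in val]
        surrogate += mean_log_loss(w, b, val_x, val_y)
        real += mean_zero_one_loss(w, b, val_x, val_y)
    return surrogate / k, real / k

# Hypothetical data: two overlapping Gaussian classes with labels in {0, 1}.
rng = random.Random(0)
ys = [i % 2 for i in range(100)]
xs = [rng.gauss(1.0 if y == 1 else -1.0, 1.0) for y in ys]

results = {lam: cross_validate(xs, ys, lam) for lam in (0.0, 0.1, 1.0)}
best_by_log_loss = min(results, key=lambda lam: results[lam][0])
best_by_zero_one = min(results, key=lambda lam: results[lam][1])
print(results, best_by_log_loss, best_by_zero_one)
```

The two selection rules need not agree on the same `lam`, which is exactly the situation I am asking about.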