Consider the three scoring rules in the case of a binary prediction:
- Log:
sum(log(ifelse(outcome, probability, 1-probability))) / n
- Brier:
sum((outcome-probability)**2) / n
- Sphere:
sum(ifelse(outcome, probability, 1-probability)/sqrt(probability**2+(1-probability)**2)) / n
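For concreteness, here is a minimal Python transcription of the three formulas above, so that they can be compared on the same data. The function names (`log_score`, `brier_score`, `spherical_score`) and the toy low-prevalence data are my own illustration, not taken from any library. Note that, as written, the log and spherical scores are "higher is better", while the Brier score is "lower is better".

```python
import math

def log_score(outcomes, probabilities):
    # Mean log of the probability assigned to the outcome that occurred.
    # math.log raises ValueError if that probability is exactly 0.
    n = len(outcomes)
    return sum(math.log(p if y else 1 - p)
               for y, p in zip(outcomes, probabilities)) / n

def brier_score(outcomes, probabilities):
    # Mean squared difference between the 0/1 outcome and the forecast.
    n = len(outcomes)
    return sum((y - p) ** 2 for y, p in zip(outcomes, probabilities)) / n

def spherical_score(outcomes, probabilities):
    # Probability assigned to the realized outcome, divided by the
    # Euclidean norm of the forecast vector (p, 1 - p).
    n = len(outcomes)
    return sum((p if y else 1 - p) / math.sqrt(p ** 2 + (1 - p) ** 2)
               for y, p in zip(outcomes, probabilities)) / n

# Hypothetical data: 1 positive in 1000 cases (0.1% prevalence),
# scored against a constant forecast equal to the base rate.
outcomes = [1] + [0] * 999
base_rate_forecast = [0.001] * 1000

scores = {
    "log": log_score(outcomes, base_rate_forecast),
    "brier": brier_score(outcomes, base_rate_forecast),
    "spherical": spherical_score(outcomes, base_rate_forecast),
}
```

Scoring a constant base-rate forecast like this gives a reference value to compare the calibrated probabilities against under each rule.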
What is the intuition behind them? When should I use one and not the other? I am especially interested in the case of low prevalence (e.g., 0.1%).
PS. This is to evaluate the results of my calibration algorithm, which I asked about earlier.