I have a question about choosing the right scoring rule. I am building a system that predicts the spatial (2D) probability of an event. The labels are continuous values between 0 and 1 giving the probability of the event at each pixel. I built a neural network and apply a sigmoid to its output so that the predictions also lie between 0 and 1. I started with the Brier score (MSE) as the loss function, but after thinking about it a bit, it doesn't seem like a 'fair' scoring rule in my case.
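To make the setup concrete, here is a minimal sketch of a sigmoid output layer scored with the mean per-pixel Brier score (MSE) against a soft-label map. It assumes NumPy arrays. The names `sigmoid` and `brier_map` and the 2x2 values are hypothetical illustrations, not the actual network or data.

```python
import numpy as np

def sigmoid(logits: np.ndarray) -> np.ndarray:
    """Squash raw network outputs into (0, 1), as the final sigmoid layer does."""
    return 1.0 / (1.0 + np.exp(-logits))

def brier_map(labels: np.ndarray, probs: np.ndarray) -> float:
    """Mean per-pixel Brier score (i.e. MSE) between a soft-label map and a predicted map."""
    return float(np.mean((probs - labels) ** 2))

# Hypothetical 2x2 example: each label is the event probability for that pixel.
labels = np.array([[0.5, 1.0],
                   [0.0, 0.2]])
logits = np.array([[0.0, 2.0],
                   [-3.0, -1.0]])  # made-up raw network outputs
probs = sigmoid(logits)
print(brier_map(labels, probs))
```

Every pixel contributes equally to the mean, and each contribution depends only on the distance |p - y|, not on where the label y sits in [0, 1].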
Take a scenario with two pixels: one has a true label of 0.5 and the other a true label of 1. Suppose we predict 0 for the first pixel and 0.5 for the second. Both predictions get the same Brier score, (0 - 0.5)^2 = (0.5 - 1)^2 = 0.25. However, I feel that the first case is actually worse than the second. The second prediction is far from correct, but it lies halfway between wrong and correct, and it could have been much worse (here we merely match the average error of an untrained model). In the first case we made the worst prediction possible (here we did much worse than what an untrained model would predict on average). Intuitively, the first prediction, which was the worst one possible, should get a worse score than the halfway prediction for the second pixel.
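The arithmetic of this two-pixel example can be checked directly. The sketch below also computes the largest Brier score reachable for each label (predicting whichever of 0 or 1 is farther from y), which is one way to express the "worst possible prediction" intuition. The helper names `brier` and `worst_brier` are hypothetical.

```python
def brier(y_true: float, p_pred: float) -> float:
    """Squared error between a soft label and a prediction, both in [0, 1]."""
    return (p_pred - y_true) ** 2

def worst_brier(y_true: float) -> float:
    """Largest Brier score reachable for this label: predict 0 or 1, whichever is farther."""
    return max(y_true ** 2, (1.0 - y_true) ** 2)

# The two pixels from the example: (label, prediction).
for y, p in [(0.5, 0.0), (1.0, 0.5)]:
    print(f"label={y}, pred={p}: Brier={brier(y, p)}, worst possible={worst_brier(y)}")
```

Both pixels score 0.25, but for the first pixel 0.25 is also the maximum reachable score, while for the second the maximum is 1.0.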
The second function I considered was log-loss. However, since the labels are continuous values, taking P(x) seems like a bad idea, because every P(x) will be very small.
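One common way to apply log-loss to soft labels, assumed here rather than taken from the post, is to treat each pixel as a Bernoulli event whose true probability is the label y and to score the prediction p with the binary cross-entropy -(y·log p + (1 - y)·log(1 - p)). Under that interpretation no density P(x) over continuous values is involved. The function name `soft_log_loss` and the clipping constant `eps` are hypothetical choices for this sketch.

```python
import math

def soft_log_loss(y_true: float, p_pred: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a soft label y_true in [0, 1] and a prediction p_pred.

    Assumes each pixel is a Bernoulli event with true probability y_true.
    """
    # Clip the prediction so log(0) never occurs if the output is exactly 0 or 1.
    p = min(max(p_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))

# The two pixels from the example: (label, prediction).
for y, p in [(0.5, 0.0), (1.0, 0.5)]:
    print(f"label={y}, pred={p}: log-loss={soft_log_loss(y, p):.3f}")
```

With eps = 1e-12 the first pixel scores roughly 13.8 (the exact value depends on eps) and the second roughly 0.693, so this loss separates the two cases. For a soft label the minimum is not zero: it equals the label's binary entropy, reached at p = y, so a pixel with y = 0.5 can never score below log 2.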
I have searched for existing posts on this topic but haven't found a satisfying answer (or I didn't understand the ones I found).
What would be a 'good' scoring rule in this case?