
In several Kaggle competitions the scoring was based on "logloss". This relates to classification error.

Here is a technical answer, but I am looking for an intuitive one. I really liked the answers to this question about Mahalanobis distance, but PCA is not logloss.

I can use the value that my classification software puts out, but I don't really understand it. Why do we use it instead of true/false positive/negative rates? Can you help me so that I can explain this to my grandmother or a newbie in the field?

I also like and agree with the quote:

> You do not really understand something unless you can explain it to your grandmother.
> -- Albert Einstein

I tried answering this on my own before posting here.

Links that I did not find intuitive or really helpful include:

These are informative and accurate, but they are meant for a technical audience. They do not draw a simple picture or give simple, accessible examples. They are not written for my grandmother.

EngrStudent

1 Answer


Logloss is the negative logarithm of the product of the probabilities a forecaster assigned to what actually happened (usually divided by the number of predictions), so a bigger product means a smaller, better logloss. Suppose Alice predicted:

  • with probability 0.2, John will kill Jack
  • with probability 0.001, Mary will marry John
  • with probability 0.01, Bill is a murderer.

It turned out that Mary did not marry John, Bill is not a murderer, but John killed Jack. The product of the probabilities Alice assigned to what actually happened (using 1 minus her prediction for the events that did not occur) is 0.2*0.999*0.99=0.197802
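Written as a formula, for reference (this is the standard definition, not part of the original answer; $p_i$ is the probability the forecaster assigned to the outcome that actually occurred on prediction $i$, out of $N$ predictions):

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\log p_i$$

For Alice this is $-\tfrac{1}{3}\log(0.2\cdot 0.999\cdot 0.99)\approx 0.54$.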

Bob predicted:

  • with probability 0.5, John will kill Jack
  • with probability 0.5, Mary will marry John
  • with probability 0.5, Bill is a murderer.

The product is 0.5*0.5*0.5=0.125.

Alice is a better predictor than Bob: her product is larger, so her logloss is smaller.
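Here is a minimal sketch of the same arithmetic in Python (the variable names and the `logloss` helper are mine, not from the answer; it just takes the negative mean log of the probabilities above, which is how Kaggle-style logloss is computed):

```python
import math

# Probabilities each forecaster assigned to what actually happened:
# John killed Jack (p), Mary did NOT marry John (1 - p),
# Bill is NOT a murderer (1 - p).
alice = [0.2, 1 - 0.001, 1 - 0.01]
bob = [0.5, 1 - 0.5, 1 - 0.5]

def logloss(probs):
    """Negative mean log of the probabilities assigned to the true outcomes."""
    return -sum(math.log(p) for p in probs) / len(probs)

print(logloss(alice))  # ~0.540  (product 0.197802 -> smaller loss)
print(logloss(bob))    # ~0.693  (product 0.125    -> larger loss)
```

Lower is better, which is why a larger product of probabilities corresponds to a better score.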

user31264
  • Why does "product of all probabilities" work? This sounds like a relative of expectation maximization. – EngrStudent Apr 20 '16 at 21:23
  • Do you need a formal proof? It is in the "technical answer" mentioned by the topic starter. Do you need an informal "grandmother" reason why? You say: suppose this fellow gave correct predictions. What is the probability that everything happened as it really did? This is the product of probabilities. – user31264 Apr 20 '16 at 22:55
  • "Product of probabilities" isn't "grandma". Log of a product of probabilities is a sum of log-probabilities, which they use in expectation maximization and call "expectation". I think it is also encoded in K-L divergence. ... I think in grandma-talk you could say: "most likely" = highest overall probability of multiple events. There are two ways to get "highest": 1) maximize the combined probability or 2) minimize the negative combined probability. Most machine learning likes "gradient descent" or minimizing badness. Log-loss is the negative log-probability scaled by sample size, and it gets minimized. – EngrStudent Jun 30 '17 at 13:11
  • Here [link](https://stats.stackexchange.com/questions/113301/multi-class-logarithmic-loss-function-per-class) they say "exp(-loss) is average probability of correct prediction." – EngrStudent Jun 30 '17 at 13:16
  • I liked the Bishop ref [here](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html). It is equation 4.108 and is the cross-entropy error function. – EngrStudent Dec 08 '17 at 16:22
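To make the exp(-loss) remark concrete (a small sketch using the numbers from the answer, not part of the original thread): exponentiating the negative logloss gives the geometric mean of the probabilities assigned to what actually happened, which is what the linked comment loosely calls the "average probability of correct prediction".

```python
import math

# Alice's logloss from the answer: -log(product of her probabilities) / 3
alice_logloss = -math.log(0.2 * 0.999 * 0.99) / 3
print(math.exp(-alice_logloss))          # ~0.583
print((0.2 * 0.999 * 0.99) ** (1 / 3))   # same value: the geometric mean
```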