
As we know, labels in binary classification come in two common encodings: one is 0/1 and the other is -1/+1.

  • For the 0/1 encoding, the usual choice is the "negative log-likelihood" loss, also known as the cross-entropy loss; other options such as the "hinge" loss can also be considered, but here I only consider the former. The formula is: $$L(w) = -\frac{1}{N}\sum_{i = 1}^{N}\log\left(p_{i}^{y_{i}}(1 - p_{i})^{1 - y_{i}}\right) + regPart$$ where both $p_i$ and $regPart$ are functions of $w$.
  • For the -1/+1 encoding, the "exponential" loss is often used, with the formula: $$L(w) = \frac{1}{N}\sum_{i = 1}^{N}\log(1 + e^{-y_{i} s_{i}}) + regPart$$ where both $s_i$ and $regPart$ are functions of $w$.
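As a side note, when $p_i = \sigma(s_i)$ the two formulas above actually compute the same quantity, just written for the two label encodings. A small numpy sketch (the scores and labels are made-up toy values) checks this numerically:

```python
import numpy as np

# Hypothetical toy data: raw model scores s_i = w·x_i and 0/1 labels.
s = np.array([2.0, -1.0, 0.5, -3.0])
y01 = np.array([1, 0, 1, 0])           # 0/1 encoding
ypm = 2 * y01 - 1                      # the same labels as -1/+1

p = 1.0 / (1.0 + np.exp(-s))           # sigmoid: p_i = P(y_i = 1 | x_i)

# Negative log-likelihood with 0/1 labels (regularization omitted).
nll_01 = -np.mean(y01 * np.log(p) + (1 - y01) * np.log(1 - p))

# The same loss written for -1/+1 labels: mean of log(1 + exp(-y_i * s_i)).
nll_pm = np.mean(np.log1p(np.exp(-ypm * s)))

print(nll_01, nll_pm)                  # equal up to floating-point error
```

For $y_i = 1$, $-\log p_i = \log(1 + e^{-s_i})$, and for $y_i = 0$ (i.e. $y_i = -1$), $-\log(1 - p_i) = \log(1 + e^{s_i})$, so the two expressions agree term by term.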

My model behaved strangely while training in an experimental environment a few days ago.

I used logistic regression to solve a binary classification problem and tried both loss functions shown above. The metrics on the validation data during training are listed below:

  • With the logarithmic loss for 0/1 labels, the accuracy values on the validation data were

0.48 0.57 0.68 0.74 0.76 0.78 0.78 0.78 0.79 0.79 0.80 0.80 0.81

This looked normal and matched what I expected.

  • With the exponential loss for -1/+1 labels, the accuracy values on the validation data were

0.48 0.34 0.32 0.30 0.28 0.26 0.25 0.24 0.23 0.23 0.23 0.23 0.23

Strangely, the ratio of positives in the validation data is 0.23.

Training seemed to move in the opposite direction from what I expected. I checked the parameters at each iteration and found that every parameter component kept increasing as the iterations went on.
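One hedged guess, based only on these symptoms: if the 0/1-encoded labels were accidentally plugged into the -1/+1 loss $\log(1 + e^{-y_i s_i})$, every $y_i = 0$ example would contribute a constant $\log 2$ and a zero gradient, so only the positive examples would drive the updates. The weights would then grow without bound and the model would end up predicting "positive" everywhere, giving an accuracy equal to the positive ratio (0.23 here). A tiny sketch of the per-example gradient (toy values, purely illustrative):

```python
import numpy as np

def grad_wrt_score(y, s):
    """Gradient of log(1 + exp(-y*s)) with respect to the score s."""
    return -y / (1.0 + np.exp(y * s))

# With a mistaken 0/1 encoding, negative (y = 0) examples are inert:
print(grad_wrt_score(0, 1.5))    # 0.0 -> contributes nothing to the update
# With the correct -1 encoding, the same example pushes the score down:
print(grad_wrt_score(-1, 1.5))   # nonzero gradient
```

This is only a hypothesis about the setup; checking how the labels are encoded before they reach the loss would confirm or rule it out.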

Do I have to choose the logarithmic loss function for logistic regression? And why?

I'd much appreciate any constructive tips or detailed explanations. :-)

joe
  • On the other side, for the -1/+1 target label, is $$\log(1 + \exp(-y_{i} s_{i}))$$ the only choice of loss function? – joe Dec 17 '16 at 15:59

1 Answer


No, you don't ever have to use the logarithmic loss function. The logarithmic loss function is just a popular choice because maximizing it (or minimizing the negative of it) will lead to a maximum likelihood solution. Estimators based on this principle generally have nice properties. It's often possible to derive analytic formulas for maximum likelihood estimators. However, in the case of the logistic regression model, no such formulas exist.

There are alternatives, however. The exponential loss function you referenced could be one such choice - although I'm not sure I understand your definition of it. Aside from log-loss, probably the most popular choice for binary data is quadratic loss which is used in the Brier scoring rule. In your case, it would take this simplified form: $$L(w) = \frac{1}{N}\sum_{i = 1}^{N}(p_{i}-y_{i})^2$$

Minimizing either the logarithmic scoring rule or the Brier scoring rule should give you very similar $p_i$ estimates when you use logistic regression.
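That similarity can be checked numerically. The sketch below is illustrative only: it draws toy data from a true logistic model and fits the logistic link under each loss with plain gradient descent, then compares the fitted probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a true logistic model (made up for illustration).
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true = np.array([-0.5, 2.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def fit(grad_fn, lr=0.5, steps=20000):
    """Gradient descent on the weights; grad_fn maps probabilities to a gradient."""
    w = np.zeros(2)
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * grad_fn(p)
    return sigmoid(X @ w)

# Logarithmic (cross-entropy) loss gradient: (1/N) X^T (p - y)
p_log = fit(lambda p: X.T @ (p - y) / n)

# Brier (quadratic) loss gradient: (2/N) X^T ((p - y) * p * (1 - p))
p_brier = fit(lambda p: 2.0 * X.T @ ((p - y) * p * (1 - p)) / n)

print(np.max(np.abs(p_log - p_brier)))  # typically small
```

Both are proper scoring rules, so when the model is well specified they target the same true probabilities; the finite-sample fits differ only slightly.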

jjet
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/50233/discussion-on-answer-by-jjet-how-do-i-choose-the-right-loss-function-logarithm). – whuber Dec 17 '16 at 17:38