As we know, there are two common label encodings in binary classification: one is 0/1 and the other is -1/+1.
- For the 0/1 case, we often use the negative log-likelihood loss, also known as the cross-entropy function; other options such as the hinge loss could certainly be considered, but here I only consider the former. The formula looks like: $$L(w) = -\frac{1}{N}\sum_{i = 1}^{N}\log\left(p_{i}^{y_{i}}(1 - p_{i})^{1 - y_{i}}\right) + regPart$$ where both $p_{i}$ and $regPart$ are functions of $w$.
- For the -1/+1 case, we often use the "exponential" loss, with the formula: $$L(w) = \frac{1}{N}\sum_{i = 1}^{N}\log\left(1 + e^{-y_{i} s_{i}}\right) + regPart$$ where both $s_{i}$ and $regPart$ are functions of $w$. (A minimal code sketch of both formulas follows this list.)
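To make the two formulas concrete, here is a minimal NumPy sketch of both losses. The names `X`, `w`, `lam`, and the L2 penalty standing in for `regPart` are illustrative, not my exact code:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def loss_01(w, X, y, lam=0.0):
    """Negative log-likelihood (cross entropy) for labels y in {0, 1}."""
    p = sigmoid(X @ w)                          # p_i = P(y_i = 1 | x_i)
    nll = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return nll + lam * np.dot(w, w)             # regPart: L2 penalty, assumed

def loss_pm1(w, X, y, lam=0.0):
    """Mean log(1 + exp(-y_i * s_i)) for labels y in {-1, +1}."""
    s = X @ w                                   # s_i = w . x_i
    loss = np.mean(np.logaddexp(0.0, -y * s))   # stable log(1 + exp(-y*s))
    return loss + lam * np.dot(w, w)            # regPart: L2 penalty, assumed
```

As far as I understand, with $p_{i} = \sigma(s_{i})$ these two expressions are algebraically the same quantity, which makes the difference in training behavior described below even more puzzling to me.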
My model behaved strangely while training in an experimental environment a few days ago.
I used logistic regression to solve a binary classification problem and tried the two loss functions shown above. The metrics on the validation data during training are listed below:
- In the case of the logarithmic loss for 0/1 labels, the accuracy values on the validation data were
0.48 0.57 0.68 0.74 0.76 0.78 0.78 0.78 0.79 0.79 0.80 0.80 0.81
This seemed normal, just as I expected.
- In the case of the "exponential" loss for -1/+1 labels, the accuracy values on the validation data were
0.48 0.34 0.32 0.30 0.28 0.26 0.25 0.24 0.23 0.23 0.23 0.23 0.23
Strangely, the ratio of positives in the validation data is exactly 0.23.
Training seemed to go in the wrong direction, contrary to what I expected. I checked the parameters at each iteration and found that every dimension of the parameter vector kept increasing as the iterations went on (a rough sketch of the kind of loop I mean follows below).
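For reference, here is a minimal sketch of the training loop I have in mind, assuming plain batch gradient descent on the -1/+1 loss; the learning rate `lr`, iteration count, and the L2 term are placeholders:

```python
import numpy as np

def grad_pm1(w, X, y, lam=0.0):
    """Gradient of mean log(1 + exp(-y_i * s_i)) for labels y in {-1, +1}."""
    s = X @ w
    coef = -y / (1.0 + np.exp(y * s))   # d/ds log(1 + exp(-y*s)) = -y * sigmoid(-y*s)
    return X.T @ coef / len(y) + 2.0 * lam * w

def train(X, y, lr=0.1, iters=13, lam=0.0):
    w = np.zeros(X.shape[1])
    for t in range(iters):
        w -= lr * grad_pm1(w, X, y, lam)   # descent step: subtract the gradient
        print(t, np.linalg.norm(w))        # this is how I watched the parameters grow
    return w
```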
Do I have to choose the logarithmic loss function for logistic regression? And if so, why?
I would much appreciate any constructive tips or detailed explanations. :-)