
I've been using logistic regression for a specific problem, and the loss function the paper used is the following: $$ L(Y,\hat{Y})=\sum_{i=1}^{N} \log(1+\exp(-y_i\hat{y}_{i}))$$ Yesterday, I came across Andrew Ng's course (Stanford notes), and he gave another loss function that he described as intuitive: $$J(\theta)=-\frac{1}{N}\sum_{i=1}^{N}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$ Now I know there isn't only ONE loss function per model and that both could be used.

My question is more about what separates these two functions. Is there any advantage to working with one instead of the other? Are they equivalent in any way? Thanks!


2 Answers


With the sigmoid function in logistic regression, these two loss functions are exactly the same; the only difference is the label encoding:

  • $y_i\in\{-1,1\}$ is used in the first loss function;
  • $y_i\in\{0,1\}$ is used in the second loss function.

Both loss functions can be derived by maximizing the likelihood function. To see the equivalence, write $h_\theta(x^{(i)})=\sigma(\hat{y}_i)$ with $\sigma(z)=1/(1+e^{-z})$ and note that $1-\sigma(z)=\sigma(-z)$; then each cross-entropy term $-\left[y^{(i)}\log\sigma(\hat{y}_i)+(1-y^{(i)})\log\sigma(-\hat{y}_i)\right]$ reduces to $\log(1+\exp(-y_i\hat{y}_i))$ once the labels are mapped via $y_i=2y^{(i)}-1$, so the two losses agree up to the constant $1/N$ averaging factor, which does not change the minimizer.
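
As a quick numerical check, here is a minimal sketch (assuming NumPy; the logits and labels are made up for illustration) that evaluates both formulas on the same data and confirms they match:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: raw scores (logits) and {0, 1} labels.
y_hat = rng.normal(size=100)          # \hat{y}_i = theta^T x_i
y01 = rng.integers(0, 2, size=100)    # y^{(i)} in {0, 1}
ypm = 2 * y01 - 1                     # same labels mapped to {-1, +1}

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# First form: sum of log(1 + exp(-y_i * yhat_i)) with y_i in {-1, +1}.
loss_pm = np.sum(np.log1p(np.exp(-ypm * y_hat)))

# Second form: cross-entropy with y^{(i)} in {0, 1} (summed, not averaged).
h = sigmoid(y_hat)
loss_01 = -np.sum(y01 * np.log(h) + (1 - y01) * np.log(1 - h))

print(np.allclose(loss_pm, loss_01))  # True, up to floating-point error
```

The cross-entropy here is summed rather than averaged so it compares directly with the first form; dividing both by $N$ recovers $J(\theta)$ exactly.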


This is related to the choice of labels, and each choice arguably has some advantages over the other. You should visit here for more detailed information on the topic.
