This article provides a nice explanation and derivation of cross entropy, and defines it as follows:
$H(y, \hat{y}) = -\sum_i[ y_i \log \hat{y}_i]$ ....... (1)
where $y$ is the correct output and $\hat{y}$ is the network's predicted output.
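For concreteness, here is a minimal NumPy sketch of how I read equation (1), with a one-hot target and made-up softmax-style predictions over 3 classes:

```python
# Sketch of equation (1): H(y, y_hat) = -sum_i y_i * log(y_hat_i)
# The numbers are made up; y is one-hot, y_hat is a predicted distribution.
import numpy as np

y = np.array([0.0, 1.0, 0.0])        # correct class is index 1
y_hat = np.array([0.2, 0.7, 0.1])    # network's predicted probabilities

h = -np.sum(y * np.log(y_hat))       # only the true class term contributes
print(h)                             # ~0.357, i.e. -log(0.7)
```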
Many neural network articles use this definition of cross entropy:
$Loss = -\sum_i[ y_i \log \hat{y}_i + (1-y_i) \log(1- \hat{y}_i)]$ ....... (2)
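On the same made-up numbers, a sketch of equation (2), treating each output as an independent 0/1 target (as with sigmoid outputs), gives a different value:

```python
# Sketch of equation (2): both the y*log(y_hat) terms and the
# (1-y)*log(1-y_hat) terms contribute. Numbers are the same made-up example.
import numpy as np

y = np.array([0.0, 1.0, 0.0])
y_hat = np.array([0.2, 0.7, 0.1])

loss = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(loss)   # ~0.685, larger than (1) because wrong outputs are penalised too
```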
These two equations are different. Where does equation (2) come from, and is it a better loss function for neural networks than equation (1)?