This article provides a nice explanation and derivation of cross entropy, and defines it as follows:
$H(y, \hat{y}) = -\sum_i[ y_i \log \hat{y}_i]$ ....... (1)
where $y$ is the correct output and $\hat{y}$ is the network's predicted output.
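For concreteness, here is a minimal NumPy sketch of how I read equation (1), with a one-hot target and made-up softmax-style predictions over 3 classes:

```python
# Sketch of equation (1): H(y, y_hat) = -sum_i y_i * log(y_hat_i)
# The numbers are made up; y is one-hot, y_hat is a predicted distribution.
import numpy as np

y = np.array([0.0, 1.0, 0.0])        # correct class is index 1
y_hat = np.array([0.2, 0.7, 0.1])    # network's predicted probabilities

h = -np.sum(y * np.log(y_hat))       # only the true class term contributes
print(h)                             # ~0.357, i.e. -log(0.7)
```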
Many neural network articles use this definition of cross entropy:
$Loss = -\sum_i[ y_i \log \hat{y}_i + (1-y_i) \log(1- \hat{y}_i)]$ ....... (2)
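On the same made-up numbers, a sketch of equation (2), treating each output as an independent 0/1 target (as with sigmoid outputs), gives a different value:

```python
# Sketch of equation (2): both the y*log(y_hat) terms and the
# (1-y)*log(1-y_hat) terms contribute. Numbers are the same made-up example.
import numpy as np

y = np.array([0.0, 1.0, 0.0])
y_hat = np.array([0.2, 0.7, 0.1])

loss = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(loss)   # ~0.685, larger than (1) because wrong outputs are penalised too
```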
These two equations are different. Where does equation (2) come from, and is it a better loss function for neural networks than equation (1)?