8

In machine learning, cross-entropy is often used while training a neural network.

While training my neural network, I track both the accuracy and the cross-entropy. The accuracy is pretty low, so I know that my network isn't performing well. But what can I say about my model from the cross-entropy?

Bob Burt

1 Answer

12

Andrew Ng explains the intuition behind using cross-entropy as a cost function in his ML Coursera course, in the logistic regression module, when he introduces the mathematical expression:

$$\text{Cost}\left(h_\theta(x),y\right)=\left\{ \begin{array}{l} -\log\left(h_\theta(x)\right) \quad \quad\quad \text{if $y =1$}\\ -\log\left(1 -h_\theta(x)\right) \quad \;\text{if $y =0$} \end{array} \right. $$

The idea is that, with an activation function whose values lie between zero and one (here a logistic sigmoid, but the same reasoning applies to, for instance, a softmax output in a CNN, where the final layer is a multinomial logistic), the cost for a true label of $1$ ($y=1$) decreases from infinity to zero as $h_\theta(x)\to1$. Ideally we would like $h_\theta(x)$ to be exactly $1$, predicting the true value, so an activation output that gets close to it is rewarded; conversely, the cost tends to infinity as the activation tends to $0$. The opposite holds for $y=0$, thanks to the trick of taking the logarithm of $1-h_\theta(x)$ rather than $h_\theta(x)$.


Here is my attempt at showing this graphically, restricting the two functions to the interval between $0$ and $1$, consistent with the output of a sigmoid:

[Plot of the two cost curves, $-\log\left(h_\theta(x)\right)$ and $-\log\left(1-h_\theta(x)\right)$, for $h_\theta(x)$ between $0$ and $1$]
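If you want to reproduce the plot yourself, a minimal numpy/matplotlib sketch (the variable names are my own, not from the course) would look like:

```python
import numpy as np
import matplotlib.pyplot as plt

# Predicted probabilities h in (0, 1); stay away from the endpoints,
# where the logarithm blows up.
h = np.linspace(1e-4, 1 - 1e-4, 500)

# Cost when the true label is 1: -log(h); when it is 0: -log(1 - h).
plt.plot(h, -np.log(h), label=r"$y=1$: $-\log(h_\theta(x))$")
plt.plot(h, -np.log(1 - h), label=r"$y=0$: $-\log(1-h_\theta(x))$")

plt.xlabel(r"$h_\theta(x)$")
plt.ylabel("cost")
plt.legend()
plt.show()
```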


The two cases can be combined into a single, more succinct expression:

$$\text{Cost}\left(h_\theta(x),y\right)=-y\log\left(h_\theta(x)\right)-(1-y) \log\left(1 - h_\theta(x)\right).$$
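As a quick sanity check, here is a small numpy sketch (the helper name is mine) showing that this single expression reproduces both piecewise branches:

```python
import numpy as np

def binary_cross_entropy(h, y):
    """Per-example cost: -y*log(h) - (1-y)*log(1-h)."""
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

h = np.array([0.1, 0.5, 0.9, 0.99])   # candidate predictions h_theta(x)

# With y = 1 the cost equals -log(h): small near h = 1, large near h = 0.
print(binary_cross_entropy(h, y=1))   # [2.303 0.693 0.105 0.010]

# With y = 0 it equals -log(1 - h): the mirror image.
print(binary_cross_entropy(h, y=0))   # [0.105 0.693 2.303 4.605]
```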

In the case of a softmax output layer, as in a CNN, the cross-entropy is similarly formulated as

$$\text{Cost}=-\sum_j \,t_j\,\log(y_j)$$

where $t_j$ is the target value for class $j$, and $y_j$ the probability that the output assigns to that class.
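A small numpy sketch of this multiclass version, assuming a one-hot target vector and softmax-normalized outputs (the logits and names below are illustrative, not from the answer):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(t, y):
    """Cost = -sum_j t_j * log(y_j)."""
    return -np.sum(t * np.log(y))

logits = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 classes
t = np.array([1.0, 0.0, 0.0])        # one-hot target: class 0 is correct
y = softmax(logits)

print(y)                    # predicted class probabilities
print(cross_entropy(t, y))  # small when the true class gets high probability
```

With a one-hot target the cost is just the negative log-probability of the true class, so it is small exactly when the network puts most of its probability mass on the correct label.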

Beyond the intuition, the introduction of cross-entropy is meant to keep the cost function convex for logistic regression (a squared-error cost combined with a sigmoid would not be convex in $\theta$).
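For the logistic-regression case this can be checked directly: with $h_\theta(x)=\sigma(\theta^\top x)$, the second derivative of the per-example cost with respect to $\theta$ works out to $h_\theta(x)\left(1-h_\theta(x)\right)x x^\top$, which is positive semidefinite. A minimal numeric sketch (scalar parameter, my own variable names) verifying that the second derivative stays non-negative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, x, y):
    """Per-example cross-entropy cost for a scalar parameter theta."""
    h = sigmoid(theta * x)
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

# Numerical second derivative of the cost w.r.t. theta on a grid:
# it stays non-negative, consistent with the closed form h*(1-h)*x**2.
x, y, eps = 2.0, 1.0, 1e-4
for theta in np.linspace(-3, 3, 7):
    d2 = (cost(theta + eps, x, y) - 2 * cost(theta, x, y)
          + cost(theta - eps, x, y)) / eps**2
    print(f"theta={theta:+.1f}  second derivative={d2:.4f}")
```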

user650654
Antoni Parellada
    To the beyond intuition part I would also add that cross entropy also emerges from maximum likelihood estimation for logistic regression model – Łukasz Grad Apr 11 '17 at 21:34
  • Is this only for binary case? .. what about above? – Bob Burt Apr 30 '17 at 03:30
  • @BobBurt I included a link to a page explaining the extrapolation to softmax and the cross-entropy equations that follow. – Antoni Parellada Apr 30 '17 at 06:28
  • @BobBurt This answer explains the relationship between binary and multinomial cross-entropy. https://stats.stackexchange.com/questions/260505/machine-learning-should-i-use-a-categorical-cross-entropy-or-binary-cross-entro/260537#260537 – Sycorax Nov 16 '18 at 00:52
  • The second cost expression should be: $$\text{Cost}\left(h_\theta(x),y\right)=-y\log\left(h_\theta(x)\right)-(1-y)\log\left(1-h_\theta(x)\right).$$ – user650654 Jan 15 '20 at 07:12
  • @user650654 Thank you. Can you please edit the answer accordingly? – Antoni Parellada Jan 16 '20 at 12:45