Questions tagged [cross-entropy]

A measure of the difference between two probability distributions for a given random variable or set of events.

In information theory, the cross-entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution $q$, rather than the true distribution $p$.

The cross-entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as follows: $$ H(p,q)=-\mathbb{E}_p(\log q) $$ where $\mathbb{E}_p(\cdot)$ is the expected value operator with respect to the distribution $p$.

Source: Wikipedia.
Excerpt source: Brownlee, "A Gentle Introduction to Cross-Entropy for Machine Learning" (2019).
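As a minimal illustration of this definition (not taken from any question below; the distributions are made up), the NumPy sketch computes $H(p,q)$ and checks Gibbs' inequality $H(p,q) \ge H(p)$:

```python
import numpy as np

# Two made-up discrete distributions over the same three events.
p = np.array([0.5, 0.3, 0.2])   # "true" distribution p
q = np.array([0.4, 0.4, 0.2])   # estimated distribution q used for the coding scheme

# H(p, q) = -E_p[log q] = -sum_x p(x) log q(x)   (in nats; use np.log2 for bits)
cross_entropy = -np.sum(p * np.log(q))
entropy_p = -np.sum(p * np.log(p))

print(cross_entropy, entropy_p)
assert cross_entropy >= entropy_p   # Gibbs' inequality: H(p, q) >= H(p), equal only when q = p
```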

230 questions
113
votes
6 answers

What loss function for multi-class, multi-label classification tasks in neural networks?

I'm training a neural network to classify a set of objects into n classes. Each object can belong to multiple classes at the same time (multi-class, multi-label). I read that for multi-class problems it is generally recommended to use softmax and…
aKzenT • 1,231
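For context, the setup usually suggested for multi-label targets is an independent sigmoid per class with binary cross-entropy averaged over the classes. The sketch below is a minimal NumPy illustration with made-up logits and a multi-hot target, not code from the answers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One example with 4 classes; the object belongs to classes 0 and 2.
logits = np.array([2.0, -1.0, 0.5, -3.0])
targets = np.array([1.0, 0.0, 1.0, 0.0])   # multi-hot target, not one-hot

probs = sigmoid(logits)                     # independent per-class probabilities
eps = 1e-12                                 # guard against log(0)

# Binary cross-entropy for each class, averaged over classes.
bce = -(targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps))
loss = bce.mean()
print(loss)
```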
75
votes
4 answers

What is the difference between cross-entropy and KL divergence?

Both the cross-entropy and the KL divergence are tools to measure the distance between two probability distributions, but what is the difference between them? $$ H(P,Q) = -\sum_x P(x)\log Q(x) $$ $$ KL(P \| Q) = \sum_{x} P(x)\log {\frac{P(x)}{Q(x)}}…
yoyo • 979
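The two quantities are tied together by the identity $H(P,Q) = H(P) + KL(P\,\|\,Q)$; here is a quick NumPy check with made-up distributions:

```python
import numpy as np

# Made-up distributions P and Q over the same three outcomes.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.25, 0.25])

cross_entropy = -np.sum(p * np.log(q))   # H(P, Q)
entropy = -np.sum(p * np.log(p))         # H(P)
kl = np.sum(p * np.log(p / q))           # KL(P || Q)

# Cross-entropy = entropy of P + the extra nats incurred by coding with Q instead of P.
assert np.isclose(cross_entropy, entropy + kl)
```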
63
votes
4 answers

Should I use a categorical cross-entropy or binary cross-entropy loss for binary predictions?

First of all, I realized that if I need to perform binary predictions, I have to create at least two classes by performing one-hot encoding. Is this correct? However, is binary cross-entropy only for predictions with only one class? If I were to…
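One relevant fact: for two classes, categorical cross-entropy over a 2-way softmax and binary cross-entropy over a single sigmoid give the same loss, assuming the sigmoid is applied to the logit difference. A NumPy sketch with made-up logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([0.3, 1.7])            # made-up two-class logits
target_onehot = np.array([0.0, 1.0])     # the example belongs to class 1

# Categorical cross-entropy with a 2-way softmax ...
cat_ce = -np.sum(target_onehot * np.log(softmax(logits)))

# ... equals binary cross-entropy with one sigmoid on the logit difference.
p1 = sigmoid(logits[1] - logits[0])
bin_ce = -np.log(p1)
assert np.isclose(cat_ce, bin_ce)
```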
60
votes
5 answers

Backpropagation with Softmax / Cross Entropy

I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer. The cross entropy error function is $$E(t,o)=-\sum_j t_j \log o_j$$ with $t_j$ and $o_j$ as the target and output at neuron $j$, respectively. The sum is over…
micha • 703
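The standard result is that the combined softmax/cross-entropy gradient with respect to the logits is simply $o - t$. The NumPy sketch below verifies this against a finite-difference estimate using made-up logits and a one-hot target:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, t):
    # E(t, o) = -sum_j t_j log o_j with o = softmax(z)
    return -np.sum(t * np.log(softmax(z)))

z = np.array([0.2, -1.3, 0.8])   # made-up logits feeding the softmax
t = np.array([0.0, 0.0, 1.0])    # one-hot target

analytic = softmax(z) - t        # combined softmax/cross-entropy gradient dE/dz = o - t

# Central finite differences as an independent check.
numeric = np.zeros_like(z)
h = 1e-6
for j in range(len(z)):
    dz = np.zeros_like(z)
    dz[j] = h
    numeric[j] = (loss(z + dz, t) - loss(z - dz, t)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-5)
```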
58
votes
4 answers

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras+Tensorflow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy and sparse_categorical_crossentropy. I have a good intuition about the…
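The practical difference is only in how the labels are encoded: categorical_crossentropy expects one-hot targets, while sparse_categorical_crossentropy expects integer class indices, and both compute the same quantity. A minimal NumPy illustration with made-up numbers:

```python
import numpy as np

probs = np.array([0.1, 0.7, 0.2])      # softmax output for one example (made-up)

# categorical_crossentropy expects a one-hot target ...
onehot = np.array([0.0, 1.0, 0.0])
cat_ce = -np.sum(onehot * np.log(probs))

# ... while sparse_categorical_crossentropy expects the integer class index.
label = 1
sparse_ce = -np.log(probs[label])

assert np.isclose(cat_ce, sparse_ce)   # same loss value, different label encoding
```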
57
votes
1 answer

Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function?

In my mind, the KL divergence from the sample distribution to the true distribution is simply the difference between cross-entropy and entropy. Why do we use cross-entropy as the cost function in many machine learning models, but use Kullback-Leibler…
JimSpark • 673
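Part of the usual answer: in t-SNE the distribution $P$ is fixed, so $KL(P\,\|\,Q)$ and $H(P,Q)$ differ only by the constant $H(P)$ and are minimized by the same $Q$. A small NumPy check with randomly generated, made-up distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    return x / x.sum()

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = normalize(rng.random(5) + 0.1)       # fixed high-dimensional similarities P (made-up)
entropy_p = -np.sum(p * np.log(p))       # H(P), a constant once P is fixed

# For any candidate Q, H(P, Q) and KL(P || Q) differ only by H(P),
# so they share the same minimizer and the same gradient in Q.
for _ in range(3):
    q = normalize(rng.random(5) + 0.1)
    assert np.isclose(cross_entropy(p, q) - kl(p, q), entropy_p)
```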
44
votes
3 answers

Dice-coefficient loss function vs cross-entropy

When training a pixel segmentation neural network, such as a fully convolutional network, how do you make the decision to use the cross-entropy loss function versus Dice-coefficient loss function? I realize this is a short question, but not quite…
Christian • 1,382
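For reference, the two losses can be computed side by side. The NumPy sketch below evaluates pixel-wise binary cross-entropy and a soft Dice loss on a made-up 2×3 probability map and mask (an illustration, not code from the answers):

```python
import numpy as np

# Made-up 2x3 foreground-probability map and binary ground-truth mask.
pred = np.array([[0.9, 0.8, 0.2],
                 [0.1, 0.7, 0.3]])
mask = np.array([[1.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0]])

eps = 1e-7

# Pixel-wise binary cross-entropy, averaged over all pixels (including true negatives).
bce = -np.mean(mask * np.log(pred + eps) + (1 - mask) * np.log(1 - pred + eps))

# Soft Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), computed on probabilities;
# it only measures foreground overlap, which matters under class imbalance.
intersection = np.sum(pred * mask)
dice = 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(mask) + eps)

print(bce, dice)
```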
39
votes
2 answers

Why is mean squared error the cross-entropy between the empirical distribution and a Gaussian model?

In Section 5.5 of Deep Learning (by Ian Goodfellow, Yoshua Bengio and Aaron Courville), it is stated that "Any loss consisting of a negative log-likelihood is a cross-entropy between the empirical distribution defined by the training set and the probability…
Mufei Li • 553
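A quick numerical check of the claim, assuming a Gaussian model with fixed unit variance and made-up targets and predictions: the average negative log-likelihood is half the MSE plus a constant, so the two losses share the same minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=10)        # made-up targets
y_hat = rng.normal(size=10)    # made-up model predictions

# Average negative log-likelihood of y under N(y_hat, 1),
# i.e. a Gaussian model with fixed unit variance.
nll = np.mean(0.5 * np.log(2 * np.pi) + 0.5 * (y - y_hat) ** 2)

mse = np.mean((y - y_hat) ** 2)

# The NLL is half the MSE plus a constant that does not depend on y_hat,
# so minimizing one minimizes the other.
assert np.isclose(nll, 0.5 * mse + 0.5 * np.log(2 * np.pi))
```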
28
votes
2 answers

Loss function for autoencoders

I am experimenting a bit with autoencoders, and with TensorFlow I created a model that tries to reconstruct the MNIST dataset. My network is very simple: X, e1, e2, d1, Y, where e1 and e2 are the encoding layers, d1 and Y are the decoding layers (and Y is the…
AkiRoss • 465
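For intuition, the two common reconstruction losses for inputs scaled to $[0,1]$ can be compared directly; a NumPy sketch with a made-up MNIST-like vector and a noisy "reconstruction":

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784)                                                     # made-up image in [0, 1]
x_hat = np.clip(x + rng.normal(scale=0.05, size=784), 1e-6, 1 - 1e-6)   # noisy "reconstruction"

# Mean squared error treats each pixel as an unbounded real value.
mse = np.mean((x - x_hat) ** 2)

# Per-pixel binary cross-entropy treats each pixel as a Bernoulli parameter;
# it is only sensible when inputs and reconstructions live in [0, 1].
bce = -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

print(mse, bce)
```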
20
votes
6 answers

Tensorflow Cross Entropy for Regression?

Does the cross-entropy cost make sense in the context of regression (as opposed to classification)? If so, could you give a toy example through TensorFlow, and if not, why not? I was reading about cross-entropy in Neural Networks and Deep Learning by…
JacKeown • 628
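One way cross-entropy does get used for regression is to discretize the target range into bins and treat the problem as classification over the bins. A minimal NumPy sketch of the idea with made-up numbers (not TensorFlow code):

```python
import numpy as np

# Discretize the target range into 10 bins and classify the bin (made-up numbers).
bins = np.linspace(0.0, 10.0, 11)      # bin edges covering the target range
y = 3.7                                # continuous regression target
label = np.digitize(y, bins) - 1       # index of the bin containing y

probs = np.full(10, 0.02)              # model's predicted distribution over bins
probs[label] = 0.82                    # most of the mass on the correct bin
probs = probs / probs.sum()

loss = -np.log(probs[label])           # ordinary categorical cross-entropy on the bins
print(label, loss)
```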
16
votes
4 answers

How meaningful is the connection between MLE and cross entropy in deep learning?

I understand that given a set of $m$ independent observations $\mathbb{O}=\{\mathbf{o}^{(1)}, . . . , \mathbf{o}^{(m)}\}$ the Maximum Likelihood Estimator (or, equivalently, the MAP with flat/uniform prior) that identifies the parameters…
orome • 368
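The core of the connection can be checked numerically: the average negative log-likelihood of a sample equals the cross-entropy between the empirical distribution and the model. A NumPy sketch with a made-up discrete sample and model:

```python
import numpy as np

# A made-up sample of m independent observations over the outcomes {0, 1, 2}.
obs = np.array([0, 1, 1, 2, 1, 0, 1, 2])
m = len(obs)

# A made-up model distribution q_theta over the same outcomes.
q = np.array([0.2, 0.5, 0.3])

# Average negative log-likelihood of the sample under q_theta ...
avg_nll = -np.mean(np.log(q[obs]))

# ... equals the cross-entropy between the empirical distribution p_hat and q_theta,
# so maximizing the likelihood in theta minimizes this cross-entropy.
p_hat = np.bincount(obs, minlength=3) / m
cross_entropy = -np.sum(p_hat * np.log(q))

assert np.isclose(avg_nll, cross_entropy)
```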
15
votes
2 answers

How to construct a cross-entropy loss for general regression targets?

It's common short-hand in neural networks literature to refer to categorical cross-entropy loss as simply "cross-entropy." However, this terminology is ambiguous because different probability distributions have different cross-entropy loss…
Sycorax • 76,417
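One common construction, sketched here under the assumption of a Gaussian model whose mean and log-variance are both predicted (all numbers made up), is to use the negative log-likelihood directly as the cross-entropy loss for a real-valued target:

```python
import numpy as np

# Negative log-likelihood of a real-valued target under a Gaussian whose mean
# and log-variance are both predicted by the network (all numbers made up).
y = 2.3                     # regression target
mu = 2.0                    # predicted mean
log_var = np.log(0.25)      # predicted log-variance (unconstrained parameterization)

var = np.exp(log_var)
nll = 0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var)
print(nll)

# With the variance held fixed, the same expression reduces to squared error
# plus a constant, recovering the usual MSE loss.
```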
15
votes
2 answers

Different definitions of the cross entropy loss function

I started off learning about neural networks with the neuralnetworksanddeeplearning dot com tutorial. In particular, the 3rd chapter has a section about the cross-entropy function, which defines the cross-entropy loss as: $C = -\frac{1}{n}…
Reginald • 153
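The two definitions genuinely differ: the per-neuron binary form also penalizes the probabilities assigned to the wrong classes, while the categorical form only scores the true class. A NumPy comparison on made-up outputs:

```python
import numpy as np

a = np.array([0.7, 0.2, 0.1])   # made-up network outputs for one example
y = np.array([1.0, 0.0, 0.0])   # one-hot target

# Per-neuron binary form (as in the neuralnetworksanddeeplearning tutorial, here n = 1 example):
binary_form = -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# Categorical (softmax-style) form, which only scores the true class:
categorical_form = -np.sum(y * np.log(a))

print(binary_form, categorical_form)   # the binary form also penalizes the other output neurons
```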
14
votes
2 answers

Why do we use the log function for cross-entropy?

I'm learning about a binary classifier. It uses the cross-entropy function as its loss function: $y_i \log p_i + (1-y_i) \log(1-p_i)$. But why does it use the log function? How about just using a linear form as follows: $y_i p_i + (1-y_i)(1-p_i)$? Is there…
Viridisjun • 141
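A quick numerical comparison with made-up probabilities shows why the log matters: the log loss grows without bound on confidently wrong predictions, while the proposed linear form is bounded and treats all errors alike.

```python
import numpy as np

y = 1.0                                 # true label
for p in [0.9, 0.6, 0.1, 0.01]:         # predicted probability of the positive class
    log_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    linear = y * p + (1 - y) * (1 - p)  # the "linear" alternative from the question
    print(p, round(log_loss, 3), linear)

# The log loss blows up as the prediction becomes confidently wrong (p -> 0 with y = 1),
# while the linear form stays bounded in [0, 1] and its gradient never changes magnitude,
# so confident mistakes are not penalized any harder than mild ones.
```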
13
votes
1 answer

The relationship between maximizing the likelihood and minimizing the cross-entropy

There is a statement that maximizing the likelihood is equivalent to minimizing the cross-entropy. Is there any proof of this statement?
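A standard one-line argument, sketched here (with $\hat{p}$ denoting the empirical distribution that places mass $1/m$ on each of the $m$ observations, matching the notation used above): $$ \hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta \sum_{i=1}^m \log q_\theta\big(\mathbf{o}^{(i)}\big) = \arg\min_\theta \left(-\frac{1}{m}\sum_{i=1}^m \log q_\theta\big(\mathbf{o}^{(i)}\big)\right) = \arg\min_\theta \Big(-\mathbb{E}_{\hat{p}}\big[\log q_\theta\big]\Big) = \arg\min_\theta H(\hat{p}, q_\theta). $$ Rescaling by $-1/m$ does not change the optimizer, which is why the two problems have the same solution.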