Many articles say that maximum likelihood estimation (MLE) is the same as minimizing cross-entropy.
I tried to prove this myself but failed.
I also found an article on the relationship between maximizing the likelihood and minimizing the cross-entropy that tackles the same question, but I could not follow it.
$\,$
For example, suppose I have data points $X_i\ (i=1,\dots,N)$,
distributed as $X \sim P_{data}(X)$.
I want to approximate $P_{data}(X)$ with a parametric model.
Call this model $P_{model}(X;\theta)$.
$\,$
First, I tried MLE.
$\theta^*=\arg\max_\theta\,\prod_{i=1}^{N}P_{model}(X_i;\theta)$
$\quad\;\,=\arg\max_\theta\,\log\Big(\prod_{i=1}^{N}P_{model}(X_i;\theta)\Big)$
$\quad\;\,=\arg\max_\theta\,\sum_{i=1}^{N}\log P_{model}(X_i;\theta)$
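To make sure I am computing the right thing, here is a small numerical sketch of this objective (a made-up Bernoulli example of my own; none of the names come from the articles):

```python
import numpy as np

# Toy example (my own, not from any article): draw N samples from a
# Bernoulli P_data with true parameter 0.7; the model is Bernoulli(theta).
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.7, size=100)  # data points X_i

def log_likelihood(theta, X):
    # sum_i log P_model(X_i; theta) for the Bernoulli model
    return np.sum(X * np.log(theta) + (1 - X) * np.log(1 - theta))

thetas = np.linspace(0.01, 0.99, 99)
theta_mle = thetas[np.argmax([log_likelihood(t, X) for t in thetas])]
print(theta_mle)  # lands on the grid point nearest the sample mean of X
```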
$\,$
Second, I tried minimizing cross-entropy.
$\theta^*=\arg\min_\theta\,H(P_{data}(X),P_{model}(X;\theta))$
$\quad\;\,=\arg\min_\theta\,E_{X\sim P_{data}(X)}[-\log P_{model}(X;\theta)]$
$\quad\;\,=\arg\max_\theta\,E_{X\sim P_{data}(X)}[\log P_{model}(X;\theta)]$
$\quad\;\,=\arg\max_\theta\,\sum_{x}P_{data}(x)\log P_{model}(x;\theta)$
(in the last step I expanded the expectation as a sum over the possible values $x$, assuming $X$ is discrete)
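Here is the analogous sketch for the cross-entropy objective, under the (unrealistic) assumption that $P_{data}$ is fully known:

```python
import numpy as np

# Same toy Bernoulli setting, but now pretending P_data is known exactly,
# so the weighted sum over the support {0, 1} can be evaluated directly.
p_data = np.array([0.3, 0.7])  # P_data(x) for x = 0, 1

def cross_entropy(theta):
    # H(P_data, P_model) = -sum_x P_data(x) * log P_model(x; theta)
    p_model = np.array([1 - theta, theta])
    return -np.sum(p_data * np.log(p_model))

thetas = np.linspace(0.01, 0.99, 99)
theta_ce = thetas[np.argmin([cross_entropy(t) for t in thetas])]
print(theta_ce)  # 0.7, i.e. the minimizer matches P_data exactly
```

In this toy run both objectives are optimized near $\theta=0.7$, yet the formulas above still look different, which is the core of my confusion.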
$\,$
Here I get a different-looking result.
In the cross-entropy objective, each term is weighted by $P_{data}(x)$, while the MLE sum has no such weight.
Why does this happen?
Also, how can the cross-entropy even be computed, given that we do not know $P_{data}$ in the general case?
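My only guess (not something the articles state explicitly) is that one is supposed to replace $P_{data}$ with the empirical distribution of the sample:

$$\hat{P}_{data}(x)=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[x=X_i]$$

$$H(\hat{P}_{data},P_{model})=-\sum_{x}\hat{P}_{data}(x)\log P_{model}(x;\theta)=-\frac{1}{N}\sum_{i=1}^{N}\log P_{model}(X_i;\theta)$$

Up to the constant factor $1/N$, this is exactly the negative of the log-likelihood sum from the first derivation. Is this the intended connection?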
I'm really curious about this, and any kind explanation will be greatly appreciated.