
I am trying to understand the relationship between cross-entropy and perplexity. In general, for a model $M$, $\text{Perplexity}(M) = 2^{\text{entropy}(M)}$. Does this relationship hold for all n-gram models, e.g. unigram, bigram, etc.?

Margalit
    That's actually the *definition* of perplexity; the $\sqrt[N]{\prod^N_{i=1} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})}}$ thing is derived from it ;) – WavesWashSands Jun 19 '17 at 06:32
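
For reference, a short derivation sketch of why the two forms coincide, assuming $H(M)$ denotes the per-word cross-entropy of $M$ measured on a test sequence $w_1, \ldots, w_N$:

$$
2^{H(M)} = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 P(w_i \mid w_1, \ldots, w_{i-1})}
= \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1})^{-1/N}
= \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})}}
$$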

2 Answers


Yes, the perplexity is always equal to two to the power of the entropy. It doesn't matter what type of model you have, whether it's a unigram, a higher-order n-gram, or a neural network.

There are a few reasons why language modeling people prefer perplexity to raw entropy. One is that, because of the exponent, improvements in perplexity "feel" more substantial than the equivalent improvement in entropy. Another is that, before perplexity came into use, the complexity of a language model was reported as a simplistic branching-factor measurement, which is more similar to perplexity than to entropy.
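
As a concrete sanity check, here is a minimal sketch (with made-up per-word probabilities, chosen only for illustration) showing that two to the power of the entropy and the geometric-mean formula from the comments give the same number:

```python
import math

# Made-up per-word probabilities a model assigns to a 4-word test sequence
# (purely illustrative -- any positive probabilities work).
probs = [0.1, 0.25, 0.05, 0.2]
n = len(probs)

# Per-word cross-entropy in bits: average negative log2 probability.
entropy = -sum(math.log2(p) for p in probs) / n

# Perplexity as two to the power of the entropy ...
perplexity = 2 ** entropy

# ... equals the N-th root of the product of inverse probabilities.
geometric_mean = math.prod(1 / p for p in probs) ** (1 / n)

print(entropy)         # ~2.99 bits per word
print(perplexity)      # ~7.95
print(geometric_mean)  # ~7.95 (same value)
```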

Aaron

Agreed with @Aaron's answer, with a slight modification:

It's not always two to the power of the entropy; it is the base of the logarithm raised to the power of the entropy. If you use $e$ as your base, then it would be $e^{\text{entropy}}$.
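
A quick numeric check of that point, reusing the same toy probabilities as above (illustrative only): the entropy value depends on the base of the logarithm, but the perplexity comes out the same as long as the exponentiation base matches it.

```python
import math

# Made-up per-word probabilities from a toy model (illustrative only).
probs = [0.1, 0.25, 0.05, 0.2]
n = len(probs)

# Cross-entropy in bits (log base 2) versus nats (natural log).
h_bits = -sum(math.log2(p) for p in probs) / n
h_nats = -sum(math.log(p) for p in probs) / n

# h_bits (~2.99) and h_nats (~2.07) differ, but perplexity is base**entropy
# with the base matching the logarithm, so both give the same number.
print(2 ** h_bits)       # ~7.95
print(math.exp(h_nats))  # ~7.95
```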