
I am trying to understand the relationship between cross-entropy and perplexity. In general, for a model $M$, $\text{Perplexity}(M) = 2^{\text{entropy}(M)}$. Does this relationship hold for all n-gram models, e.g. unigram, bigram, etc.?

Margalit
    That's actually the *definition* of perplexity; the $\sqrt[N]{\prod^N_{i=1} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})}}$ thing is derived from it ;) – WavesWashSands Jun 19 '17 at 06:32
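
For reference, a short derivation sketch of why the two forms coincide, assuming $H(M)$ denotes the per-word cross-entropy of $M$ measured on a test sequence $w_1, \ldots, w_N$:

$$
2^{H(M)} = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 P(w_i \mid w_1, \ldots, w_{i-1})}
= \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1})^{-1/N}
= \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})}}
$$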

2 Answers


Yes, the perplexity is always equal to two to the power of the entropy. It doesn't matter what type of model you have, whether it's a unigram, a higher-order n-gram, or a neural network.

There are a few reasons why language modeling people prefer perplexity to raw entropy. One is that, because of the exponent, improvements in perplexity "feel" more substantial than the equivalent improvement in entropy. Another is that, before perplexity came into use, the complexity of a language model was reported as a simplistic branching-factor measurement, which is more similar to perplexity than to entropy.
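
As a concrete sanity check, here is a minimal sketch (with made-up per-word probabilities, chosen only for illustration) showing that two to the power of the entropy and the geometric-mean formula from the comments give the same number:

```python
import math

# Made-up per-word probabilities a model assigns to a 4-word test sequence
# (purely illustrative -- any positive probabilities work).
probs = [0.1, 0.25, 0.05, 0.2]
n = len(probs)

# Per-word cross-entropy in bits: average negative log2 probability.
entropy = -sum(math.log2(p) for p in probs) / n

# Perplexity as two to the power of the entropy ...
perplexity = 2 ** entropy

# ... equals the N-th root of the product of inverse probabilities.
geometric_mean = math.prod(1 / p for p in probs) ** (1 / n)

print(entropy)         # ~2.99 bits per word
print(perplexity)      # ~7.95
print(geometric_mean)  # ~7.95 (same value)
```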

Aaron

Agreed with @Aaron's answer, with a slight modification:

It's not always two to the power of the entropy; it is the base of the logarithm raised to the power of the entropy. If you use $e$ as your base, then it would be $e^{\text{entropy}}$.
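
A quick numeric check of that point, reusing the same toy probabilities as above (illustrative only): the entropy value depends on the base of the logarithm, but the perplexity comes out the same as long as the exponentiation base matches it.

```python
import math

# Made-up per-word probabilities from a toy model (illustrative only).
probs = [0.1, 0.25, 0.05, 0.2]
n = len(probs)

# Cross-entropy in bits (log base 2) versus nats (natural log).
h_bits = -sum(math.log2(p) for p in probs) / n
h_nats = -sum(math.log(p) for p in probs) / n

# h_bits (~2.99) and h_nats (~2.07) differ, but perplexity is base**entropy
# with the base matching the logarithm, so both give the same number.
print(2 ** h_bits)       # ~7.95
print(math.exp(h_nats))  # ~7.95
```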