Comparing language model probability scores between sentences of different lengths

Question

My question is: how can I compare language model (LM) scores for two sentences of different lengths?

Probabilities are < 1, and the LM score of a sentence is a product of bigram or trigram probabilities (depending on whether it is a bigram or trigram model), so the scores of longer sentences will mostly be smaller.

So, how should I normalize the scores according to length?

I am pretty sure almost everyone who has read about LMs has had the same doubt, but I couldn't find much on the internet.

I would appreciate any leads on this.

score 5 · Accepted Answer · edited May 25 '19 at 19:34

As you noticed, it's a good idea to use some kind of averaging. Since an LM multiplies probabilities together, the geometric mean is a natural fit.
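To illustrate why the geometric mean helps, here is a minimal sketch. The per-token probabilities are made-up numbers, not output from any real model, and `geometric_mean_prob` is a hypothetical helper name. The longer sentence simply repeats the shorter one's probabilities, so its raw product is much smaller while its geometric mean is the same.

```python
import math

def geometric_mean_prob(token_probs):
    """Geometric mean of the per-token conditional probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Sum logs instead of multiplying, to avoid underflow on long sentences.
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))

# Hypothetical n-gram probabilities for a 3-token and a 6-token sentence.
short = [0.2, 0.1, 0.3]
long = [0.2, 0.1, 0.3, 0.2, 0.1, 0.3]

raw_short = math.prod(short)   # raw sentence probability
raw_long = math.prod(long)     # much smaller, only because it is longer

gm_short = geometric_mean_prob(short)
gm_long = geometric_mean_prob(long)  # equal to gm_short up to rounding

print(raw_short, raw_long)
print(gm_short, gm_long)
```

The raw products differ by orders of magnitude, but the length-normalized scores agree, which is the behaviour you want when neither sentence is "worse" per token.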

From Speech and Language Processing (Jurafsky and Martin):

In practice we don’t use raw probability as our metric for evaluating language models, but a variant called perplexity. The perplexity (sometimes called PP for short) of a language model on a test set is the inverse probability of the test set, normalized by the number of words.

$PP((w_1, ...,w_N)) = \sqrt[N]{\dfrac{1}{P(w_1, ...,w_N)}}$
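As a rough sketch of the formula above, perplexity can be computed in log space as the exponential of the average negative log-probability per token, which equals the N-th root of the inverse sentence probability. The probabilities below are invented for illustration, and `perplexity` is a hypothetical helper, not a function from any particular library.

```python
import math

def perplexity(token_probs):
    """PP = P(w_1..w_N) ** (-1/N), computed via logs for numerical stability."""
    n = len(token_probs)
    if n == 0:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_prob)

# Two hypothetical sentences with the SAME raw probability (0.0625):
sentence_a = [0.25, 0.25]            # 2 tokens
sentence_b = [0.5, 0.5, 0.5, 0.5]    # 4 tokens

pp_a = perplexity(sentence_a)
pp_b = perplexity(sentence_b)

print(pp_a, pp_b)
```

Lower perplexity is better. Here the raw probabilities tie, but the longer sentence gets the lower perplexity because each of its tokens is more likely on average.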
