
Suppose we have a corpus

$X = x_1...x_N$

in which every word can be represented as a sequence of subwords (drawn from a fixed-size subword vocabulary)

$x_i = x_{i,1}...x_{i,M(x_i)}$

where $M(x_i)$ is the number of subwords into which the word is divided.

For a word-level language model we would calculate perplexity using the formula:

$\exp\left(\frac{1}{N}\sum_{i=1}^N \log \frac{1}{q(x_i)}\right)$

where $q(x_i)$ is the probability the language model assigns to the word.
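For concreteness, here is a minimal sketch of that formula in Python (the probabilities are made up, standing in for whatever a word-level model would assign):

```python
import math

# Made-up per-word probabilities q(x_i) from a hypothetical word-level LM.
q = [0.1, 0.02, 0.3, 0.05]  # one probability per word, so N = 4

N = len(q)
# exp( (1/N) * sum_i log(1 / q(x_i)) )
word_ppl = math.exp(sum(math.log(1.0 / p) for p in q) / N)
print(word_ppl)
```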

My questions are:

  1. Is it valid to calculate word perplexity on a subword language model, where $q'(x_i)$ would equal $\prod_{j=1}^{M(x_i)}r(x_{i,j})$ and $r(x_{i,j})$ is the probability from the subword language model? The whole formula would look like this (see the sketch after this list):

$\exp\left(\frac{1}{N}\sum_{i=1}^N \log \frac{1}{\prod_{j=1}^{M(x_i)}r(x_{i,j})}\right) = \exp\left(\frac{1}{N}\sum_{i=1}^N \sum_{j=1}^{M(x_i)}\log \frac{1}{r(x_{i,j})}\right)$

  2. If not, is there another way to calculate word perplexity on a subword model, and can two language models with different vocabularies even be compared?

  3. The probability is actually a conditional probability $q(x_i|x_{0...i-1})$; does that change anything here?
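Here is the sketch referenced in question 1, again with made-up numbers standing in for a real subword LM:

```python
import math

# Made-up subword probabilities r(x_{i,j}), grouped by word, as a
# hypothetical subword LM might assign them.
r = [
    [0.4, 0.5],       # word 1 split into two subwords
    [0.2],            # word 2 kept whole
    [0.3, 0.6, 0.7],  # word 3 split into three subwords
]

N = len(r)  # number of *words*, not subwords
# q'(x_i) = prod_j r(x_{i,j}); summing logs over all subwords of all
# words is equivalent, and the normalizer stays the word count N.
total = sum(math.log(1.0 / p) for word in r for p in word)
word_ppl = math.exp(total / N)
print(word_ppl)
```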


1 Answer


No, you don't need to multiply the probabilities of the subwords of a word. All you need to do is treat each subword as a word in its own right, and then the formula is fine if you change the $x_i$ in your first formula to $x_i\in \{generated\_subwords\}$ (so the normalizer becomes the number of subwords rather than the number of words).
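As a sketch of what that change amounts to (the same made-up probabilities as in the question, just flattened into a token stream):

```python
import math

# The question's made-up subword probabilities, flattened: each
# subword is now a token in its own right.
r = [0.4, 0.5, 0.2, 0.3, 0.6, 0.7]

T = len(r)  # normalizer is the number of subword tokens, not words
subword_ppl = math.exp(sum(math.log(1.0 / p) for p in r) / T)
print(subword_ppl)
```

Note that the resulting number is a subword-level perplexity, so it is only directly comparable with other models that use the same subword vocabulary.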

References:

  1. https://twitter.com/lmthang/status/1222398272427347968
  2. https://arxiv.org/pdf/2001.09977.pdf