Questions tagged [perplexity]

Perplexity is a measure used to evaluate how well a probability model predicts a given test set. It is closely related to cross-entropy and is commonly used to evaluate language models.

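As a quick illustration of the relationship, here is a minimal sketch (plain Python, with made-up per-token probabilities) showing that perplexity is just the exponential of the cross-entropy on the test data:

```python
import math

# Hypothetical per-token probabilities a model assigns to a 4-token test sequence.
token_probs = [0.2, 0.1, 0.05, 0.3]

# Cross-entropy (nats): average negative log-probability per token.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity is the exponential of the cross-entropy.
perplexity = math.exp(cross_entropy)
print(cross_entropy, perplexity)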

21 questions
53 votes · 4 answers

What is perplexity?

I came across the term perplexity, which refers to the log-averaged inverse probability on unseen data. The Wikipedia article on perplexity does not give an intuitive meaning for it. This perplexity measure was used in the pLSA paper. Can anyone explain…
Learner
  • 4,007
  • 11
  • 37
  • 39
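In short, the "log-averaged inverse probability" is equivalent to the inverse geometric mean of the probabilities the model assigns to the held-out tokens. A minimal sketch with made-up probabilities (plain Python):

```python
import math

# Hypothetical model probabilities for N held-out tokens.
probs = [0.25, 0.1, 0.5, 0.05]
N = len(probs)

# "Log-averaged inverse probability": exponentiate the mean log of 1/p.
ppl_log_avg = math.exp(sum(math.log(1.0 / p) for p in probs) / N)

# Equivalent form: the inverse of the geometric mean of the probabilities.
ppl_geo = 1.0 / (math.prod(probs) ** (1.0 / N))

print(ppl_log_avg, ppl_geo)  # the two numbers agree
```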
11 votes · 2 answers

Perplexity and cross-entropy for n-gram models

I am trying to understand the relationship between cross-entropy and perplexity. In general, for a model $M$, $\text{Perplexity}(M) = 2^{\text{entropy}(M)}$. Does this relationship hold for all n-gram orders, i.e. unigram, bigram, etc.?
Margalit
  • 111
  • 1
  • 4
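A brief sketch of the identity under the usual convention (cross-entropy measured in bits per token; the bigram probabilities below are made up):

```python
import math

# Hypothetical bigram probabilities assigned by a model M to a test sentence.
bigram_probs = [0.2, 0.4, 0.1, 0.25]

# Cross-entropy of M in bits per token (log base 2).
H = -sum(math.log2(p) for p in bigram_probs) / len(bigram_probs)

# Perplexity(M) = 2 ** H(M); the same identity applies to unigram, trigram,
# etc. models, as long as H is the per-token cross-entropy of that model.
perplexity = 2 ** H
print(H, perplexity)
```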
7 votes · 2 answers

Why does larger perplexity tend to produce clearer clusters in t-SNE?

Why does larger perplexity tend to produce clearer clusters in t-SNE? By reading the original paper, I learned that the perplexity in t-SNE is $2$ raised to the power of the Shannon entropy of the conditional distribution induced by a data point. And it is…
meTchaikovsky
  • 1,414
  • 1
  • 9
  • 23
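A minimal sketch of that definition, assuming NumPy and a made-up conditional distribution for one data point:

```python
import numpy as np

# Hypothetical conditional distribution p_{j|i} induced by one data point
# (Gaussian neighbour probabilities, summing to 1).
p = np.array([0.5, 0.3, 0.15, 0.05])

# Shannon entropy (bits) of the conditional distribution.
H = -np.sum(p * np.log2(p))

# The perplexity of point i is 2 ** H: roughly the effective number of
# neighbours over which the distribution spreads its mass.
print(2 ** H)
```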
7 votes · 2 answers

Intuition behind perplexity parameter in t-SNE

While reading Laurens van der Maaten's paper about t-SNE, we encounter the following statement about perplexity: The perplexity can be interpreted as a smooth measure of the effective number of neighbors. The performance of SNE is fairly robust…
Kuba_
  • 171
  • 4
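For context, a small usage sketch (assuming scikit-learn and random data) of how the perplexity parameter is passed to t-SNE; larger values make each point consider more neighbours:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical data: 200 points in 10 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Perplexity sets the effective number of neighbours each point considers;
# it must be smaller than the number of samples.
for perplexity in (5, 30, 50):
    emb = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(X)
    print(perplexity, emb.shape)
```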
4 votes · 1 answer

Inferring the number of topics for gensim's LDA - perplexity, CM, AIC, and BIC

I am confused about how to interpret LDA's perplexity fluctuations with different numbers of topics when trying to determine the best number of topics. Additionally, I would like to know how to implement AIC/BIC with gensim LDA models. I…
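A rough sketch of how perplexity over a range of topic counts might be computed with gensim (toy corpus made up here; the conversion `exp2(-bound)` mirrors the perplexity estimate gensim itself logs, and AIC/BIC are not covered):

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus of tokenised documents (made up for illustration).
texts = [["topic", "model", "lda"], ["perplexity", "held", "out"],
         ["topic", "perplexity", "model"], ["lda", "held", "out", "model"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Compare held-out perplexity (here, reusing the toy corpus) across topic counts.
for k in (2, 3, 4):
    lda = LdaModel(corpus=corpus, num_topics=k, id2word=dictionary, random_state=0)
    bound = lda.log_perplexity(corpus)   # per-word log-likelihood bound
    print(k, np.exp2(-bound))            # gensim reports perplexity as 2**(-bound)
```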
4 votes · 1 answer

Comparing language model probability scores between sentences of varying length

My question is: how can I compare language model (LM) scores for two sentences of different lengths? Probabilities are < 1, and since the LM score for a sentence is a product of bigram or trigram probabilities, depending on whether it's a bigram or…
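A common remedy is to normalise by sentence length, i.e. compare per-word perplexities rather than raw sentence probabilities. A minimal sketch with made-up per-token log-probabilities:

```python
import math

# Hypothetical per-token log-probabilities (natural log) from a language model
# for two sentences of different lengths.
logprobs_short = [-2.1, -1.3, -3.0]
logprobs_long = [-2.0, -1.1, -2.5, -1.8, -2.2, -1.9]

def per_word_perplexity(logprobs):
    # Normalise by length: exp of the average negative log-probability,
    # which is comparable across sentences of different lengths.
    return math.exp(-sum(logprobs) / len(logprobs))

print(per_word_perplexity(logprobs_short))
print(per_word_perplexity(logprobs_long))
```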
3 votes · 1 answer

Breaking substitution cipher with language model

Frequency analysis is a common tool used to break substitution ciphers, but it often relies on the intuition and guesswork of a human. Since language models can objectively calculate perplexity (how surprising a piece of language seems), they seem like a…
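As a rough illustration of the idea (not a full cipher breaker): a tiny add-one-smoothed character-bigram model, estimated from a made-up reference text, can rank candidate decryptions by perplexity:

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def char_bigram_logprobs(reference_text):
    # Add-one smoothed character-bigram model estimated from a reference text.
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(reference_text, reference_text[1:]):
        counts[a][b] += 1
    logp = {}
    for a in ALPHABET:
        total = sum(counts[a].values()) + len(ALPHABET)
        for b in ALPHABET:
            logp[(a, b)] = math.log((counts[a][b] + 1) / total)
    return logp

def perplexity(text, logp):
    # Lower perplexity = the text looks more like the reference language,
    # so the corresponding substitution key is a better candidate.
    lp = [logp[(a, b)] for a, b in zip(text, text[1:])]
    return math.exp(-sum(lp) / len(lp))

logp = char_bigram_logprobs("the quick brown fox jumps over the lazy dog " * 50)
candidates = ["attack at dawn", "zqqzxj zq wzvn"]          # made-up decryptions
print(min(candidates, key=lambda c: perplexity(c, logp)))  # expect: "attack at dawn"
```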
2 votes · 1 answer

Calculating perplexity with smoothing techniques (NLP)

This question is about smoothed n-gram language models. When we use additive smoothing on the training set to determine the conditional probabilities and calculate the perplexity of the training data, where exactly is this useful when it comes to the test…
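A small sketch (toy corpus, hypothetical add-k constant) of why the smoothing matters at test time: unseen test bigrams receive non-zero probability, so the test perplexity stays finite:

```python
import math
from collections import Counter

train = "the cat sat on the mat".split()
test = "the cat sat on the hat".split()
vocab = set(train) | set(test)
V, k = len(vocab), 0.5              # vocabulary size and add-k constant (made up)

unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))

def p_add_k(prev, word):
    # Additive smoothing: every bigram, seen or unseen, gets some probability
    # mass, so an unseen test bigram like ("the", "hat") no longer gives
    # zero probability and infinite perplexity.
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * V)

logp = [math.log(p_add_k(a, b)) for a, b in zip(test, test[1:])]
print(math.exp(-sum(logp) / len(logp)))   # perplexity of the test sentence
```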
2 votes · 1 answer

Perplexity formula in the t-SNE paper vs. in the implementation

The perplexity formula in the official t-SNE paper is not the same as the one in its implementation. In the implementation (MATLAB):
% Function that computes the Gaussian kernel values given a vector of
% squared Euclidean distances, and the precision of…
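A Python sketch of that routine's logic (not the original code): given squared distances and a precision beta, it returns the entropy and the normalised Gaussian kernel values. Because it works in natural logs, exp(H) plays the role the paper's base-2 formula gives to 2^H, so the two forms agree:

```python
import numpy as np

def hbeta(sq_dists, beta):
    # Gaussian kernel values for a vector of squared Euclidean distances and a
    # precision beta = 1 / (2 * sigma**2); a sketch of the implementation's
    # Hbeta-style routine, not the original code.
    P = np.exp(-sq_dists * beta)
    sumP = np.sum(P)
    # Entropy in nats of the normalised distribution. The paper states the
    # formula in bits (log base 2), so the paper's 2**H equals exp(H) here.
    H = np.log(sumP) + beta * np.sum(sq_dists * P) / sumP
    return H, P / sumP

sq_dists = np.array([0.5, 1.0, 2.0, 4.0])   # made-up squared distances
H, P = hbeta(sq_dists, beta=1.0)
print(np.exp(H), 2 ** (H / np.log(2)))      # the same perplexity either way
```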
2 votes · 1 answer

Why do I get weird results when using high perplexity in t-SNE?

I played around with the t-SNE implementation in scikit-learn and found that increasing the perplexity seemed to always result in a torus/circle. I couldn't find any mention of this in the literature. Check out some examples below, which is just a…
2 votes · 1 answer

How should perplexity of LDA behave as value of the latent variable k increases?

When increasing the value of the latent variable k for LDA (latent Dirichlet allocation), how should perplexity behave: On the training set? On the testing set?
1 vote · 0 answers

Perplexity for short sentences

I have a model that outputs short sentences and want to compare the quality of its outputs for different configurations by computing their perplexities using another model. I tried to use the gpt-2 model from…
dj_rydu
  • 11
  • 3
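One way this is commonly done, sketched here with the Hugging Face transformers library and the public gpt2 checkpoint (note that perplexity estimates on very short sentences are noisy, which is part of what the question is about):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_perplexity(text):
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to the inputs, the model returns the mean
        # cross-entropy over the predicted tokens; exponentiate it.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(sentence_perplexity("The cat sat on the mat."))
print(sentence_perplexity("Mat the on sat cat the."))
```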
1 vote · 0 answers

Why does perplexity change with different ranges of k?

I ran a 5-fold cross-validation in R to calculate LDA perplexity for k = 2:9 using a 10% sample of my data. The output was:

k               2       3      4      5      6      7      8      9
perplexity 156277  139378  71659  68998  67471  32890  32711  31904

I re-ran…
1 vote · 1 answer

Word perplexity on a subword language model

Suppose we have a corpus $X = x_1 \dots x_N$ in which every word can be represented using subwords (from a fixed-size vocabulary of subwords), $x_i = x_{i,0} \dots x_{i,M(x_i)}$, where $M(x_i)$ is the number of subwords into which the word is divided. For a word language…
wswin
  • 21
  • 1
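A sketch of the usual conversion (hypothetical numbers): the total log-probability of the corpus is the same whether it is factored over words or over their subwords, so word-level perplexity just renormalises the same total negative log-likelihood by the number of words:

```python
import math

# Hypothetical output of a subword language model on a small test corpus:
# the negative log-likelihood (nats) of each subword token.
subword_nll = [2.3, 0.9, 1.7, 2.0, 0.6, 1.1, 1.8]   # 7 subwords in total
num_words = 4                                        # the same text has 4 words

# Only the normaliser changes between the two perplexities.
total_nll = sum(subword_nll)
print("subword-level perplexity:", math.exp(total_nll / len(subword_nll)))
print("word-level perplexity:   ", math.exp(total_nll / num_words))
```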
1 vote · 1 answer

LDA and test data perplexity

I've performed Latent Dirichlet Allocation on a training set of documents. At the ideal number of topics I would expect a minimum in the perplexity for the test dataset. However, I find that the perplexity for my test dataset increases with the number of…
BHC
  • 141
  • 1
  • 3