Latent Dirichlet Allocation (LDA) is an unsupervised, statistical approach to document modeling that discovers latent semantic topics in large collections of text. (Do NOT use this for Linear Discriminant Analysis.)
Questions tagged [latent-dirichlet-alloc]
136 questions
30 votes, 4 answers
R packages for performing topic modeling / LDA: just `topicmodels` and `lda`
It seems to me that only two R packages are able to perform Latent Dirichlet Allocation:
One is `lda`, authored by Jonathan Chang, and the other is `topicmodels`, authored by Bettina Grün and Kurt Hornik.
What are the differences between these two…

bit-question
11 votes, 0 answers
Is sparsity of topics a necessary condition for latent Dirichlet allocation (LDA) to work?
I have been playing with the hyper-parameters of the latent Dirichlet allocation (LDA) model and am wondering how the sparsity of topic priors plays a role in inference.
I have not performed these experiments on real data, but on simulated data. I…

kedarps
10 votes, 1 answer
How does the topic coherence score in LDA intuitively make sense?
Referring to: http://qpleple.com/topic-coherence-to-evaluate-topic-models/
To decide the optimum number of topics to extract using LDA, the topic coherence score is commonly used to measure how well the topics are extracted:
$CoherenceScore…

Kid_Learning_C
8 votes, 1 answer
Using topic words generated by LDA to represent a document
I want to do document classification by representing each document as a set of features. I know that there are many ways: BOW, TFIDF, ...
I want to use Latent Dirichlet Allocation (LDA) to extract the topic keywords of EACH SINGLE document. The…

Munichong
7 votes, 1 answer
Reasonable hyperparameter range for Latent Dirichlet Allocation?
What are good ranges for the hyperparameters $\alpha$ and $\beta$ (explained well here) in LDA?
I appreciate that hyperparameter tuning always depends on the use case, data, content of documents, etc., but is there any general rule or heuristic to choose…

PyRsquared
7 votes, 1 answer
How does LDA (Latent Dirichlet Allocation) assign a topic-distribution to a new document?
I am new to topic modeling and have read about LDA and NMF (Non-negative Matrix Factorization). I understand how the training process works. Let's say I have 100 documents and I want to train an LDA model on these documents with 10 topics. However, I don't really…

nickg
7 votes, 1 answer
Clustering with Latent Dirichlet Allocation (LDA): Distance Measure
Since a similarity/distance measure is crucial for every clustering algorithm, I wonder what this measure is for LDA.
Since LDA works on text as a bag-of-words model, can someone imagine the similarity between topics (clusters) are the representative…

Lisa
7 votes, 0 answers
How to use LDA to predict topic proportions for a new document?
I'm interested in learning how I can use a trained LDA (Latent Dirichlet Allocation) model to predict the topic proportions of a new, unseen document using Naive Bayes.
Let $z \in \{1, 2, ..., Z\}$ denote a particular topic (there's $Z$…

zzhengnan
6 votes, 1 answer
What's the relation between Matrix Factorization (MF) and Latent Dirichlet Allocation (LDA)?
My understanding is that both MF and LDA can be applied to document classification. I will first summarize my understanding of these two methods before I ask my questions.
Assuming we use a big matrix X to summarize the documents in a corpus and…

cwl
5 votes, 1 answer
How to use LDA to classify documents into predefined topics?
LDA is unsupervised and it classifies documents into topics. But is there a way to make LDA classify the documents into predefined (or specific desired) topics?
The link below says we need a custom beta prior where we give more weight to some…

tjt
5 votes, 3 answers
What are the labels for SVM classification when we first run LDA (LDA -> SVM)?
I am using LDA (Latent Dirichlet Allocation) to extract topics. I want to do topic modelling and use the topics as features for document classification. The reason for doing classification is to evaluate my LDA model.
The same as this link
lda ,…

sariii
4 votes, 1 answer
Understanding Latent Dirichlet Allocation Inference
I'm reading the Wikipedia page about how Latent Dirichlet Allocation assigns a topic distribution to a document after the model has been learnt (see this link). I'm very confused by this part of it:
Let $n_{j,r}^i$ be the number of word tokens in…

Andrew
4 votes, 0 answers
LDA implementation in pymc3
I am implementing LDA with pymc3, using the pymc code referred to in the post "Latent Dirichlet Allocation in PyMC".
I am trying to adapt it to pymc3 but am having problems defining `w`:
import numpy as np
import pymc3 as pm, theano, theano.tensor as t
K…

Anil Gaddam
4 votes, 1 answer
LDA vs. labeled LDA
I have gone through the techniques and understood the basic ideas. But I want to know which one is usually expected to work better, LDA or Labeled LDA. What features of the dataset help decide between the two?

Rohit Jain
4 votes, 1 answer
Inferring the number of topics for gensim's LDA - perplexity, CM, AIC, and BIC
I am confused about how to interpret the fluctuations in LDA's perplexity across different numbers of topics when trying to determine the best number of topics. Additionally, I would like to know how to implement AIC/BIC with gensim LDA models.
I…

Jabro