Questions tagged [latent-dirichlet-alloc]

Latent Dirichlet Allocation (LDA) is an unsupervised, statistical approach to document modeling that discovers latent semantic topics in large collections of text. (Do NOT use this for Linear Discriminant Analysis.)

136 questions
30 votes, 4 answers

R packages for performing topic modeling / LDA: just `topicmodels` and `lda`

It seems to me that only two R packages are able to perform Latent Dirichlet Allocation: one is `lda`, authored by Jonathan Chang, and the other is `topicmodels`, authored by Bettina Grün and Kurt Hornik. What are the differences between these two…
bit-question
11 votes, 0 answers

Is sparsity of topics a necessary condition for latent Dirichlet allocation (LDA) to work?

I have been playing with the hyper-parameters of the latent Dirichlet allocation (LDA) model and am wondering how the sparsity of topic priors plays a role in inference. I have not performed these experiments on real data, but on simulated data. I…
kedarps
10 votes, 1 answer

How does the topic coherence score in LDA intuitively make sense?

Referring to: http://qpleple.com/topic-coherence-to-evaluate-topic-models/ In order to decide the optimum number of topics to be extracted using LDA, the topic coherence score is always used to measure how well the topics are extracted: $CoherenceScore…
Kid_Learning_C
8 votes, 1 answer

Using topic words generated by LDA to represent a document

I want to do document classification by representing each document as a set of features. I know that there are many ways: BOW, TF-IDF, ... I want to use Latent Dirichlet Allocation (LDA) to extract the topic keywords of each single document. The…
7 votes, 1 answer

Reasonable hyperparameter range for Latent Dirichlet Allocation?

What are good ranges for the hyperparameters $\alpha$ and $\beta$ (explained well here) in LDA? I appreciate hyperparameter tuning always depends on the use case, data, content of documents etc., but is there any general rule or heuristic to choose…
PyRsquared
7 votes, 1 answer

How does LDA (Latent Dirichlet Allocation) assign a topic-distribution to a new document?

I am new to topic modeling and have read about LDA and NMF (Non-negative Matrix Factorization). I understand how the training process works. Let's say I have 100 documents and I want to train an LDA model on these documents with 10 topics. However, I don't really…
7 votes, 1 answer

Clustering with Latent Dirichlet Allocation (LDA): Distance Measure

Since a similarity/distance measure is crucial for every clustering algorithm, I wonder what this measure is for LDA. Since LDA works on text as a bag-of-words model, can someone imagine the similarity between topics (clusters) are the representative…
Lisa
7 votes, 0 answers

How to use LDA to predict topic proportion for new document?

I'm interested to learn how I can use a trained LDA (Latent Dirichlet Allocation) model to make predictions on the topic proportion of a new, unseen document using Naive Bayes. Let $z \in \{1, 2, ..., Z\}$ denote a particular topic (there's $Z$…
6 votes, 1 answer

What's the relation between Matrix Factorization (MF) and Latent Dirichlet Allocation (LDA)?

My understanding is that both MF and LDA can be applied to document classification. I will first summarize my understanding of these two methods before I ask my questions. Assuming we use a big matrix X to summarize the documents in a corpus and…
5 votes, 1 answer

How to use LDA to classify documents into predefined topics?

LDA is unsupervised and it classifies documents into topics. But is there a way to make LDA classify documents into predefined (or specific desired) topics? The link below says we need a custom beta prior in which we give more weight to some…
tjt
5 votes, 3 answers

What are the labels for SVM classification when we first run LDA (LDA -> SVM)?

I am using LDA (Latent Dirichlet Allocation) to extract topics. I want to do topic modelling and use the topics as features for document classification. The reason for doing classification is to evaluate my LDA model. The same as this link lda,…
sariii
4 votes, 1 answer

Understanding Latent Dirichlet Allocation Inference

I'm reading the Wikipedia page about how Latent Dirichlet Allocation assigns a topic distribution to a document after the model has been learnt (see this link). I'm very confused by this part of it: Let $n_{j,r}^i$ be the number of word tokens in…
Andrew
4 votes, 0 answers

LDA implementation in pymc3

I am implementing LDA with pymc3, using the pymc code from the post "Latent Dirichlet Allocation in PyMC" as a reference. I am trying to adapt it to pymc3 but am having problems defining w: import numpy as np import pymc3 as pm, theano, theano.tensor as t K…
4 votes, 1 answer

LDA vs. labeled LDA

I have gone through the techniques and understood the basic ideas, but I want to know which one is usually expected to work better: LDA or Labeled LDA? What features of the dataset help decide between the two?
4 votes, 1 answer

Inferring the number of topics for gensim's LDA - perplexity, CM, AIC, and BIC

I am confused about how to interpret the fluctuations in LDA's perplexity across different numbers of topics when trying to determine the best number of topics. Additionally, I would like to know how to implement AIC/BIC with gensim LDA models. I…