Questions tagged [topic-models]

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear in documents about cats (source: Wikipedia).

Software for topic modelling includes MALLET, gensim, and the R packages `topicmodels` and `lda`.
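For readers new to the tag, here is a minimal sketch of fitting LDA in Python with gensim; the toy corpus and every setting (number of topics, passes) are illustrative assumptions, not recommendations.

```python
# Minimal LDA sketch with gensim on a toy corpus (illustrative settings only).
from gensim import corpora, models

# Tiny tokenised corpus; real use needs proper preprocessing (stop words, etc.).
texts = [
    ["dog", "bone", "bark", "dog"],
    ["cat", "meow", "purr", "cat"],
    ["dog", "cat", "pet", "food"],
]

dictionary = corpora.Dictionary(texts)              # word <-> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]     # bag-of-words vectors

lda = models.LdaModel(corpus, id2word=dictionary,
                      num_topics=2, passes=10, random_state=0)

print(lda.print_topics())                  # top words per topic
print(lda.get_document_topics(corpus[0]))  # per-document topic proportions
```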

238 questions
30 votes · 4 answers

R packages for performing topic modeling / LDA: just `topicmodels` and `lda`

It seems to me that only two R packages are able to perform Latent Dirichlet Allocation: one is `lda`, authored by Jonathan Chang, and the other is `topicmodels`, authored by Bettina Grün and Kurt Hornik. What are the differences between these two…
asked by bit-question

26 votes · 3 answers

Topic models and word co-occurrence methods

Popular topic models like LDA usually cluster words that tend to co-occur into the same topic (cluster). What is the main difference between such topic models and other simple co-occurrence-based clustering approaches like PMI? (PMI…
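For reference, the PMI mentioned here is the pointwise mutual information of a word pair, usually estimated from co-occurrence counts within a document or sliding window:

$$\operatorname{PMI}(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}$$

Clustering on a PMI matrix groups words purely by such pairwise association scores, whereas LDA infers topics through a generative model of whole documents.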
24 votes · 2 answers

Natural interpretation for LDA hyperparameters

Can somebody explain the natural interpretation of the LDA hyperparameters? ALPHA and BETA are the parameters of the Dirichlet distributions for the (per-document) topic and (per-topic) word distributions, respectively. However, can someone explain what it…
asked by abhinavkulkarni
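As a sketch of the usual interpretation (using the standard symmetric-prior formulation, not anything specific to the answers here): ALPHA and BETA are Dirichlet concentration parameters,

$$\theta_d \sim \operatorname{Dir}(\alpha), \qquad \phi_k \sim \operatorname{Dir}(\beta), \qquad p(\theta \mid \alpha) \propto \prod_{k=1}^{K} \theta_k^{\alpha - 1},$$

so values of $\alpha$ below 1 concentrate each document on a few topics, while small $\beta$ concentrates each topic on a few words; large values push the sampled distributions toward uniformity.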
23 votes · 2 answers

Topic stability in topic models

I am working on a project where I want to extract some information about the content of a series of open-ended essays. In this particular project, 148 people wrote essays about a hypothetical student organization as part of a larger experiment. …
21 votes · 2 answers

How to calculate perplexity of a holdout with Latent Dirichlet Allocation?

I'm confused about how to calculate the perplexity of a holdout sample when doing Latent Dirichlet Allocation (LDA). The papers on the topic breeze over it, making me think I'm missing something obvious... Perplexity is seen as a good measure of…
asked by drevicko
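For reference, the definition used in the original LDA paper for held-out documents is

$$\operatorname{perplexity}(D_{\text{test}}) = \exp\!\left( -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right),$$

where $N_d$ is the length of document $d$; the subtlety the question raises is how to approximate the intractable $\log p(\mathbf{w}_d)$ for a held-out document.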
18 votes · 1 answer

Topic prediction using latent Dirichlet allocation

I have used LDA on a corpus of documents and found some topics. The output of my code is two matrices containing probabilities: one of doc-topic probabilities and the other of word-topic probabilities. But I actually don't know how to use these results to…
asked by Hossein
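A minimal sketch of one common way to use such output, assuming the doc-topic matrix is available as a NumPy array (the name `doc_topic` is hypothetical): treat each row as a low-dimensional feature vector, or take the arg-max topic as a hard cluster label.

```python
import numpy as np

# Hypothetical (n_docs x n_topics) matrix of p(topic | doc) produced by LDA.
doc_topic = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])

hard_labels = doc_topic.argmax(axis=1)   # dominant topic per document
print(hard_labels)                       # -> [0 2]

# Alternatively, keep the full rows as document features for a downstream
# classifier, clustering step, or similarity search.
features = doc_topic
```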
15 votes · 3 answers

Topic models for short documents

Inspired by this question, I'm wondering whether any work has been done on topic models for large collections of extremely short texts. My intuition is that Twitter should be a natural inspiration for such models. However, from some limited…
asked by Martin O'Leary

11 votes · 0 answers

Is sparsity of topics a necessary condition for latent Dirichlet allocation (LDA) to work?

I have been playing with the hyper-parameters of the latent Dirichlet allocation (LDA) model and am wondering how the sparsity of the topic priors plays a role in inference. I have not performed these experiments on real data, but on simulated data. I…
asked by kedarps
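A sketch of the kind of simulation the question describes, assuming documents are generated from the standard LDA generative process; the vocabulary size, document length, and alpha values below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 5, 50, 80                       # topics, vocabulary, tokens/doc

# Topic-word distributions drawn once with a fixed symmetric prior.
phi = rng.dirichlet([0.1] * V, size=K)

def simulate_doc(alpha):
    """Generate one document's word counts from the LDA generative process."""
    theta = rng.dirichlet([alpha] * K)          # document-topic proportions
    z = rng.choice(K, size=doc_len, p=theta)    # topic assignment per token
    words = [rng.choice(V, p=phi[k]) for k in z]
    return np.bincount(words, minlength=V)

sparse_doc = simulate_doc(alpha=0.1)   # a few dominant topics per document
dense_doc = simulate_doc(alpha=10.0)   # topic proportions close to uniform
```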
10 votes · 1 answer

How does the topic coherence score in LDA make sense intuitively?

Referring to http://qpleple.com/topic-coherence-to-evaluate-topic-models/: in order to decide the optimum number of topics to extract using LDA, the topic coherence score is commonly used to measure how well the topics are extracted: $CoherenceScore…
asked by Kid_Learning_C
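One commonly used variant discussed on the linked page is the UMass coherence; as a sketch, for the top $M$ words $v_1, \dots, v_M$ of a topic it is

$$C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m, v_l) + 1}{D(v_l)},$$

where $D(v)$ is the number of documents containing $v$ and $D(v, v')$ the number containing both. Higher (less negative) values indicate that a topic's top words tend to co-occur in the corpus, which is the intuition behind using the score for model selection.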
10 votes · 1 answer

When to use LDA over GMM for clustering?

I have a dataset containing user activity with 168 dimensions, where I want to extract clusters using unsupervised learning. It is not obvious to me whether to use a topic-modelling approach such as latent Dirichlet allocation (LDA) or a Gaussian Mixture…
8 votes · 2 answers

Supervised approaches vs. topic models in sentiment analysis

I am researching Sentiment Analysis over social media, particularly classifying online texts such as blog posts as positive, negative or neutral. Most of the approaches I have found for sentiment analysis are supervised (they need labeled data to…
8 votes · 1 answer

Using topic words generated by LDA to represent a document

I want to do document classification by representing each document as a set of features. I know that there are many ways: BOW, TFIDF, ... I want to use Latent Dirichlet Allocation (LDA) to extract the topic keywords of EACH SINGLE document. the…
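A sketch of one way to do this with gensim, reusing a trained model like the one in the snippet near the top of this page (the helper name and `topn` value are illustrative): represent each document by the top words of its most probable topic.

```python
# Assumes `lda`, `dictionary`, and `corpus` were built with gensim as above.
def doc_topic_keywords(bow, topn=10):
    """Return the top words of the document's dominant LDA topic."""
    topic_id, _ = max(lda.get_document_topics(bow), key=lambda t: t[1])
    return [word for word, _ in lda.show_topic(topic_id, topn=topn)]

doc_features = [doc_topic_keywords(bow) for bow in corpus]
```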
7 votes · 1 answer

Variational inference for nested Chinese restaurant process

I recently read the paper by Chong Wang and David M. Blei, "Variational Inference for the Nested Chinese Restaurant Process", and I couldn't understand the following part (from p. 5): The variational update functions for W and x depend on the actual…
asked by peppered

7 votes · 1 answer

How does LDA (Latent Dirichlet Allocation) assign a topic-distribution to a new document?

I am new to topic modeling and have read about LDA and NMF (Non-negative Matrix Factorization). I understand how the training process works. Let's say I have 100 documents and I want to train an LDA model on these documents with 10 topics. However, I don't really…
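In practice, most implementations answer this by folding the new document in while holding the trained topic-word distributions fixed; a sketch with gensim, assuming a trained `lda` and its `dictionary` as in the earlier snippets:

```python
# Infer topic proportions for an unseen document with the topics held fixed.
new_tokens = ["dog", "bone", "park"]        # illustrative tokens
new_bow = dictionary.doc2bow(new_tokens)
print(lda.get_document_topics(new_bow))     # [(topic_id, proportion), ...]
```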
7 votes · 0 answers

How to use LDA to predict topic proportion for new document?

I'm interested to learn how I can use a trained LDA (Latent Dirichlet Allocation) model to make predictions on the topic proportion of a new, unseen document using Naive Bayes. Let $z \in \{1, 2, ..., Z\}$ denote a particular topic (there's $Z$…
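Under the single-topic ("Naive Bayes") reading the question suggests, where the whole new document is assumed to come from one topic $z$, the posterior over that topic given its words $w_1, \dots, w_N$ would be

$$p(z = k \mid w_{1:N}) \propto p(z = k) \prod_{n=1}^{N} p(w_n \mid z = k),$$

with $p(w_n \mid z = k)$ read off the trained topic-word matrix. This is only a rough approximation to what LDA does, since LDA allows a mixture of topics within a single document and proper prediction requires per-document inference of the topic proportions.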