Questions tagged [topic-models]

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear in documents about cats (source: Wikipedia).

Software for topic modelling includes MALLET, gensim, and the R packages `topicmodels` and `lda`.
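For readers new to the tag, here is a minimal sketch of fitting LDA in Python with gensim; the toy corpus and every setting (number of topics, passes) are illustrative assumptions, not recommendations.

```python
# Minimal LDA sketch with gensim on a toy corpus (illustrative settings only).
from gensim import corpora, models

# Tiny tokenised corpus; real use needs proper preprocessing (stop words, etc.).
texts = [
    ["dog", "bone", "bark", "dog"],
    ["cat", "meow", "purr", "cat"],
    ["dog", "cat", "pet", "food"],
]

dictionary = corpora.Dictionary(texts)              # word <-> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]     # bag-of-words vectors

lda = models.LdaModel(corpus, id2word=dictionary,
                      num_topics=2, passes=10, random_state=0)

print(lda.print_topics())                  # top words per topic
print(lda.get_document_topics(corpus[0]))  # per-document topic proportions
```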

238 questions
30 votes · 4 answers

R packages for performing topic modeling / LDA: just `topicmodels` and `lda`

It seems to me that only two R packages are able to perform Latent Dirichlet Allocation: one is `lda`, authored by Jonathan Chang, and the other is `topicmodels`, authored by Bettina Grün and Kurt Hornik. What are the differences between these two…
asked by bit-question

26 votes · 3 answers

Topic models and word co-occurrence methods

Popular topic models like LDA usually cluster words that tend to co-occur into the same topic (cluster). What is the main difference between such topic models and other simple co-occurrence-based clustering approaches like PMI? (PMI…
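For reference, the PMI mentioned here is the pointwise mutual information of a word pair, usually estimated from co-occurrence counts within a document or sliding window:

$$\operatorname{PMI}(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}$$

Clustering on a PMI matrix groups words purely by such pairwise association scores, whereas LDA infers topics through a generative model of whole documents.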
24 votes · 2 answers

Natural interpretation for LDA hyperparameters

Can somebody explain the natural interpretation of the LDA hyperparameters? ALPHA and BETA are the parameters of the Dirichlet distributions for the (per-document) topic and (per-topic) word distributions, respectively. However, can someone explain what it…
asked by abhinavkulkarni
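As a sketch of the usual interpretation (using the standard symmetric-prior formulation, not anything specific to the answers here): ALPHA and BETA are Dirichlet concentration parameters,

$$\theta_d \sim \operatorname{Dir}(\alpha), \qquad \phi_k \sim \operatorname{Dir}(\beta), \qquad p(\theta \mid \alpha) \propto \prod_{k=1}^{K} \theta_k^{\alpha - 1},$$

so values of $\alpha$ below 1 concentrate each document on a few topics, while small $\beta$ concentrates each topic on a few words; large values push the sampled distributions toward uniformity.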
23 votes · 2 answers

Topic stability in topic models

I am working on a project where I want to extract some information about the content of a series of open-ended essays. In this particular project, 148 people wrote essays about a hypothetical student organization as part of a larger experiment. …
21 votes · 2 answers

How to calculate perplexity of a holdout with Latent Dirichlet Allocation?

I'm confused about how to calculate the perplexity of a holdout sample when doing Latent Dirichlet Allocation (LDA). The papers on the topic breeze over it, making me think I'm missing something obvious... Perplexity is seen as a good measure of…
asked by drevicko
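For reference, the definition used in the original LDA paper for held-out documents is

$$\operatorname{perplexity}(D_{\text{test}}) = \exp\!\left( -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right),$$

where $N_d$ is the length of document $d$; the subtlety the question raises is how to approximate the intractable $\log p(\mathbf{w}_d)$ for a held-out document.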
18 votes · 1 answer

Topic prediction using latent Dirichlet allocation

I have used LDA on a corpus of documents and found some topics. The output of my code is two matrices containing probabilities: one of doc-topic probabilities and the other of word-topic probabilities. But I actually don't know how to use these results to…
asked by Hossein
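A minimal sketch of one common way to use such output, assuming the doc-topic matrix is available as a NumPy array (the name `doc_topic` is hypothetical): treat each row as a low-dimensional feature vector, or take the arg-max topic as a hard cluster label.

```python
import numpy as np

# Hypothetical (n_docs x n_topics) matrix of p(topic | doc) produced by LDA.
doc_topic = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])

hard_labels = doc_topic.argmax(axis=1)   # dominant topic per document
print(hard_labels)                       # -> [0 2]

# Alternatively, keep the full rows as document features for a downstream
# classifier, clustering step, or similarity search.
features = doc_topic
```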
15 votes · 3 answers

Topic models for short documents

Inspired by this question, I'm wondering whether any work has been done on topic models for large collections of extremely short texts. My intuition is that Twitter should be a natural inspiration for such models. However, from some limited…
asked by Martin O'Leary

11 votes · 0 answers

Is sparsity of topics a necessary condition for latent Dirichlet allocation (LDA) to work?

I have been playing with the hyper-parameters of the latent Dirichlet allocation (LDA) model and am wondering how the sparsity of the topic priors plays a role in inference. I have not performed these experiments on real data, but on simulated data. I…
asked by kedarps
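A sketch of the kind of simulation the question describes, assuming documents are generated from the standard LDA generative process; the vocabulary size, document length, and alpha values below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 5, 50, 80                       # topics, vocabulary, tokens/doc

# Topic-word distributions drawn once with a fixed symmetric prior.
phi = rng.dirichlet([0.1] * V, size=K)

def simulate_doc(alpha):
    """Generate one document's word counts from the LDA generative process."""
    theta = rng.dirichlet([alpha] * K)          # document-topic proportions
    z = rng.choice(K, size=doc_len, p=theta)    # topic assignment per token
    words = [rng.choice(V, p=phi[k]) for k in z]
    return np.bincount(words, minlength=V)

sparse_doc = simulate_doc(alpha=0.1)   # a few dominant topics per document
dense_doc = simulate_doc(alpha=10.0)   # topic proportions close to uniform
```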
10 votes · 1 answer

How does the topic coherence score in LDA make sense intuitively?

Referring to http://qpleple.com/topic-coherence-to-evaluate-topic-models/: in order to decide the optimum number of topics to extract using LDA, the topic coherence score is commonly used to measure how well the topics are extracted: $CoherenceScore…
asked by Kid_Learning_C
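One commonly used variant discussed on the linked page is the UMass coherence; as a sketch, for the top $M$ words $v_1, \dots, v_M$ of a topic it is

$$C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m, v_l) + 1}{D(v_l)},$$

where $D(v)$ is the number of documents containing $v$ and $D(v, v')$ the number containing both. Higher (less negative) values indicate that a topic's top words tend to co-occur in the corpus, which is the intuition behind using the score for model selection.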
10 votes · 1 answer

When to use LDA over GMM for clustering?

I have a dataset containing user activity with 168 dimensions, where I want to extract clusters using unsupervised learning. It is not obvious to me whether to use a topic-modelling approach such as latent Dirichlet allocation (LDA) or a Gaussian Mixture…
8 votes · 2 answers

Supervised approaches vs. topic models in sentiment analysis

I am researching Sentiment Analysis over social media, particularly classifying online texts such as blog posts as positive, negative or neutral. Most of the approaches I have found for sentiment analysis are supervised (they need labeled data to…
8 votes · 1 answer

Using topic words generated by LDA to represent a document

I want to do document classification by representing each document as a set of features. I know that there are many ways: BOW, TFIDF, ... I want to use Latent Dirichlet Allocation (LDA) to extract the topic keywords of EACH SINGLE document. the…
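A sketch of one way to do this with gensim, reusing a trained model like the one in the snippet near the top of this page (the helper name and `topn` value are illustrative): represent each document by the top words of its most probable topic.

```python
# Assumes `lda`, `dictionary`, and `corpus` were built with gensim as above.
def doc_topic_keywords(bow, topn=10):
    """Return the top words of the document's dominant LDA topic."""
    topic_id, _ = max(lda.get_document_topics(bow), key=lambda t: t[1])
    return [word for word, _ in lda.show_topic(topic_id, topn=topn)]

doc_features = [doc_topic_keywords(bow) for bow in corpus]
```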
7 votes · 1 answer

Variational inference for nested Chinese restaurant process

I recently read the paper by Chong Wang and David M. Blei, "Variational Inference for the Nested Chinese Restaurant Process", and I couldn't understand the following part (from p. 5): The variational update functions for W and x depend on the actual…
asked by peppered

7 votes · 1 answer

How does LDA (Latent Dirichlet Allocation) assign a topic-distribution to a new document?

I am new to topic modeling and have read about LDA and NMF (Non-negative Matrix Factorization). I understand how the training process works. Let's say I have 100 documents and I want to train an LDA model on these documents with 10 topics. However, I don't really…
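In practice, most implementations answer this by folding the new document in while holding the trained topic-word distributions fixed; a sketch with gensim, assuming a trained `lda` and its `dictionary` as in the earlier snippets:

```python
# Infer topic proportions for an unseen document with the topics held fixed.
new_tokens = ["dog", "bone", "park"]        # illustrative tokens
new_bow = dictionary.doc2bow(new_tokens)
print(lda.get_document_topics(new_bow))     # [(topic_id, proportion), ...]
```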
7 votes · 0 answers

How to use LDA to predict topic proportion for new document?

I'm interested to learn how I can use a trained LDA (Latent Dirichlet Allocation) model to make predictions on the topic proportion of a new, unseen document using Naive Bayes. Let $z \in \{1, 2, ..., Z\}$ denote a particular topic (there's $Z$…
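Under the single-topic ("Naive Bayes") reading the question suggests, where the whole new document is assumed to come from one topic $z$, the posterior over that topic given its words $w_1, \dots, w_N$ would be

$$p(z = k \mid w_{1:N}) \propto p(z = k) \prod_{n=1}^{N} p(w_n \mid z = k),$$

with $p(w_n \mid z = k)$ read off the trained topic-word matrix. This is only a rough approximation to what LDA does, since LDA allows a mixture of topics within a single document and proper prediction requires per-document inference of the topic proportions.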