Highest Voted 'latent-semantic-analysis' Questions - Statistical Analysis Stack Exchange

28

votes

3 answers

LSA vs. PCA (document clustering)

I'm investigating various techniques used in document clustering and I would like to clear up some doubts concerning PCA (principal component analysis) and LSA (latent semantic analysis). First thing - what are the differences between them? I know that…

asked Jul 26 '13 at 21:56

user1315305

1,199
4
14
15

13

votes

4 answers

Fast alternatives to the EM algorithm

Are there any speedy alternatives to the EM algorithm for learning models with latent variables (especially pLSA)? I'm okay with sacrificing precision in favor of speed.

machine-learning optimization expectation-maximization latent-semantic-analysis

asked Apr 24 '12 at 07:37

Aslan986

728
2
7
18

10

votes

1 answer

A parallel between LSA and pLSA

In the original pLSA paper, the author, Thomas Hofmann, draws a parallel between the pLSA and LSA data structures that I would like to discuss with you. Background: Taking inspiration from Information Retrieval, suppose we have a collection of $N$…

machine-learning conditional-probability svd information-retrieval latent-semantic-analysis

asked Oct 19 '12 at 07:05

Aslan986

728
2
7
18

10

votes

3 answers

K-means on cosine similarities vs. Euclidean distance (LSA)

I am using latent semantic analysis to represent a corpus of documents in lower dimensional space. I want to cluster these documents into two groups using k-means. Several years ago, I did this using Python's gensim and writing my own k-means…

k-means svd latent-semantic-analysis cosine-distance cosine-similarity

asked Oct 16 '14 at 19:27

Jeff

3,525
5
27
38
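
For the k-means question above, one relationship worth keeping in mind is that for unit-length vectors the squared Euclidean distance equals $2(1 - \cos\theta)$, so length-normalizing the LSA vectors before Euclidean k-means ranks pairs the same way cosine similarity does. The snippet below is a minimal sketch that checks this identity numerically; the two vectors are made-up values, not taken from the question.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical 3-dimensional LSA document vectors (illustrative values only).
doc_a = [0.9, -0.2, 0.4]
doc_b = [0.1, 0.7, -0.3]

# Left side: squared distance after length normalization.
lhs = squared_euclidean(normalize(doc_a), normalize(doc_b))
# Right side: 2 * (1 - cosine similarity) of the original vectors.
rhs = 2.0 * (1.0 - cosine(doc_a, doc_b))

print(lhs, rhs)
```

The identity holds only after normalization; on raw, unnormalized vectors the two measures can disagree.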

9

votes

1 answer

When to choose PCA vs. LSA/LSI

Question: Are there any general guidelines with respect to the input data characteristics that can be used to decide between applying PCA versus LSA/LSI? Brief summary of PCA vs. LSA/LSI: Principal Component Analysis (PCA) and Latent Semantic…

machine-learning pca latent-semantic-analysis

asked Jan 19 '12 at 02:05

qi5d02lx

221
2
4

8

votes

2 answers

Is it ok to get negative Cosine Similarity using LSA?

I am getting a negative cosine similarity value between two documents in latent semantic analysis. How should it be treated?

machine-learning computational-statistics information-retrieval latent-semantic-analysis

asked Apr 10 '15 at 10:22

Jeet Arora

81
1
2

6

votes

2 answers

Deriving mathematical model of pLSA

After learning how LSA works, I went on to read about pLSA but couldn't really make sense of the mathematical formula. This is what I get from Wikipedia (other academic papers/tutorials show a similar form) \begin{align} P(w,d) & = \sum_{c} P(c)…

machine-learning probability bayesian multilevel-analysis latent-semantic-analysis

asked Mar 28 '11 at 09:21

Jeffrey04

187
9

6

votes

1 answer

How to cluster LDA/LSI topics generated by gensim?

I'm an enthusiastic single developer working on a small start-up idea. I reduced a corpus of mine to an LSA/LDA vector space using gensim. Now I have a bunch of topics hanging around and I am not sure how to cluster the corpus documents. I see that…

python k-means natural-language latent-semantic-analysis

asked May 22 '12 at 00:43

osiloke

61
1
2

5

votes

2 answers

Latent Dirichlet Allocation vs. pLSA

In the original LDA paper it is stated that: The parameters for a k-topic pLSI model are k multinomial distributions of size V and M mixtures over the k hidden topics. This gives kV + kM parameters and therefore linear growth in M. The linear…

overfitting latent-variable dirichlet-distribution latent-semantic-analysis latent-semantic-indexing

asked Jun 07 '15 at 11:34

Shayan

231
3
8

5

votes

2 answers

What is a "tempered EM algorithm"?

In Hofmann's paper on Probabilistic Latent Semantic Analysis, the author fits the model for the document $\times$ word matrix using the EM algorithm in section 3. I was able to follow the derivation and meaning of the model derived in it. However in…

expectation-maximization latent-semantic-analysis

asked May 03 '11 at 09:12

Learner

4,007
11
37
39

4

votes

1 answer

Computing document similarity in latent semantic analysis

I have a question regarding Latent Semantic Analysis: after performing SVD of the term-document matrix and choosing some number of dimensions, I get a set of new document vectors. Now, how can I calculate similarity between two…

clustering data-mining latent-semantic-analysis

asked Aug 23 '13 at 18:49

user1315305

1,199
4
14
15
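
The setup described in the document-similarity question above can be sketched as follows. This is only an illustration under stated assumptions: the term-document counts are made up, `k = 2` is an arbitrary choice, and cosine similarity is one common option for comparing the reduced document vectors, not necessarily what the asker or the answers settled on.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); made-up counts.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

k = 2  # number of latent dimensions to keep (arbitrary for this sketch)

# Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Document coordinates in the k-dimensional latent space: rows of (S_k V_k^T)^T.
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T  # shape: (n_docs, k)

def cosine_similarity(a, b):
    """Cosine of the angle between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

n_docs = doc_vectors.shape[0]
sim = np.array([[cosine_similarity(doc_vectors[i], doc_vectors[j])
                 for j in range(n_docs)] for i in range(n_docs)])

print(np.round(sim, 3))
```

Because SVD coordinates can be negative, entries of `sim` may fall anywhere in $[-1, 1]$, unlike cosine similarity on raw non-negative term counts.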

4

votes

0 answers

Finding similarity between a reference and a few working documents

I have to find the similarity between a reference document and a set of documents in a repository. Here is my method: 1. I find the term-document matrix for all the documents, including the reference document. 2. The SVD is calculated for this…

text-mining latent-semantic-analysis

asked Jan 27 '12 at 02:53

siddharth

71
1
2

4

votes

1 answer

pLSA - Probabilistic Latent Semantic Analysis, how to choose topic number?

I am learning about pLSA (Probabilistic Latent Semantic Analysis) right now, in the hopes of being able to apply it to biomolecular annotation prediction. I have a very simple question: How do you choose the number of topics / classes to use in the…

machine-learning probability latent-semantic-analysis

asked Jan 07 '12 at 14:52

DavideChicco.it

682
1
10
24

4

votes

1 answer

Latent Semantic Indexing and Data Centering

In PCA it's common to center the data, i.e. preprocess the data matrix such that the columns have zero mean. PCA can be done via SVD, but in this case the data matrix also has to be mean-centered. If we don't center it, the found principal…

pca svd non-negative-matrix-factorization latent-semantic-analysis latent-semantic-indexing

asked May 18 '15 at 18:03

Alexey Grigorev

8,147
3
26
39
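
The point about centering in the question above can be checked numerically. The sketch below uses synthetic data with deliberately non-zero column means (all values are assumptions chosen for illustration) and compares three things: PCA via the covariance matrix, SVD of the centered matrix, and SVD of the uncentered matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 correlated features, shifted away from the origin.
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
offset = np.array([10.0, -5.0, 2.0])
X = rng.normal(size=(200, 3)) @ mixing + offset

# Reference PCA: eigendecomposition of the sample covariance matrix.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# PCA via SVD of the column-centered data matrix.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = s ** 2 / (X.shape[0] - 1)

# SVD of the raw, uncentered matrix for comparison.
_, s_raw, Vt_raw = np.linalg.svd(X, full_matrices=False)
mean_direction = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))

# Absolute values are used because singular vectors are defined only up to sign.
print("centered SVD vs PCA axis 1:", abs(Vt[0] @ eigvecs[:, 0]))
print("uncentered SVD axis 1 vs mean direction:", abs(Vt_raw[0] @ mean_direction))
```

With this data the centered SVD reproduces the covariance-based principal axes, while the leading singular vector of the uncentered matrix mostly points toward the data mean. How much the two differ in practice depends on how large the means are relative to the spread.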

3

votes

0 answers

Application of LSA/LSI: is it common to include the use of an edit distance?

I have been using Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI) to identify whether different email addresses belong to the same individual by matching on names used for each email address; an email address represents a…

latent-semantic-analysis

asked Aug 27 '13 at 14:27

ErikKou

31
1

Questions tagged [latent-semantic-analysis]