Questions tagged [latent-semantic-indexing]
17 questions
9
votes
1 answer
Understanding Singular Value Decomposition in the context of LSI
My question is generally on Singular Value Decomposition (SVD), and particularly on Latent Semantic Indexing (LSI).
Say, I have $ A_{word \times document} $ that contains frequencies of 5 words for 7 documents.
A = matrix(data=c(2,0,8,6,0,3,1,
…

Zhubarb
- 7,753
- 2
- 28
- 44
5
votes
2 answers
Latent Dirichlet Allocation vs. pLSA
In the original LDA paper it is stated that:
The parameters for a k-topic pLSI model are k multinomial distributions of size V and M mixtures over the k hidden topics. This gives kV + kM parameters and therefore linear growth in M. The linear…
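To make the count in the quoted passage concrete, here is a minimal sketch that evaluates kV + kM for a few corpus sizes. The function name and the values of k, V and M are made up for illustration; they do not come from the paper.

```python
def plsi_param_count(k: int, V: int, M: int) -> int:
    """Parameter count of a k-topic pLSI model as given in the quote: kV + kM."""
    return k * V + k * M

# Hypothetical sizes: 10 topics, a vocabulary of 5000 words.
k, V = 10, 5000

# Grow the number of training documents M and watch the count grow with it.
counts = [plsi_param_count(k, V, M) for M in (100, 1000, 10000)]
print(counts)
```

Each additional training document adds k parameters, which is the linear growth in M that the quote refers to.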

Shayan
- 231
- 3
- 8
4
votes
1 answer
Latent Semantic Indexing and Data Centering
In PCA it's common to center the data, i.e. preprocess the data matrix such that the columns have zero mean. PCA can be done via SVD, but in this case the data matrix also has to be mean-centered. If we don't center it, the found principal…
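The effect of centering can be checked numerically. The following is a minimal NumPy sketch on made-up data (the sizes, variances and mean are assumptions chosen for illustration): it compares the leading right singular vector of the centered and the uncentered matrix against the leading eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples (rows), 2 features (columns).
# Feature 0 has the larger variance; the mean is far from the origin.
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5]) + np.array([10.0, 10.0])

# Reference PCA: leading eigenvector of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1_cov = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

# SVD of the column-centered matrix.
Xc = X - X.mean(axis=0)
pc1_centered = np.linalg.svd(Xc, full_matrices=False)[2][0]

# SVD of the raw, uncentered matrix.
dir_raw = np.linalg.svd(X, full_matrices=False)[2][0]

def abs_cos(a, b):
    """Absolute cosine of the angle between two vectors (ignores sign flips)."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(abs_cos(pc1_cov, pc1_centered))
print(abs_cos(pc1_cov, dir_raw))
```

Under these assumptions the first value should be essentially 1, while the second is noticeably smaller, because the leading direction of the raw matrix is pulled toward the mean vector.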

Alexey Grigorev
- 8,147
- 3
- 26
- 39
3
votes
0 answers
Difference between Latent and Explicit Semantic Analysis
I'm trying to analyse the paper "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis".
One component of the system described therein that I'm currently grappling with is the difference between Latent and Explicit…

smatthewenglish
- 141
- 5
3
votes
1 answer
How are the clustering algorithms using the concept of Latent Semantic Analysis?
I have come across Latent Semantic Analysis, but I could not understand it.
Can Latent Semantic Analysis be used by humans to cluster datasets? For convenience, let us consider the datasets to be two-dimensional sets. Can humans…

Ramseyl
- 51
- 1
- 5
2
votes
0 answers
What are some of the advantages and disadvantages of Explicit Semantic Analysis (ESA)?
I am writing a report on semantic analysis and I have come across a celebrated paper, "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis", by Evgeniy Gabrilovich and Shaul Markovitch.
I have been looking at this paper and some…

silent_dev
- 557
- 1
- 6
- 16
2
votes
1 answer
How to Use LSA to Create Topics?
I just want to know the general process of creating document topics via LSA. For creating document clusters, I know that I should first get the SVD dimensions and then run k-means clustering on these SVD dimensions. For creating…
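The clustering recipe described in the excerpt (SVD first, then k-means on the reduced dimensions) can be sketched as follows. This is an illustrative NumPy-only version under stated assumptions: the term-document matrix, the rank `r`, the initial centers and the small `kmeans` helper are all hypothetical, and a real pipeline would typically use a library implementation.

```python
import numpy as np

# Hypothetical term-document count matrix: 6 terms (rows) x 6 documents (columns).
# Documents 0-2 mostly use the first three terms, documents 3-5 the last three.
A = np.array([
    [3, 2, 4, 0, 0, 1],
    [2, 3, 1, 0, 1, 0],
    [4, 1, 2, 1, 0, 0],
    [0, 0, 1, 3, 2, 4],
    [1, 0, 0, 2, 4, 3],
    [0, 1, 0, 4, 3, 2],
], dtype=float)

# Step 1: truncated SVD, keeping the top r singular triplets.
r = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_coords = (np.diag(s[:r]) @ Vt[:r]).T  # one r-dimensional row per document

# Step 2: plain k-means (Lloyd's algorithm) on the reduced coordinates.
def kmeans(X, init_centers, n_iter=20):
    centers = init_centers.copy()
    for _ in range(n_iter):
        # Distance from every point to every center, then nearest-center labels.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its points.
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Assumption: documents 0 and 5 serve as initial centers to keep the run deterministic.
labels = kmeans(doc_coords, init_centers=doc_coords[[0, 5]])
print(labels)
```

The cluster labels group documents. The rows of `U[:, :r]` give the term loadings of each retained dimension, which is the usual starting point for reading the dimensions as topics.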

kokoma
- 99
- 4
2
votes
0 answers
Supervised semantic analysis
Dimensionality reduction and semantic vectorization techniques like LSA, pLSA, LDA and Random Indexing do not take advantage of semantically labeled data the way Explicit Semantic Analysis (ESA) does. I am looking for the state of the art in supervised semantic analysis…

hernan
- 61
- 3
1
vote
0 answers
Semantic Analysis: Set a default value for examples not in scope of the training set
I am working with a semantic analysis problem and wanted to know if anyone has been able to set a default value, say a probability of zero or 0.5 for phrases/words that the machine learning algorithm has never seen. Using scikit-learn's classifiers…

MyopicVisage
- 133
- 6
1
vote
0 answers
Latent class in Gaussian mixture model
I would like some advice on the latent class in a mixture model.
I wish to write the latent class code by hand, without relying on an existing R package.
This is my code snippet for the finite mixture:
no <- nrow(myData.obs)
prob1 =…

Jas
- 11
- 3
1
vote
0 answers
Is Latent Semantic Analysis a clustering algorithm?
The input of LSA is a term frequency matrix of a set of documents. What is the output? If I want to group a set of news articles into different clusters, can I use LSA? If not, what are the major uses of LSA?
Is it similar to K-means?

user697911
- 121
- 4
1
vote
0 answers
Calculating perplexity for LSA
I am new to topic modelling, so kindly bear with me if my question is silly.
I am trying to calculate perplexity after applying LSA. I am aware that LSA returns negative values, so I followed the steps stated in Coccaro to find the probability of each…

Hemaa mathavan
- 121
- 6
1
vote
0 answers
LDA or pLSA for short documents?
I'd like to classify short documents, from a predefined set of words.
What algorithm would you suggest, LDA or pLSA?
My use case
I have a list of users, and for each user a list of the pages she likes.
My goal is to classify users (documents) into…

Uri Goren
- 1,701
- 1
- 10
- 24
1
vote
0 answers
Latent semantic analysis and keyword extraction
Well, I've started with a collection of documents. The aim is to extract keywords for each document.
I've made a document-term matrix, to which I applied a singular value decomposition. I've made a new matrix (an approximation of the original…

Silke
- 285
- 2
- 13
0
votes
0 answers
How to interpret negative coefficients in LSI model?
I am using Gensim 4 to train an LSI model.
I tried to print some topics, and this is one of the results:
-0.246*"cancer" + -0.218*"patient" + -0.200*"risk" + -0.131*"ci" + -0.122*"associ" + -0.120*"breast" + -0.118*"women" + -0.114*"atlas" +…
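One fact that bears on this question: an LSI topic is a singular vector of the term-document matrix, and singular vectors are determined only up to sign. The following minimal NumPy sketch (the matrix is made up, and this does not use Gensim) flips the sign of the first term vector together with its matching document vector and checks that the decomposition still reproduces the same matrix.

```python
import numpy as np

# Hypothetical 4-term x 3-document count matrix.
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 3.0, 0.0],
    [0.0, 1.0, 4.0],
    [3.0, 1.0, 1.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of the first "topic" on both the term side and the document side.
U_flipped = U.copy()
Vt_flipped = Vt.copy()
U_flipped[:, 0] *= -1
Vt_flipped[0, :] *= -1

A1 = U @ np.diag(s) @ Vt
A2 = U_flipped @ np.diag(s) @ Vt_flipped
print(np.allclose(A1, A2))
```

Both factorizations reconstruct the same matrix, so the overall sign of a topic carries no meaning by itself. Only the relative signs of the terms within one topic do.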

robertspierre
- 1,358
- 6
- 21