
I have to find the similarity between a reference document and a set of documents in a repository.

Here is my method:

1. I build the term-document matrix for all the documents, including the reference document.
2. I compute the SVD of this matrix.
3. I take the v array (the third result).
4. I transpose this matrix so that each row represents a document.
5. The first row represents the reference document.
6. I compute the cosine similarity between this row and each of the remaining rows.
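The six steps above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions: the toy documents, the whitespace tokenizer and the helper names `term_document_matrix` and `svd_row_similarities` are hypothetical, and raw term counts are used with no weighting. NumPy's `np.linalg.svd` returns `U`, the singular values and V transposed, so its third result has to be transposed back to get one row per document.

```python
from collections import Counter

import numpy as np


def term_document_matrix(docs):
    """Step 1: raw term counts, one row per term and one column per document.
    Hypothetical helper; it only lower-cases and splits on whitespace."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w, c in Counter(d.lower().split()).items():
            counts[index[w], j] = c
    return counts


def svd_row_similarities(term_doc):
    """Steps 2-6: cosine similarity of document 0 to every other document,
    computed on the rows of V."""
    # Step 2: term_doc = U @ diag(s) @ vt
    _, _, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Steps 3-4: take the third result and transpose it so row j is document j
    v = vt.T
    # Step 5: the first row is the reference document
    ref, others = v[0], v[1:]
    # Step 6: cosine similarity between the reference row and each other row
    return others @ ref / (np.linalg.norm(others, axis=1) * np.linalg.norm(ref))


# Hypothetical repository: the reference document first, then 7 others
docs = [
    "svd finds latent topics in text",
    "latent topics from the svd of text",
    "cosine similarity compares vectors",
    "cats chase mice",
    "dogs chase cats",
    "text mining uses svd",
    "vectors and matrices",
    "mice eat cheese",
]

print(np.round(svd_row_similarities(term_document_matrix(docs)), 3))
```

With `full_matrices=False` and at least as many distinct terms as documents, `vt` is documents × documents, which matches the 8x8 array described in the first question below.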

My questions:

  1. Since I have around 7 documents in my database, I get only an 8x8 v array (document matrix). Will I get a correct result if I compute the cosine similarity using only these 8 values per document?

  2. Is this approach commonly used?
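One way to probe the first question: when the term-document matrix has at least as many terms as documents, the 8x8 V returned by the SVD is orthogonal, so its rows are orthonormal and the cosine similarity between any two different rows is 0 up to rounding, whatever the documents contain. The sketch below contrasts that with the variant commonly used in latent semantic analysis (LSA): keep only the top k singular values and compare the rows of V_k scaled by those singular values. The function name `lsa_similarities`, the choice k=2 and the random toy count matrix are illustrative assumptions, not part of the original method.

```python
import numpy as np


def lsa_similarities(term_doc, k=None):
    """Cosine similarity of document 0 to documents 1..n-1.

    k=None compares the plain rows of V (the method as described);
    an integer k compares LSA-style vectors V_k * s_k (assumed variant)."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    v = vt.T  # row j = document j
    if k is not None:
        # keep the first k dimensions and scale each by its singular value
        v = v[:, :k] * s[:k]
    ref, others = v[0], v[1:]
    return others @ ref / (np.linalg.norm(others, axis=1) * np.linalg.norm(ref))


rng = np.random.default_rng(0)
# Hypothetical counts: 30 terms x 8 documents (1 reference + 7 in the repository)
A = rng.poisson(1.0, size=(30, 8)).astype(float)

print(np.round(lsa_similarities(A), 6))       # plain rows of V: all ~0
print(np.round(lsa_similarities(A, k=2), 3))  # rank-2 LSA document vectors
```

A useful sanity check on the scaled variant: with k equal to the number of documents, the rows of V times the singular values give exactly the same cosines as the raw term-count columns, because AᵀA = VΣ²Vᵀ. Only a smaller k changes the similarities.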

chl
siddharth
