I have to find the similarity between a reference document and a set of documents in a repository.
Here is my method:
1. I build the term-document matrix for all the documents, including the reference document.
2. I compute the SVD of this matrix.
3. I take the v array (the third result).
4. I transpose this matrix so that each row represents a document.
5. The first row represents the reference document.
6. I compute the cosine similarity between this row and each of the remaining rows.
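The steps above can be sketched in NumPy roughly as follows. This is a minimal illustration under assumptions, not the original code: the function name `svd_document_similarity`, the optional truncation parameter `k`, the `scale` flag and the random 20-term example matrix are all hypothetical. `np.linalg.svd` returns `(U, s, Vh)`, so the "third result" is `Vh`, and its transpose has one row per document. When all eight columns are kept (`k=None`, which follows the steps literally), `V` is an orthogonal matrix. Its rows are therefore mutually orthonormal, and every similarity comes out as 0 up to floating-point error. The `k`/`scale` options show the common latent semantic analysis variant that keeps only the top `k` singular vectors, weighted by their singular values.

```python
import numpy as np

def svd_document_similarity(term_doc, k=None, scale=False):
    """Cosine similarity of the reference document (column 0) with every other document.

    term_doc: terms x documents matrix; column 0 is the reference document.
    k: hypothetical option, number of right singular vectors to keep (None keeps all).
    scale: hypothetical option, weight each kept dimension by its singular value.
    """
    # Step 2: SVD of the term-document matrix, term_doc = U @ S @ Vh
    _, s, vh = np.linalg.svd(term_doc, full_matrices=True)

    # Steps 3-4: transpose Vh so that each row corresponds to one document
    docs = vh.T
    if k is not None:
        docs = docs[:, :k]
    if scale:
        # Only the first len(s) dimensions have a singular value attached
        kept = min(docs.shape[1], len(s))
        docs = docs[:, :kept] * s[:kept]

    # Steps 5-6: row 0 is the reference; compare it with the remaining rows
    ref, others = docs[0], docs[1:]
    norms = np.linalg.norm(others, axis=1) * np.linalg.norm(ref)
    norms = np.maximum(norms, 1e-12)  # guard against division by zero
    return others @ ref / norms

# Usage with a hypothetical 20-term x 8-document count matrix
rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(20, 8)).astype(float)
print(np.round(svd_document_similarity(A), 6))             # all eight columns of V
print(np.round(svd_document_similarity(A, k=2, scale=True), 3))  # truncated, scaled
```

The first print shows values that are zero up to rounding, because the rows of a full orthogonal `V` are orthogonal to each other. The second shows the similarities in a 2-dimensional reduced space. There, two documents with identical term counts get a cosine similarity of 1.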
My doubts:
Since I have around 7 documents in my database (8 including the reference), I get only an 8x8 v array (document matrix). Will I get a correct result if I compute the cosine similarity using only these 8 values per document? Is this method generally used?