I am using SVD/PCA for text mining purposes.
I have a normalized $(|\text{terms}|, |\text{documents}|)$ matrix $M$; by applying SVD I should be able to reduce the dimensionality and keep only the most meaningful dimensions.
If I truncate the SVD to 2 components, $U_2$ and $V_2^T$ should contain the 2-dimensional spatial representations of the terms and the documents, respectively. Plotting them together should tell me which terms are closer to which documents.
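To make the setup concrete, here is a minimal sketch of what I'm doing (numpy and matplotlib assumed; `M` is just a random placeholder standing in for my real normalized term-document matrix):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for my normalized (n_terms, n_documents) matrix
rng = np.random.default_rng(0)
M = rng.random((50, 20))

# Full SVD, then truncate to the first 2 singular vectors
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U2  = U[:, :2]    # (n_terms, 2)      -> term coordinates
Vt2 = Vt[:2, :]   # (2, n_documents)  -> document coordinates

# The scaled variants I'm unsure about:
#   U2 @ np.diag(s[:2])            -> terms scaled by the singular values
#   (np.diag(s[:2]) @ Vt2).T       -> documents scaled by the singular values

plt.scatter(U2[:, 0], U2[:, 1], marker='o', label='terms')
plt.scatter(Vt2[0, :], Vt2[1, :], marker='x', label='documents')
plt.legend()
plt.show()
```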
I've seen several examples where only $U$ is visualized, so I'm not sure that my idea of also plotting the documents is correct. That said, I've also seen that most PCA implementations return $U\Sigma$, which makes me wonder:
- Is this idea correct?
- Should I multiply $U$ and/or $V^T$ by $\Sigma$ before plotting?
- Why do some documents end up so far from the terms, given that each document surely contains at least one of them?