I am using SVD/PCA for text mining purposes.
I have a normalized $(|\text{terms}|, |\text{documents}|)$ matrix $M$; by applying SVD I should be able to reduce the dimensionality and keep only the most meaningful dimensions.
If I truncate the SVD to 2 components, $U_2$ and $V_2^T$ should contain the 2-dimensional spatial representations of the terms and the documents, respectively. Plotting them together should tell me which terms are closer to which documents.
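To make the setup concrete, here is a minimal sketch of what I'm doing (numpy and matplotlib assumed; `M` is just a random placeholder standing in for my real normalized term-document matrix):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for my normalized (n_terms, n_documents) matrix
rng = np.random.default_rng(0)
M = rng.random((50, 20))

# Full SVD, then truncate to the first 2 singular vectors
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U2  = U[:, :2]    # (n_terms, 2)      -> term coordinates
Vt2 = Vt[:2, :]   # (2, n_documents)  -> document coordinates

# The scaled variants I'm unsure about:
#   U2 @ np.diag(s[:2])            -> terms scaled by the singular values
#   (np.diag(s[:2]) @ Vt2).T       -> documents scaled by the singular values

plt.scatter(U2[:, 0], U2[:, 1], marker='o', label='terms')
plt.scatter(Vt2[0, :], Vt2[1, :], marker='x', label='documents')
plt.legend()
plt.show()
```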
I've seen several examples where only $U$ is visualized, so I'm not sure that my idea of also plotting the documents is correct. That said, I've also seen that most PCA implementations return $U\Sigma$, which makes me wonder:
- Is this idea correct?
- Should I multiply $U$ and/or $V^T$ by $\Sigma$ before plotting?
- Why do some documents end up so far from the terms, given that each document surely contains at least one of them?