I'm using latent semantic indexing to find similarities between documents (thanks, JMS!).
After dimension reduction, I've tried k-means clustering to group the documents into clusters, which works very well. But I'd like to go a bit further, and visualize the documents as a set of nodes, where the distance between any two nodes is inversely proportional to their similarity (nodes that are highly similar are close together).
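To make the distance idea concrete, here is a minimal sketch of the kind of conversion I have in mind. The vectors are made-up toy values standing in for my LSI output, the function names are hypothetical, and taking distance as 1 minus cosine similarity is only my assumption (it makes distance shrink as similarity grows, rather than being strictly inversely proportional).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def distance_matrix(vectors):
    """Pairwise distances, assuming distance = 1 - cosine similarity."""
    n = len(vectors)
    return [
        [0.0 if i == j else 1.0 - cosine_similarity(vectors[i], vectors[j])
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 3-dimensional document vectors (toy values, not real LSI output).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
toy_distances = distance_matrix(toy_vectors)
```

Here the first two toy documents end up much closer to each other than either is to the third, which is the behaviour I want the node layout to preserve.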
It strikes me that I can't accurately represent a similarity matrix as a 2-dimensional graph, since my data has more than 2 dimensions. So my first question: is there a standard way to do this?
Could I just reduce my data to two dimensions and plot them as the X and Y axes, and would that suffice for a group of ~100-200 documents? If this is the solution, is it better to reduce my data to 2 dimensions from the start, or is there a way to pick the two "best" dimensions from my multi-dimensional data?
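By "reduce to two dimensions" I mean something like the sketch below. It is only an illustration under assumptions: numpy's exact SVD stands in for gensim's LSI model, the term-document matrix is toy data, and project_to_2d is a hypothetical helper name.

```python
import numpy as np

def project_to_2d(term_doc):
    """Return one (x, y) row per document, using the two largest singular values.

    Assumes rows of term_doc are terms and columns are documents.
    """
    # Thin SVD: term_doc = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates are the columns of diag(s) @ vt; keep the first two.
    return vt[:2].T * s[:2]

# Toy term-document matrix: 5 terms (rows) by 4 documents (columns).
toy_term_doc = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
    [1.0, 0.0, 0.0, 1.0],
])
coords = project_to_2d(toy_term_doc)
```

Plotting coords[:, 0] against coords[:, 1] would then give the scatter plot I have in mind. With an exact SVD like this one, keeping the first two of k dimensions gives the same coordinates as reducing to 2 directly; I don't know whether that carries over exactly to gensim, hence the question.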
I am using Python and the gensim library, if that makes a difference.