
During my bachelor thesis I gathered a set of comments, each labeled 0 (contains no hate) or 1 (contains hate). The labels were given by volunteers. There are around 2500 comments of various lengths (between 100 and 1800 characters).

After finishing my thesis, I came across visualization techniques such as PCA and t-SNE. Applied to the MNIST dataset of handwritten digits, these techniques show amazing results.

As I understand it, a comment consists of words and is therefore, in some sense, high-dimensional data, just as the images in MNIST are. So my question is: is it possible to visualize the comments with a technique such as PCA or t-SNE?

I don't know how to convert the data, or where to find a tutorial that applies such a technique to text. Thanks for your thoughts!
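To make the "convert the data" step concrete, here is a minimal sketch of one possible route, not a definitive recipe. It assumes a TF-IDF bag-of-words representation (my assumption; other text representations exist), so each comment becomes one high-dimensional vector, which is then projected to 2D with PCA computed by power iteration. The example comments and the function names `tokenize`, `tfidf_matrix` and `pca_2d` are made up for illustration.

```python
import math
import random
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def tfidf_matrix(comments):
    """Return (vocabulary, rows); each row is the TF-IDF vector of one comment."""
    docs = [Counter(tokenize(c)) for c in comments]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Smoothed inverse document frequency: rarer words get larger weights.
    idf = {w: math.log((1 + n) / (1 + sum(w in d for d in docs))) + 1
           for w in vocab}
    rows = []
    for d in docs:
        total = sum(d.values()) or 1
        rows.append([d[w] / total * idf[w] for w in vocab])
    return vocab, rows

def pca_2d(rows, iterations=200, seed=0):
    """Project rows onto their first two principal components."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    X = [[r[j] - means[j] for j in range(dim)] for r in rows]  # centered data
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.uniform(-1, 1) for _ in range(dim)]
        for _ in range(iterations):
            # Multiply v by X^T X without building the covariance matrix.
            scores = [sum(x[j] * v[j] for j in range(dim)) for x in X]
            w = [sum(scores[i] * X[i][j] for i in range(n)) for j in range(dim)]
            # Deflation: remove the parts along components already found.
            for c in components:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:
                break
            v = [a / norm for a in w]
        components.append(v)
    return [[sum(x[j] * c[j] for j in range(dim)) for c in components]
            for x in X]

# Made-up placeholder comments with labels (1 = hate, 0 = no hate).
comments = [
    "I really hate this group of people",
    "these people are awful and I hate them",
    "what a lovely sunny day",
    "the weather is lovely today",
]
labels = [1, 1, 0, 0]

vocab, rows = tfidf_matrix(comments)
coords = pca_2d(rows)
for label, (x, y) in zip(labels, coords):
    print(label, round(x, 3), round(y, 3))
```

Each printed pair is a point that could be drawn in a scatter plot and colored by label. For the full set of about 2500 comments, scikit-learn's `TfidfVectorizer`, `PCA` or `TruncatedSVD`, and `TSNE` classes would be the more practical choice, with t-SNE taking the place of the PCA step.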

kjetil b halvorsen
So S
    See http://karpathy.github.io/2014/07/02/visualizing-top-tweeps-with-t-sne-in-Javascript/ – amoeba May 11 '17 at 13:31

0 Answers