T-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction algorithm introduced by van der Maaten and Hinton in 2008.
Questions tagged [tsne]
135 questions
125 votes, 7 answers
Clustering on the output of t-SNE
I've got an application where it'd be handy to cluster a noisy dataset before looking for subgroup effects within the clusters. I first looked at PCA, but it takes ~30 components to get to 90% of the variability, so clustering on just a couple of…

generic_user
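The "number of components needed to reach 90% of the variability" mentioned in the excerpt above can be computed from the singular values of the centered data. The following is a minimal sketch using only NumPy; the function name `components_for_variance` and the synthetic data are assumptions made for illustration and do not come from the question.

```python
import numpy as np

def components_for_variance(X, target=0.90):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target` (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                    # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in descending order
    ratio = s**2 / np.sum(s**2)                # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio is >= target, converted to a count.
    return int(np.searchsorted(cumulative, target) + 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))   # synthetic noise, for illustration only
k = components_for_variance(X)
print(k)
```

When the variance is spread across many directions, as in noisy data, `k` tends to be large, which is the situation the question describes.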
64 votes, 4 answers
Are there cases where PCA is more suitable than t-SNE?
I want to see how 7 measures of text correction behaviour (time spent correcting the text, number of keystrokes, etc.) relate to each other. The measures are correlated. I ran a PCA to see how the measures projected onto PC1 and PC2, which avoided…

user3744206
57 votes, 1 answer
Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function?
In my mind, KL divergence from sample distribution to true distribution is simply the difference between cross entropy and entropy.
Why do we use cross entropy to be the cost function in many machine learning models, but use Kullback-Leibler…

JimSpark
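The relationship stated in the excerpt above, that KL divergence equals cross entropy minus entropy, can be checked numerically. This is a small sketch with made-up distributions `p` and `q` (illustrative values, not taken from the question).

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # reference distribution (illustrative values)
q = np.array([0.4, 0.4, 0.2])   # approximating distribution (illustrative values)

entropy = -np.sum(p * np.log(p))          # H(p)
cross_entropy = -np.sum(p * np.log(q))    # H(p, q)
kl = np.sum(p * np.log(p / q))            # KL(p || q)

# KL(p || q) = H(p, q) - H(p), up to floating-point error.
assert np.isclose(kl, cross_entropy - entropy)
```

Because H(p) does not depend on `q`, minimizing the KL divergence or the cross entropy over `q` yields the same minimizer whenever `p` is held fixed.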
41 votes, 3 answers
Why is t-SNE not used as a dimensionality reduction technique for clustering or classification?
In a recent assignment, we were told to use PCA on the MNIST digits to reduce the dimensions from 64 (8 x 8 images) to 2. We then had to cluster the digits using a Gaussian Mixture Model. PCA using only 2 principal components does not yield distinct…

willk
37 votes, 2 answers
When is t-SNE misleading?
Quoting from one of the authors:
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets.
So it sounds…

Lyndon White
28 votes, 2 answers
Intuitive explanation of how UMAP works, compared to t-SNE
I have a PhD in molecular biology. My studies recently started to involve high dimensional data analysis. I got the idea of how t-SNE works (thanks to a StatQuest video on YouTube) but can't seem to wrap my mind around UMAP (I listened to the UMAP…

Atakan
27 votes, 4 answers
What's wrong with t-SNE vs PCA for dimensional reduction using R?
I have a matrix of 336x256 floating point numbers (336 bacterial genomes (columns) x 256 normalized tetranucleotide frequencies (rows), e.g. every column adds up to 1).
I get nice results when I run my analysis using principal component analysis.…

Loddi
26 votes, 1 answer
t-SNE versus MDS
Been reading some questions about t-SNE (t-Distributed Stochastic Neighbor Embedding) lately, and also visited some questions about MDS (Multidimensional Scaling).
They are often used analogously, so it seemed like a good idea to make this question…

Firebug
25 votes, 3 answers
Should dimensionality reduction for visualization be considered a "closed" problem, solved by t-SNE?
I've been reading a lot about the t-SNE algorithm for dimensionality reduction. I'm very impressed with its performance on "classic" datasets, like MNIST, where it achieves a clear separation of the digits (see original article):
I've also used it to…

galoosh33
23 votes, 1 answer
Should data be centered+scaled before applying t-SNE?
Some of my data's features have large values, while other features have much smaller values.
Is it necessary to center+scale data before applying t-SNE to prevent bias towards the larger values?
I use Python's sklearn.manifold.TSNE implementation…

stmax
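To make the concern in the excerpt above concrete: t-SNE's input similarities are usually built from pairwise Euclidean distances (the default metric in `sklearn.manifold.TSNE`), so a feature with large values can dominate those distances. The sketch below uses only NumPy, with two hypothetical features and a hypothetical helper `share_of_first_feature`. It illustrates the scale effect, not a recommendation for any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical features on very different scales.
small = rng.normal(loc=0.0, scale=1.0, size=200)
large = rng.normal(loc=0.0, scale=1000.0, size=200)
X = np.column_stack([small, large])

def share_of_first_feature(X):
    """Fraction of the total squared pairwise Euclidean distance
    that is contributed by feature 0 (hypothetical helper)."""
    diffs = X[:, None, :] - X[None, :, :]    # pairwise differences, shape (n, n, d)
    per_feature = (diffs ** 2).sum(axis=(0, 1))
    return per_feature[0] / per_feature.sum()

# Center and scale each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

raw_share = share_of_first_feature(X)            # close to 0: feature 1 dominates
scaled_share = share_of_first_feature(X_scaled)  # 0.5: both features contribute equally
```

Whether equal contribution is desirable depends on the data. If the features share a unit and their relative magnitudes are meaningful, standardizing may discard useful information.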
21 votes, 5 answers
Are there any versions of t-SNE for streaming data?
My understanding of t-SNE and the Barnes-Hut approximation is that all data points are required so that all force interactions can be calculated at the same time and each point can be adjusted in the 2d (or lower dimensional) map.
Are there any…

Ger
17 votes, 4 answers
Choosing the hyperparameters using T-SNE for classification
In a specific problem that I work with (a competition) I have the following setting: 21 features (numerical on [0,1]) and a binary output. I have approximately 100K rows. The setting seems to be very noisy.
Other participants and I apply feature…

Richi W
15 votes, 1 answer
What is the meaning of the axes in t-SNE?
I'm currently trying to wrap my head around the t-SNE math. Unfortunately, there is still one question I can't answer satisfactorily: What is the actual meaning of the axes in a t-SNE graph? If I were to give a presentation on this topic or include…

Hagbard
15 votes, 4 answers
What are the differences between autoencoders and t-SNE?
As far as I know, both autoencoders and t-SNE are used for nonlinear dimensionality reduction. What are the differences between them and why should I use one versus another?

RockTheStar
14 votes, 1 answer
What classification algorithm should one use after seeing that t-SNE separates classes well?
Let's assume we have a classification problem and at first we want to get some insight from the data and we do t-SNE. The result of t-SNE separates classes very well. This implies that it is possible to build classification model that will also…

Tomek Tarczynski