Does it make sense to run DBSCAN on the output from t-SNE?

Question

I performed time series clustering, using DTW to generate a distance matrix. I then gave the distance matrix as input to t-SNE and clustered the resulting two-dimensional embedding with DBSCAN.

Does it make sense?

score 3 · Answer 1 · answered Jun 15 '18 at 01:14


t-SNE is a manifold learning technique and, as such, does not preserve distances (or densities); it is therefore not recommended to run distance-based (e.g. k-means) or density-based (e.g. DBSCAN) clustering algorithms on the output of t-SNE. This has been asked before.

If you want a dimensionality reduction algorithm that does preserve distances, you can use PCA instead of t-SNE. PCA gives you an orthogonal rotation of your original data; one of the properties of an orthogonal transformation is that it preserves distances. When you use PCA for dimensionality reduction, projecting into a lower-dimensional space by throwing out the components with small eigenvalues, you lose only a small amount of information about distance.
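As a rough numerical check of the claim about PCA, the sketch below uses NumPy on synthetic data. The data, the per-axis scales, and the helper `pairwise_distances` are assumptions made for illustration; they are not from the question. It compares pairwise distances before the rotation, after the full rotation, and after keeping only the first two components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 200 points in 5-D,
# with most of the variance concentrated in the first two axes.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 0.3, 0.2, 0.1])


def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# PCA via SVD of the centered data: the rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores_full = Xc @ Vt.T        # full orthogonal rotation, all 5 components
scores_2d = scores_full[:, :2]  # keep only the two largest-variance components

d_orig = pairwise_distances(Xc)
d_full = pairwise_distances(scores_full)
d_2d = pairwise_distances(scores_2d)

# The full rotation should reproduce the distances up to floating-point error.
full_rotation_error = np.abs(d_orig - d_full).max()

# The truncated projection can only shrink distances; how much it shrinks them
# depends on how much variance sits in the discarded components.
rel_error_2d = np.linalg.norm(d_orig - d_2d) / np.linalg.norm(d_orig)

print(full_rotation_error, rel_error_2d)
```

How small the truncation error is depends entirely on the data: here the discarded components carry very little variance by construction, which will not hold in general.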


olooney


On time series with differing lengths one cannot use PCA. However, MDS (Multidimensional scaling) will construct a lower dimensional embedding that preserves distances, and that is a good starting point here. – Has QUIT--Anony-Mousse Jun 15 '18 at 07:22
@Anony-Mousse I'd be very interested to hear your opinion about the answer I just posted in the duplicate thread: https://stats.stackexchange.com/a/352138/28666. – amoeba Jun 19 '18 at 14:32
I do seem to get distinct clusters from running DBSCAN on the t-SNE output. After a row-wise normalization of the data, the time series seem to get clustered based on outliers in the time series. – LazyNearestNeigbour Sep 05 '18 at 16:18
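The MDS suggestion in the comments above can be sketched as follows. This is classical (Torgerson) MDS written directly in NumPy, which is only one variant of MDS and not necessarily the one the commenter had in mind. The function name `classical_mds` and the synthetic distance matrix are assumptions for illustration; in the question's setting, `D` would be the DTW distance matrix instead.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Embed points from a symmetric distance matrix D (classical/Torgerson MDS)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances to obtain a Gram-like matrix B.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; take the largest ones.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Non-Euclidean dissimilarities (DTW can violate the triangle inequality)
    # may produce negative eigenvalues; they are clipped to zero here.
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)


# Stand-in for the DTW matrix (an assumption): Euclidean distances
# between 30 random 2-D points.
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))
D = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))

embedding = classical_mds(D, n_components=2)

# Compare the distances in the embedding with the input distances.
D_embedded = np.sqrt(
    ((embedding[:, None, :] - embedding[None, :, :]) ** 2).sum(axis=-1)
)
max_abs_error = np.abs(D - D_embedded).max()
print(embedding.shape, max_abs_error)
```

Because the stand-in distances are exactly Euclidean in two dimensions, the embedding reproduces them almost perfectly; a real DTW matrix would generally be reproduced only approximately. Note also that scikit-learn's `DBSCAN` accepts `metric="precomputed"`, so the distance matrix can be clustered directly without any embedding step.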
