I have a 609 × 264 data matrix of time-series data (609 rows, 264 columns), which I reduced to 3 dimensions using t-SNE. When I cluster the reduced data with DBSCAN, I get zero clusters: every data point is labelled as noise. I tried increasing eps up to 1.0, and with min_samples=2 I still get the same result: zero clusters, all noise.

As a side note, I also ran DBSCAN on the data reduced to 2D with t-SNE and got better (though still not good) results.

My question is: how can eps be 1.0 while I still get no clusters with min_samples=2?

yousraHazem
  • DBSCAN, although conceptually great, is very sensitive to its parameters. I recommend defining a parameter grid such as eps = [0.01, 0.1, 1, 10] and min_samples = [1, 2, 4, 8, 16, 32, 64] and plotting the results for all possible combinations. – Nikolas Rieble Jan 23 '18 at 12:32
  • Also, I recommend transforming your time series into a feature space first (mean, max, min, variance, kurtosis, etc.) and then clustering in that feature space. – Nikolas Rieble Jan 23 '18 at 12:34
  • The `eps` parameter in DBSCAN can go above 1. – Stephan Kolassa Jan 24 '18 at 07:51
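
The parameter sweep suggested in the first comment could look roughly like the sketch below. It assumes scikit-learn's `DBSCAN`; the array `X` is synthetic stand-in data (two well-separated blobs), not the asker's embedding, and the sketch prints the results rather than plotting them.

```python
import itertools

import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for a 3-D embedding (an assumption, not the original data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 5, (100, 3)), rng.normal(60, 5, (100, 3))])

# Grid taken from the comment above.
eps_values = [0.01, 0.1, 1, 10]
min_samples_values = [1, 2, 4, 8, 16, 32, 64]

results = []
for eps, min_samples in itertools.product(eps_values, min_samples_values):
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # Label -1 marks noise, so it is not counted as a cluster.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results.append((eps, min_samples, n_clusters, n_noise))

for row in results:
    print(row)  # (eps, min_samples, n_clusters, n_noise)
```

Reading the table row by row shows quickly whether any of the tried `eps` values is on the right scale for the data at all.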

1 Answer

Epsilon is a distance. If the distances between your objects are much larger than 1, you will need to choose larger values. For example, when clustering tweet locations, you may want epsilon to be larger than 100 (meters).
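
A minimal sketch of this effect, assuming scikit-learn's `DBSCAN` and a made-up one-dimensional dataset whose points are 5 units apart (the helper `count_clusters` is hypothetical, introduced only for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: 50 points on a line, each 5 units from its neighbors.
X = np.arange(50, dtype=float).reshape(-1, 1) * 5.0


def count_clusters(X, eps, min_samples=2):
    """Return (number of clusters, number of noise points) for one DBSCAN run."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


# eps smaller than the spacing: no point has a neighbor, so everything is noise.
small = count_clusters(X, eps=1.0)
# eps larger than the spacing: neighboring points connect into one cluster.
large = count_clusters(X, eps=6.0)
print(small, large)
```

With `eps=1.0` no point has a second point within its radius, so even `min_samples=2` cannot be satisfied; the same thing happens whenever an embedding's coordinates are spread over a range much larger than `eps`.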

In the DBSCAN papers, the authors propose estimating epsilon from the sorted $k$-nearest-neighbor distances of the data (the "k-dist graph"), choosing a value near the point where the curve bends sharply.
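
As a rough sketch of that heuristic, the snippet below computes the sorted distance from each point to its $k$-th nearest neighbor using NumPy only. The data `X`, the choice `k=4`, and the use of a high percentile as a crude substitute for reading the bend off a plot are all assumptions made for illustration.

```python
import numpy as np


def k_distances(X, k):
    """Sorted distance from each point to its k-th nearest neighbor (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)  # after sorting, column 0 is the zero distance to the point itself
    return np.sort(dist[:, k])


# Synthetic stand-in for a 3-D embedding whose coordinates span tens of units.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 5, (300, 3)), rng.normal(60, 5, (300, 3))])

kd = k_distances(X, k=4)
# Crude stand-in for locating the bend visually: take a high percentile.
eps_candidate = float(np.percentile(kd, 90))
print(eps_candidate)
```

In practice one would plot `kd` and pick `eps` near the bend; the point here is only that the candidate value follows the scale of the data and can easily exceed 1.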

There are also later variants of the algorithm (such as OPTICS) that do not require choosing a single epsilon; there it serves at most as an optional upper bound on the search radius.
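
For example, scikit-learn ships an `OPTICS` estimator that can be run without specifying a radius (its `max_eps` defaults to infinity). The sketch below uses synthetic data and an arbitrary `min_samples=5`, both assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Synthetic stand-in data: two well-separated blobs in 3-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 5, (150, 3)), rng.normal(60, 5, (150, 3))])

# No eps is given; only min_samples is chosen.
optics = OPTICS(min_samples=5).fit(X)
labels = optics.labels_  # -1 marks noise, as in DBSCAN
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, int(np.sum(labels == -1)))
```

The fitted `reachability_` and `ordering_` attributes can also be plotted to inspect the density structure before committing to any cluster extraction.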

Using t-SNE as a preprocessing step for clustering can lead to misleading results. There is a very good post about this on this site. It appears that t-SNE has a tendency to rip apart even very clean clusters and, on the other hand, to attach noise to nearby clusters.

Peiffap
Has QUIT--Anony-Mousse