I have GPS data for users in the form (timestamp, lat, long). I'm new to data mining and want to understand how to start clustering this data to learn more about it.

Should I build a matrix of pairwise distances between trajectories, based on some distance metric, and then apply a clustering algorithm to it?

The data consists of one trajectory per user.

Each user has a sequence of points of the form (timestamp, lat, long), going from point A to point B. I want to cluster the trajectories.

gizgok
  • Can you tell us a little about the data? What are the variables that you have? Are they continuous, discrete, a mix? What is it that you're trying to achieve with clustering? Is it some form of association, or classification? Have you done any initial looking at the data? i.e. have you looked at the distributions of the variables, or the density of their clusters? – Eric Peterson Jun 05 '13 at 20:05
  • @ClarkW.Griswold Please see the edit – gizgok Jun 05 '13 at 20:33

1 Answer

It's hard to give an answer, because the question is too broad.

Why don't you just try it? Data mining is exploratory work: you will have to try out a lot of things and see what works for you.

There are hundreds of clustering algorithms. Some do work on similarity matrices. Others don't (e.g. k-means). Some, such as DBSCAN and OPTICS, just need some source of distances, and actually scale better when you don't build a matrix (which has size $O(n^2)$ and so limits your scalability).
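To make the "any source of distance" point concrete, here is a minimal sketch of a DBSCAN-style clustering in plain Python that calls a distance function on demand instead of storing an $n \times n$ matrix. The names `dbscan`, `eps` and `min_pts` and the toy 2-D points are illustrative assumptions, not taken from the question. The neighbor search is a naive linear scan, so this saves memory but not time; real implementations use an index to speed up the queries.

```python
from math import dist as euclidean  # Python 3.8+

NOISE = -1


def dbscan(items, distance, eps, min_pts):
    """Label each item with a cluster id (0, 1, ...) or NOISE.

    `distance` is any callable on two items. Distances are computed
    on demand, so no n-by-n matrix is ever held in memory.
    """
    def neighbors(i):
        # Naive linear scan; includes i itself (distance 0).
        return [j for j in range(len(items))
                if distance(items[i], items[j]) <= eps]

    labels = [None] * len(items)
    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Toy data (an assumption for illustration): two tight groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, euclidean, eps=1.5, min_pts=3))
# prints [0, 0, 0, 1, 1, 1, -1]
```

Because `distance` is just a callable, the same function would accept whole trajectories as items once a trajectory distance is chosen.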

And then of course there are different ways of measuring the similarity of trajectories, too.
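As one possible starting point among many, the sketch below computes a dynamic time warping (DTW) cost between two trajectories, using the haversine great-circle distance between individual GPS fixes. This is an illustrative choice, not a recommendation: it assumes each trajectory is a list of (timestamp, lat, long) tuples with coordinates in degrees, it ignores the timestamps apart from the ordering they imply, and the function names are made up for this example. Hausdorff, Fréchet, or edit-distance-based measures would be equally reasonable things to try.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def dtw_distance(traj_a, traj_b):
    """DTW cost between two trajectories of (timestamp, lat, lon) tuples.

    Timestamps are dropped; only the order of the points matters here.
    """
    a = [(lat, lon) for _, lat, lon in traj_a]
    b = [(lat, lon) for _, lat, lon in traj_b]
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the DTW table
    for p in a:
        curr = [inf]
        for j, q in enumerate(b, start=1):
            cost = haversine_km(p, q)
            # Best of: advance in a only, advance in both, advance in b only.
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical trajectories, made up for illustration.
t1 = [(0, 52.50, 13.40), (60, 52.51, 13.41), (120, 52.52, 13.42)]
t2 = [(0, 52.50, 13.40), (45, 52.51, 13.41), (90, 52.52, 13.42)]
t3 = [(0, 48.85, 2.35), (60, 48.86, 2.36), (120, 48.87, 2.37)]
print(dtw_distance(t1, t2) < dtw_distance(t1, t3))
```

A function like this can either fill a pairwise matrix for a matrix-based algorithm or be passed directly to an algorithm that only needs distances on demand.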

So you had better start trying things out, and see what works for you and what is in your data.

Erich Schubert
  • Hi Eric - would be keen to have some feedback on this question if you have time https://stats.stackexchange.com/questions/368100/trajectory-clustering-preprocessing-and-algorithms – Xavier Bourret Sicotte Sep 22 '18 at 00:24