I have a set of almost 1,600 time series covering 2 years, which I want to group into clusters. Do you think this is possible using k-means? Which method would you advise me to use? Is this possible at all in SPSS?
- Related (but no SPSS solution): [Is it possible to do time-series clustering based on curve shape?](http://stats.stackexchange.com/q/3331/930). – chl Oct 10 '12 at 21:16
- See [this article](http://e-learning.bahcesehir.edu.tr/coursecontent/CSE5155%20DATA%20MINING%20I/pdfs/Clustering%20of%20time%20series%20data%E2%80%94a%20survey.pdf) for situations in which k-means is suitable, and also other [similar questions](http://stats.stackexchange.com/questions/9475/time-series-clustering). – sitems Oct 10 '12 at 21:27
- IMHO, I wouldn't use SPSS for that task. I have used both SPSS and Matlab for clustering (I'm a novice in clustering, I have to say), and the difference in speed and flexibility between the two packages is very noticeable, with Matlab being the better option even though it is also a high-level programming language. From what I have read, R should be much faster, but I haven't tried it yet. Just my 2 cents. – Diego Oct 10 '12 at 23:48
2 Answers
k-means cannot use arbitrary distance functions. It is designed for Euclidean distance.
Euclidean distance, however, does not work well for high-dimensional data such as your time series (unless you have a really low sampling rate, say, 24 monthly values).
For time series, you will probably want to use a time series distance. There are quite a lot of them, designed specifically for different kinds of time series. You really should look at these.
They won't work with k-means, but there are various distance-based and density-based clustering algorithms (where density is usually defined by distance!) that you should try. However, I have no idea what SPSS supports. I don't know whether it has any time series distances, either.
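To make the idea of a "time series distance" concrete, here is a minimal sketch of dynamic time warping (DTW), one common choice, written in plain Python. The toy series and the helper names (`dtw_distance`, `euclidean`, `distance_matrix`) are made up for illustration and are not taken from SPSS or any particular package. The resulting pairwise matrix is the kind of input that distance-based clustering algorithms accept.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping with absolute difference as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def euclidean(a, b):
    """Point-by-point Euclidean distance, for comparison (equal lengths assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(series, dist=dtw_distance):
    """Symmetric matrix of pairwise distances between all series."""
    k = len(series)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = dist(series[i], series[j])
    return d

# Toy data (hypothetical): the same peak, one step later, and a flat series.
peak = [0, 0, 1, 3, 1, 0, 0, 0]
peak_shifted = [0, 0, 0, 1, 3, 1, 0, 0]
flat = [1, 1, 1, 1, 1, 1, 1, 1]

matrix = distance_matrix([peak, peak_shifted, flat])
print(dtw_distance(peak, peak_shifted), euclidean(peak, peak_shifted))
```

DTW treats the shifted peak as identical to the original, whereas Euclidean distance penalizes the shift. Note that 1,600 series give roughly 1.3 million pairs, so a pure-Python loop like this would likely be too slow in practice; a compiled implementation would be the more realistic choice.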

- Thank you for your help, @Anony-Mousse. I actually have 130 weeks of samples. I'm a bit scared by the size of my data, but let's see if that works. I'll follow your advice. Thanks! – Maria Oct 11 '12 at 10:58
- Well, 1 measurement per week or 1 measurement per second: that is the difference I'm trying to point out... – Has QUIT--Anony-Mousse Oct 11 '12 at 11:54
First of all, yes, you can use k-means to cluster those time series. The default implementation of k-means relies on Euclidean distance, but it can be modified to feed the algorithm a specific time series distance, such as DTW (dynamic time warping).
For more information, see: On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping.
Second, I don't think you can use SPSS for this purpose, but I do know that you can use Matlab; there are plenty of implementations of k-means and DTW available.
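As a rough illustration of clustering with a non-Euclidean distance, here is a minimal Python sketch of k-medoids, a close relative of k-means that only needs a precomputed distance matrix. This is a swapped-in technique: it is not the method of the paper cited above, nor a Matlab implementation. The function `k_medoids`, the placeholder distance `manhattan`, the naive "first k items" initialization, and the toy series are all assumptions made for the example; a DTW distance matrix could be passed in the same way.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix (list of lists)."""
    n = len(dist)

    def assign(medoids):
        # Each item goes to the cluster whose medoid is nearest.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))  # naive start: the first k items (an assumption)
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids

def manhattan(a, b):
    """Placeholder distance for the demo; any time series distance could be used."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Toy data (hypothetical): three "peaked" series and two "high, flat" series.
series = [
    [0, 0, 1, 3, 1, 0],
    [0, 0, 1, 2, 1, 0],
    [5, 5, 5, 5, 5, 5],
    [5, 5, 4, 5, 5, 5],
    [0, 1, 1, 3, 1, 0],
]
dist = [[manhattan(a, b) for b in series] for a in series]
labels, medoids = k_medoids(dist, k=2)
print(labels, medoids)
```

Because the cluster centers are always actual series rather than averages, this variant sidesteps the question of how to average series under a warping distance, at the cost of needing the full pairwise distance matrix.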
