Clustering methods that take data order into account

Question

Are there any clustering methods that take time information (i.e. data order) into account? That is, in addition to maximising intra-cluster similarity and minimising inter-cluster similarity, one could also maximise the "average time spent within a cluster" (or minimise the "frequency of cluster changes"). I don't know how to do that, since time (or data order) is not a spatial dimension.

As a simple illustrative example, if we consider two clusters and three equally distant data points, then the two clusters may look different depending on the order:

AAAAAABCBACBCCBBCB ==> We might want to group the A points in one cluster and the B and C points in the other cluster

ABABABABAABCCCACCCCC ==> We might want to group the A and B points in one cluster and the C points in the other cluster
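The "frequency of cluster changes" criterion can be made concrete with a minimal sketch. It assumes, as in the example, that the points are equally distant, so every two-cluster grouping scores the same on similarity and only the number of cluster changes along the sequence matters. The helper names `count_switches` and `best_two_cluster_grouping` are hypothetical and exist only for this illustration.

```python
from itertools import combinations


def count_switches(sequence, cluster_of):
    """Count how often consecutive points fall in different clusters."""
    labels = [cluster_of[symbol] for symbol in sequence]
    return sum(a != b for a, b in zip(labels, labels[1:]))


def best_two_cluster_grouping(sequence):
    """Return (switches, cluster_0, cluster_1) with the fewest cluster changes."""
    symbols = sorted(set(sequence))
    best = None
    for size in range(1, len(symbols)):
        for group in combinations(symbols, size):
            if symbols[0] not in group:
                continue  # skip mirror images of groupings already tried
            cluster_of = {s: 0 if s in group else 1 for s in symbols}
            switches = count_switches(sequence, cluster_of)
            if best is None or switches < best[0]:
                best = (switches, set(group), set(symbols) - set(group))
    return best


for seq in ("AAAAAABCBACBCCBBCB", "ABABABABAABCCCACCCCC"):
    print(seq, best_two_cluster_grouping(seq))
```

For the first sequence the grouping with the fewest changes is {A} versus {B, C} (3 changes), and for the second it is {A, B} versus {C} (3 changes), which matches the intuition described above. Brute-force enumeration only works for a handful of distinct points; it is meant to show the objective, not to be a practical method.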

You can use time as one of the variables taken into consideration in clustering; see this example: https://stats.stackexchange.com/questions/182232/fit-mixture-of-distributions-to-your-time-series-data-in-r/182354#182354 (it also discusses possible pitfalls) - Tim, Mar 08 '16 at 15:18

score 1 · Accepted Answer · answered Mar 08 '16 at 15:03

Time-based clustering methods are beginning to get the attention they deserve. They can be divided into moment-based and non-moment-based methods. Here are a few references:

Moment methods based on global statistics: Rob Hyndman's paper Dimension Reduction for Clustering Time Series Using Global Characteristics available here: http://www.robjhyndman.com/papers/wang2.pdf

Moment methods based on hidden Markov models: Steve Scott's papers, e.g., Hidden Markov Models for Longitudinal Comparisons available here: https://sites.google.com/site/stevethebayesian/googlepageforstevenlscott/home

Oded Netzer's paper A Hidden Markov Model of Customer Relationship Dynamics available here: https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/2618/HMM%20of%20Customer%20Relationship%20Dynamics.pdf

Non-moment based methods, which are typically rooted in complexity and information theory:

Andreas Brandmaier's permutation distribution clustering, as well as his R package, described in pdc: An R Package for Complexity-Based Clustering of Time Series available here: https://cran.r-project.org/web/packages/pdc/pdc.pdf

For an excellent overview of time series clustering, now a few years old, see Aggarwal and Reddy's book Data Clustering: http://www.amazon.com/Data-Clustering-Algorithms-Applications-Knowledge/dp/1466558210/ref=sr_1_1?ie=UTF8&qid=1457449361&sr=8-1&keywords=reddy+data+clustering
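To show why hidden Markov models suit the question, here is a rough sketch of Viterbi decoding with a "sticky" transition matrix applied to the first example sequence from the question. It is not taken from any of the papers above. The emission probabilities and the `stay` probability are assumed by hand for illustration; a real HMM would estimate them from data, for example with the Baum-Welch algorithm. The function names `viterbi` and `decode` are hypothetical.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely hidden state path for a discrete-emission HMM."""
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    back = []
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # best predecessor of state s at this step
            r = max(states, key=lambda p: prev[p] + log_trans[p][s])
            best[s] = prev[r] + log_trans[r][s] + log_emit[s][obs]
            pointers[s] = r
        back.append(pointers)
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def decode(sequence, stay):
    """Decode with two states; `stay` is the assumed self-transition probability."""
    states = [0, 1]
    log_start = {s: math.log(0.5) for s in states}
    log_trans = {
        p: {s: math.log(stay if p == s else 1.0 - stay) for s in states}
        for p in states
    }
    # Assumed emissions: state 0 mostly emits A, state 1 mostly emits B or C.
    emit = {0: {"A": 0.8, "B": 0.1, "C": 0.1}, 1: {"A": 0.1, "B": 0.45, "C": 0.45}}
    log_emit = {s: {k: math.log(v) for k, v in row.items()} for s, row in emit.items()}
    return viterbi(sequence, states, log_start, log_trans, log_emit)


seq = "AAAAAABCBACBCCBBCB"
print(decode(seq, stay=0.5))  # no preference for staying in a state
print(decode(seq, stay=0.9))  # switching states is penalised
```

With `stay=0.5` the transition term carries no information and each point is assigned by its emission alone, so the isolated A near the middle causes extra cluster changes. With `stay=0.9` that A is absorbed into the surrounding B/C state and the decoded path changes cluster only once. The self-transition probability therefore plays the role of the "time spent within a cluster" term from the question.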

score -1 · Answer 2 · answered Mar 08 '16 at 14:18

You can use the dynamic time warping (DTW) distance function with a hierarchical clustering algorithm.
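As a minimal sketch of this idea, the code below computes pairwise DTW distances between a few short series and merges them with average-linkage agglomerative clustering. It is written in plain Python to stay self-contained; the function names `dtw_distance` and `cluster_series`, the absolute-difference local cost, the choice of average linkage, and the toy data are all assumptions made for illustration. Note that this clusters whole time series rather than individual points within one sequence.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_series(series, n_clusters):
    """Average-linkage agglomerative clustering on pairwise DTW distances."""
    dist = {
        (i, j): dtw_distance(series[i], series[j])
        for i in range(len(series))
        for j in range(i + 1, len(series))
    }

    def average_linkage(a, b):
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: average_linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]  # merge the two closest clusters
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


series = [
    [0, 0, 1, 2, 1, 0, 0],  # a bump
    [0, 1, 2, 1, 0, 0, 0],  # the same bump, shifted earlier
    [0, 0, 0, 1, 2, 1, 0],  # the same bump, shifted later
    [3, 3, 3, 3, 3, 3, 3],  # a flat series
    [3, 3, 4, 3, 3, 3, 3],  # a nearly flat series
]
print(cluster_series(series, n_clusters=2))
```

The three shifted bumps have a DTW distance of zero from one another because warping aligns them exactly, so they end up in one cluster and the two flat series in the other. For larger data sets, established libraries would normally replace both hand-written functions.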

Kirill Dubovikov
