Are there any clustering methods that take time information (i.e. the order of the data) into account? That is, in addition to maximising intra-cluster similarity and minimising inter-cluster similarity, one could also maximise the "average time spent within a cluster" (or, equivalently, minimise the "frequency of cluster changes"). I don't know how to do that, since time (or data order) is not a spatial dimension.

As a simple illustrative example, suppose we want two clusters and have three equally distant data points A, B and C. The two clusters may then look different depending on the order in which the points are observed:

AAAAAABCBACBCCBBCB ===> We might want to group points A in the same cluster and points B,C in the other cluster

ABABABABAABCCCACCCCC ==> We might want to group points A,B in the same cluster and points C in the other cluster
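
To make the "minimise the frequency of cluster changes" criterion concrete, here is a minimal brute-force sketch for the toy example above. It assumes the points are equidistant, so the similarity terms cannot distinguish between groupings and only the number of cluster switches along the sequence matters. The function names `count_switches` and `best_two_way_split` are made up for illustration and do not come from any library.

```python
from itertools import combinations


def count_switches(sequence, group):
    """Count how often consecutive observations fall in different clusters.

    `group` is the set of symbols in one cluster; everything else is
    treated as the other cluster.
    """
    labels = [symbol in group for symbol in sequence]
    return sum(a != b for a, b in zip(labels, labels[1:]))


def best_two_way_split(sequence):
    """Try every split of the distinct symbols into two non-empty groups
    and return (group, switches) for the split with the fewest switches."""
    symbols = sorted(set(sequence))
    first, rest = symbols[0], symbols[1:]
    best = None
    # Always put `first` in the group so each split is visited once;
    # k < len(rest) keeps the other group non-empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            group = frozenset((first,) + extra)
            switches = count_switches(sequence, group)
            if best is None or switches < best[1]:
                best = (group, switches)
    return best


print(best_two_way_split("AAAAAABCBACBCCBBCB"))
print(best_two_way_split("ABABABABAABCCCACCCCC"))
```

On the first sequence this picks {A} versus {B, C}, and on the second {A, B} versus {C}, each with 3 switches. With real data the switch count would have to be traded off against a distance-based term, and exhaustive enumeration would no longer be feasible.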

Tim
eLearner
You can use time as one of the variables taken into consideration in the clustering; see this example: https://stats.stackexchange.com/questions/182232/fit-mixture-of-distributions-to-your-time-series-data-in-r/182354#182354 (it also discusses possible pitfalls) – Tim Mar 08 '16 at 15:18

2 Answers

Time-based clustering methods are beginning to get the attention they deserve. They can be distinguished by whether they are moment-based or non-moment-based. Here are a few references:

Moment methods based on global statistics: Rob Hyndman's paper Dimension Reduction for Clustering Time Series Using Global Characteristics available here: http://www.robjhyndman.com/papers/wang2.pdf

Moment methods based on hidden Markov models: Steve Scott's papers, e.g., Hidden Markov Models for Longitudinal Comparisons, available here: https://sites.google.com/site/stevethebayesian/googlepageforstevenlscott/home

Oded Netzer's paper A Hidden Markov Model of Customer Relationship Dynamics available here: https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/2618/HMM%20of%20Customer%20Relationship%20Dynamics.pdf
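
As a rough illustration of why hidden Markov models suit the question, the sketch below decodes a symbol sequence with the standard Viterbi algorithm, treating each hidden state as a cluster. It is not the method of any of the papers above. All parameters are assumptions chosen by hand: a "sticky" transition matrix (0.9 probability of staying in the same state) plays the role of the penalty on cluster changes, and the emission probabilities are made up. In practice they would be estimated from data, e.g. with EM.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path, computed in log space.

    Assumes `observations` is non-empty and all probabilities are > 0.
    """
    # score[s]: best log-probability of any path ending in state s
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (
                score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hand-picked, hypothetical parameters: state 0 mostly emits A/B, state 1 mostly C.
states = (0, 1)
start_p = {0: 0.5, 1: 0.5}
trans_p = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # sticky transitions
emit_p = {
    0: {"A": 0.45, "B": 0.45, "C": 0.10},
    1: {"A": 0.10, "B": 0.10, "C": 0.80},
}

print(viterbi("ABABABABAABCCCACCCCC", states, start_p, trans_p, emit_p))
```

With these numbers a single A inside a run of Cs is cheaper to explain as an unlikely emission from state 1 than as two state switches, so an isolated point does not break up the cluster.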

Non-moment based methods, which are typically rooted in complexity and information theory:

Andreas Brandmaier's permutation distribution clustering, as well as his R modules, pdc: An R Package for Complexity-Based Clustering of Time Series available here: https://cran.r-project.org/web/packages/pdc/pdc.pdf

For an excellent, if now a few years old, overview of time series clustering, see Aggarwal and Reddy's book Data Clustering: http://www.amazon.com/Data-Clustering-Algorithms-Applications-Knowledge/dp/1466558210/ref=sr_1_1?ie=UTF8&qid=1457449361&sr=8-1&keywords=reddy+data+clustering

Mike Hunter

You can use the dynamic time warping (DTW) distance function with a hierarchical clustering algorithm. Here are some links:
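
A dependency-free sketch of that idea, assuming the objects to be clustered are whole numeric time series: compute pairwise DTW distances, then merge clusters agglomeratively. The choice of average linkage, the absolute-difference step cost, and the names `dtw_distance` and `average_linkage` are illustrative assumptions, not part of any particular package.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance
    with absolute difference as the per-step cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def average_linkage(series, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with
    the smallest mean pairwise DTW distance until n_clusters remain.
    Returns clusters as sorted lists of indices into `series`."""
    dist = [[dtw_distance(a, b) for b in series] for a in series]
    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                mean = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or mean < best[0]:
                    best = (mean, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


series = [
    [0, 0, 1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0, 0, 0],  # same bump, shifted in time
    [5, 5, 5, 5, 5, 5, 5],
    [5, 5, 6, 5, 5, 5, 5],
]
print(average_linkage(series, 2))
```

The first two series have a DTW distance of zero because warping absorbs the time shift, so they end up in the same cluster. For larger data sets an optimised DTW implementation and a library linkage routine would be the more practical choice.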