I cluster time series into $k \in [2, N]$ clusters using either DTW + k-medoids or DTW + single-linkage hierarchical clustering (HC), as advised in a previous post: Dynamic Time Warping Clustering.

To choose the optimal number of clusters, I want to fit Gaussian mixture models (GMM) with expectation–maximization (EM) and pick the k that maximizes the log-likelihood for each approach.

My questions are:

  • What should the input to EM/GMM be: the DTW distance matrix or the raw dataset?
  • Can EM/GMM be initialized with the cluster centers from k-medoids, or with randomly selected seeds from the HC clusters?
  • Should I run k-fold cross-validation on the input dataset with EM/GMM and report the average log-likelihood?
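
To make the second and third questions concrete, here is a minimal sketch of what seeding a GMM with medoids and averaging the held-out log-likelihood over folds could look like in scikit-learn. It rests on several assumptions that are not part of the question: the series have equal length and are stacked as rows of a matrix (so each series is treated as a point in $R^d$), the covariance is diagonal, the data are synthetic, the "medoids" are picked by hand rather than produced by DTW + k-medoids, and `cv_log_likelihood` is a hypothetical helper name.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold


def cv_log_likelihood(X, k, means_init=None, n_splits=5, seed=0):
    """Average held-out per-sample log-likelihood of a k-component GMM."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X):
        gmm = GaussianMixture(
            n_components=k,
            covariance_type="diag",   # assumption: keeps the parameter count small
            means_init=means_init,    # optional seeds, shape (k, n_features)
            random_state=seed,
        )
        gmm.fit(X[train_idx])
        # score() returns the mean log-likelihood per sample on the given data
        scores.append(gmm.score(X[test_idx]))
    return float(np.mean(scores))


# Synthetic stand-in: 3 groups of 40 series, each series of length 8.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(40, 8)) for m in (-3.0, 0.0, 3.0)])

# Hypothetical medoids: one hand-picked member per group.
medoids = X[[0, 40, 80]]

cv_scores = {k: cv_log_likelihood(X, k) for k in range(1, 6)}
score_with_medoids = cv_log_likelihood(X, 3, means_init=medoids)
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_k, score_with_medoids)
```

Scoring on held-out folds matters here because the training log-likelihood generally does not decrease as k grows, so maximizing it on the fitted data alone tends to favor the largest k.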
user26872
  • In fact, based on [this paper](http://dx.doi.org/10.1109/TSG.2015.2409786), you can cluster time series with GMM/EM. However, I am still working out how to implement it with the well-known scikit-learn tools. To evaluate the optimal number of clusters, BIC/AIC are what you want to look at; both are available in the library for convenience. This [link](https://jakevdp.github.io/PythonDataScienceHandbook/05.12-gaussian-mixtures.html) contains a great summary of how to use GMM/EM on certain data, but unfortunately not on time series. I'd like to hear your updates as well. – dia Mar 11 '18 at 15:29
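
As a rough illustration of the BIC/AIC suggestion in the comment above, the following sketch fits scikit-learn's `GaussianMixture` for several candidate k and keeps the k with the lowest criterion value. The data are synthetic fixed-length vectors chosen only for the example (an assumption; this is not a time-series-specific method), and the candidate range 1 to 6 is arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: 2 groups of 50 vectors of length 4.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(50, 4)) for m in (-3.0, 3.0)])

candidates = list(range(1, 7))
models = [GaussianMixture(n_components=k, random_state=0).fit(X) for k in candidates]

# Lower is better for both criteria.
bic = [m.bic(X) for m in models]
aic = [m.aic(X) for m in models]

best_k_bic = candidates[int(np.argmin(bic))]
best_k_aic = candidates[int(np.argmin(aic))]
print("BIC picks k =", best_k_bic, "| AIC picks k =", best_k_aic)
```

Both criteria penalize the number of free parameters, which is why they can be compared across k on the fitted data, unlike the raw log-likelihood.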

1 Answer

Gaussian mixture modeling assumes your input data are coordinates in $R^d$ and contain Gaussian-shaped clusters. I don't think you can use this on time series. Also, GMM is a clustering approach in its own right, and I don't see how you could use it to evaluate other clusterings.

Has QUIT--Anony-Mousse
  • There has been some research on outlier detection in energy time series: this [paper](https://www.mdpi.com/1996-1073/8/11/12337) used EM/GMM in cycles after applying a z-score. The authors aimed to refill the gaps with this approach. However, I tried to re-implement it on similar energy data without success. – Mario Dec 04 '20 at 11:30