
I am interested in applying machine learning clustering algorithms to a count dataset, using a method that:

  • Takes sparsity (an excess of zeros) into account
  • Takes into account complex interrelationships, such as ecological and microbiological relations
  • Clusters (and eventually classifies) observations that come as tensors (observation × variable × time) rather than as matrices (observation × variable)
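
A minimal sketch of the third requirement, assuming the counts are stored as a nested list indexed `[observation][variable][time]` (a hypothetical layout, not taken from the question): flattening each observation's variable × time slice into one row (a mode-1 unfolding) lets ordinary matrix-based clustering be applied, at the cost of treating time points as unrelated columns. The toy tensor is made up.

```python
from typing import List

# Hypothetical layout: tensor[i][j][k] = count for observation i, variable j, time k
Tensor = List[List[List[int]]]


def unfold_mode1(tensor: Tensor) -> List[List[int]]:
    """Flatten each observation's variable x time slice into one row.

    Columns are ordered with the variable index varying fastest and the
    time index slowest: all variables at time 0, then all at time 1, ...
    """
    rows = []
    for obs in tensor:
        n_vars = len(obs)
        n_times = len(obs[0])
        rows.append([obs[j][k] for k in range(n_times) for j in range(n_vars)])
    return rows


def zero_fraction(tensor: Tensor) -> float:
    """Share of entries that are exactly zero (a quick sparsity measure)."""
    values = [v for obs in tensor for var in obs for v in var]
    return sum(1 for v in values if v == 0) / len(values)


# Toy example: 2 observations x 2 variables x 2 time points (made-up counts)
toy = [
    [[0, 3], [0, 0]],
    [[1, 0], [0, 2]],
]
print(unfold_mode1(toy))   # [[0, 0, 3, 0], [1, 0, 0, 2]]
print(zero_fraction(toy))  # 0.625
```

The unfolded matrix has one row per observation and (variables × time points) columns; methods that keep the time mode separate, such as a CP/PARAFAC tensor decomposition, are the alternative when temporal structure matters.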

I first tried fitting zero-inflated models in R via the package ZIP, and then CaDENCE, because it uses a neural network to model the conditional density of a zero-inflated (ZI) distribution. The problem is that the model has to:

  1. be a multivariate model, and
  2. take into account the interrelationships.
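
For reference, a zero-inflated Poisson (ZIP) model mixes a point mass at zero (probability pi) with a Poisson(lam) count, so P(Y = 0) = pi + (1 - pi)e^(-lam) and P(Y = y) = (1 - pi)e^(-lam)lam^y / y! for y > 0. The minimal sketch below is plain Python, not the API of either R package mentioned above, and the counts and parameter values are made up. It evaluates that probability and the log-likelihood of a single variable's counts; fitted variable by variable like this, the model is univariate, which is why it does not meet the two requirements above on its own.

```python
import math
from typing import Iterable


def zip_pmf(y: int, lam: float, pi: float) -> float:
    """Probability of count y under a zero-inflated Poisson(lam)
    with structural-zero probability pi."""
    if y < 0:
        raise ValueError("counts must be non-negative")
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from the structural-zero component or from the Poisson
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def zip_log_likelihood(counts: Iterable[int], lam: float, pi: float) -> float:
    """Log-likelihood of independent counts for one variable."""
    return sum(math.log(zip_pmf(y, lam, pi)) for y in counts)


# Made-up counts for one sparse variable (7 zeros out of 10)
counts = [0, 0, 0, 0, 2, 0, 1, 0, 0, 3]
print(round(zip_pmf(0, lam=1.0, pi=0.5), 4))  # 0.6839
# True: allowing structural zeros fits the excess zeros better than a plain Poisson
print(zip_log_likelihood(counts, lam=1.5, pi=0.6) > zip_log_likelihood(counts, lam=1.5, pi=0.0))
```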

So I'm not sure either approach fits. How, then, can I apply machine learning to sparse, interrelated count data?

gung - Reinstate Monica
Mr Micro
  • Can you please give a concrete example of what a few of your data points look like? Some details about their nature and/or collection mechanism would be appreciated too. I think you are describing each observation as having some scalar and then some "time-series"-like characteristics. Maybe you want to check the thread [here](http://stats.stackexchange.com/questions/198061) so you can refine the definition of your problem. – usεr11852 Apr 08 '17 at 11:17
  • Thank you for your reply, user11852. My data has 258 variables (features) and 649 observations. The values are very sparse with low variability, and each observation is labeled with a time point between 1 and 5 and a class from {A, B, C, D}. – Mr Micro Apr 10 '17 at 06:35
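
With very sparse, low-variability variables, a common first step (an assumption here, not something stated in the thread) is to drop variables that are almost never non-zero before clustering. The minimal sketch below keeps variables whose prevalence, the fraction of observations with a non-zero count, reaches a threshold; the threshold and the toy matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def prevalence_filter(
    matrix: List[Sequence[int]], min_prevalence: float
) -> Tuple[List[int], List[List[int]]]:
    """Keep variables (columns) that are non-zero in at least
    `min_prevalence` of the observations (rows).

    Returns the kept column indices and the filtered matrix.
    """
    n_obs = len(matrix)
    n_vars = len(matrix[0])
    keep = [
        j
        for j in range(n_vars)
        if sum(1 for row in matrix if row[j] != 0) / n_obs >= min_prevalence
    ]
    filtered = [[row[j] for j in keep] for row in matrix]
    return keep, filtered


# Toy matrix: 4 observations x 3 variables (made-up counts)
toy = [
    [0, 1, 0],
    [0, 2, 0],
    [0, 0, 3],
    [0, 4, 0],
]
print(prevalence_filter(toy, min_prevalence=0.5))  # ([1], [[1], [2], [0], [4]])
```

Since each observation also carries a time point from 1 to 5, the same filter could be applied within each time point if prevalence changes over time.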
