I am interested in applying machine learning clustering algorithms to a count dataset. The method should:
- account for sparsity (an excess of zeros);
- account for complex interrelationships between variables, such as ecological and microbiological relations;
- cluster (and eventually classify) observations that come from tensors (observation × variable × time) rather than from matrices (observation × variable).
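To make the data layout concrete, here is a minimal sketch in Python/NumPy. The dimensions and the synthetic counts are hypothetical and only mimic the shape and sparsity of such data. It shows the observation × variable × time tensor, its fraction of zeros, and the unfolding into an observation × (variable·time) matrix that most matrix-based clustering methods would otherwise require.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
n_obs, n_vars, n_times = 20, 50, 4

rng = np.random.default_rng(0)
# Synthetic counts (NOT real data): each entry is a structural zero with
# probability 0.7, otherwise a Poisson(3) draw.
structural_zero = rng.random((n_obs, n_vars, n_times)) < 0.7
counts = np.where(structural_zero, 0, rng.poisson(3.0, size=(n_obs, n_vars, n_times)))

# Fraction of zero entries in the tensor (structural zeros plus Poisson zeros).
sparsity = np.mean(counts == 0)

# Matrix-based clustering needs observation x feature input, so the time mode
# is folded into the feature mode: column v * n_times + t holds variable v at time t.
unfolded = counts.reshape(n_obs, n_vars * n_times)

print(counts.shape, unfolded.shape, round(float(sparsity), 2))
```

Unfolding this way discards the explicit time structure, which is why a method that works on the tensor directly is preferable here.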
I first tried fitting zero-inflated models in R with the ZIP package, then CaDENCE, because it uses a neural network to model the conditional density of a zero-inflated (ZI) distribution. The problem is that the model has to:
- be multivariate, and
- take into account the interrelationships between variables.
So I'm not sure these approaches fit. How can I apply machine learning to sparse, interrelated count data?
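For reference, here is a minimal Python sketch of the univariate zero-inflated Poisson (ZIP) density that such models are built on. It is not the API of the ZIP package or of CaDENCE. The function name `zip_logpmf` and all parameter values are hypothetical. With zero-inflation probability $\pi$ and Poisson mean $\lambda$, $P(Y=0) = \pi + (1-\pi)e^{-\lambda}$ and $P(Y=k) = (1-\pi)\lambda^k e^{-\lambda}/k!$ for $k \ge 1$. The last lines illustrate the multivariate concern: if each variable gets its own univariate ZIP, the joint log-likelihood of an observation is just a sum over variables, so no term links them.

```python
import math
import numpy as np

def zip_logpmf(y, lam, pi):
    """Log-probability of counts y under a zero-inflated Poisson(lam, pi).

    pi: probability of an extra (structural) zero; lam: Poisson mean.
    Works elementwise on arrays of counts and parameters.
    """
    y = np.asarray(y, dtype=float)
    log_fact = np.vectorize(math.lgamma)(y + 1)  # log(y!)
    poisson_logpmf = y * np.log(lam) - lam - log_fact
    log_zero = np.log(pi + (1 - pi) * np.exp(-lam))  # structural or Poisson zero
    log_pos = np.log1p(-pi) + poisson_logpmf         # non-zero count
    return np.where(y == 0, log_zero, log_pos)

# Sanity check: probabilities over 0..99 sum to ~1 for lam=3, pi=0.4.
total = sum(math.exp(float(zip_logpmf(k, 3.0, 0.4))) for k in range(100))

# Hypothetical counts and per-variable parameters for one observation.
y_obs = np.array([0, 4, 0, 1])
lams = np.array([2.0, 3.5, 1.0, 0.8])
pis = np.array([0.5, 0.1, 0.6, 0.3])
# Independent univariate fits: the joint log-likelihood is a plain sum,
# so correlations between variables are ignored entirely.
joint_if_independent = zip_logpmf(y_obs, lams, pis).sum()
print(round(total, 6), joint_if_independent)
```

Capturing the interrelationships would require a model whose joint density does not factorize this way.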