Possible classification techniques to use when each feature is a probability distribution

Question

I am working with data whose features have a temporal aspect (e.g. how often a feature occurs between $t_{begin}$ and $t_{end}$). I am trying to build a binary classifier for this data. The problem, however, is that each feature is a probability distribution. Normally, something like an SVM classifier would work: each object is represented by a vector whose $i^{th}$ entry is a number (e.g. a tf-idf weight).

I have no idea how to proceed when each object is represented by a set of probability distributions. The simple vector representation will clearly not be suitable. I've searched far and wide, but haven't found anything that suits this kind of data. Any pointers or ideas will be greatly appreciated.

Also, I wanted to add that I have looked into distances between distributions and density functions (e.g. the Wasserstein metric and the Kolmogorov-Smirnov statistic), but I don't see how computing these metrics would help with the classification.

score 0 · Accepted Answer · answered Jun 22 '18 at 15:38

You could use distribution-level metrics (such as the ones you mention) and calculate the dissimilarity between every pair of samples. This gives a dissimilarity matrix, which you can use as input for your classifier. In fact, once the dissimilarities are converted to similarities (for example with $\exp(-\gamma d)$), the resulting matrix can serve as a precomputed kernel matrix for an SVM.
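As a rough illustration of this idea, here is a minimal sketch in Python. It assumes things the question does not state: that every feature is a histogram over the same unit-width time bins, that the per-feature distances are simply summed, and that scikit-learn is available. The helper names (`w1_matrix`, `make_objects`), the toy data, and the value of `gamma` are all hypothetical and only there to make the example run.

```python
import numpy as np
from sklearn.svm import SVC


def w1_matrix(A, B):
    """Pairwise distance: sum over features of the 1-D Wasserstein-1 distance.

    A has shape (n_a, n_features, n_bins), B has shape (n_b, n_features, n_bins).
    Every histogram sums to 1 and all share the same unit-width bins (assumption).
    """
    cdf_a = np.cumsum(A, axis=-1)
    cdf_b = np.cumsum(B, axis=-1)
    # On a shared grid, W1 equals the L1 distance between the two CDFs.
    diff = np.abs(cdf_a[:, None, :, :] - cdf_b[None, :, :, :])
    return diff.sum(axis=(-1, -2))


def make_objects(rng, n, peak, n_features=3, n_bins=20):
    """Hypothetical toy data: noisy histograms peaked around bin `peak`."""
    bins = np.arange(n_bins)
    centers = peak + rng.normal(0, 1.5, size=(n, n_features, 1))
    h = np.exp(-0.5 * ((bins - centers) / 2.0) ** 2) + 1e-3
    return h / h.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
X = np.concatenate([make_objects(rng, 40, peak=6), make_objects(rng, 40, peak=13)])
y = np.array([0] * 40 + [1] * 40)

# Random train/test split over the objects.
idx = rng.permutation(len(y))
train, test = idx[:60], idx[60:]

gamma = 0.1  # assumed value; in practice tune it by cross-validation

# Turn distances into similarities; the SVM needs a kernel, not raw distances.
K_train = np.exp(-gamma * w1_matrix(X[train], X[train]))  # (n_train, n_train)
K_test = np.exp(-gamma * w1_matrix(X[test], X[train]))    # (n_test, n_train)

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y[train])
accuracy = clf.score(K_test, y[test])
print(f"test accuracy: {accuracy:.2f}")
```

Note that at prediction time the kernel matrix has to be computed between the test objects and the training objects, in that order. For other distances (or for distributions that are not one-dimensional), the exponentiated matrix is not guaranteed to be positive semi-definite, so it is worth checking its eigenvalues before relying on it as a kernel.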
