Suppose I have $n$ samples from a multivariate distribution $\xi = (\xi_1, \ldots, \xi_l)$ or from $\zeta = (\zeta_1, \ldots, \zeta_l)$. If they are multivariate normal distributions, I can estimate the covariance matrix from the data and perform LDA on my samples.
The question is: what should I do if I want to perform something like LDA, but for arbitrarily distributed RVs? All I actually need is the likelihood ratio between classes $\xi$ and $\zeta$; is there a way to obtain it without defining an analogue of the Mahalanobis distance? As I understood from a similar question, defining such a distance is tricky in the non-normal case. But would it be possible to do numerical integration of a kernel-density-estimated joint probability function (sorry for the naive thoughts)? How would I perform this in practice, and is it a good idea?
My needs can be expressed as: "build a binary classification method that takes dependencies between features into account and outputs a likelihood ratio for each data point".
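
For concreteness, here is a minimal sketch of the kind of thing I am imagining (Python with scipy's `gaussian_kde`; the class samples `X_xi`, `X_zeta` and the test points `x_new` are made-up placeholders). If I understand correctly, no integration is even needed for the ratio itself: evaluating the two fitted joint densities at a point gives it directly.

```python
# Minimal sketch of a KDE-based likelihood ratio (placeholder data).
# gaussian_kde fits a product Gaussian kernel with bandwidth chosen
# by Scott's rule, so it does capture dependencies between features.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Made-up 2D samples standing in for the two classes xi and zeta
X_xi = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=500)
X_zeta = rng.multivariate_normal([1, 1], [[1, -0.5], [-0.5, 1]], size=500)

# gaussian_kde expects shape (l, n): variables in rows, samples in columns
kde_xi = gaussian_kde(X_xi.T)
kde_zeta = gaussian_kde(X_zeta.T)

def log_likelihood_ratio(x):
    """log p_xi(x) - log p_zeta(x) for test points x of shape (m, l)."""
    return kde_xi.logpdf(x.T) - kde_zeta.logpdf(x.T)

x_new = np.array([[0.2, 0.1], [1.3, 0.9]])
print(log_likelihood_ratio(x_new))  # positive => more likely from xi
```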