
I have two sets of vectors and want to find a differentiable measure that can quantify/approximate the degree of separability of the two sets. This metric should correlate well with the performance of a random forest (RF) trained to separate the data.

Looking online I found the Bhattacharyya distance, which looks like what I want, but applied to distributions. According to Wikipedia, it is used to measure the separability of classes in classification. I tried using this metric, but unfortunately, due to the high dimensionality of my vectors, the sample covariance matrices are singular and lead to undefined results.
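
For context, here is a minimal sketch (in NumPy) of the Gaussian form of the Bhattacharyya distance, which is roughly what I tried; the function name and array shapes are just placeholders. With more features than samples, the sample covariances are singular, so the solve and log-determinants are undefined:

```python
import numpy as np

def bhattacharyya_gaussian(x, y):
    """Bhattacharyya distance between two samples under a Gaussian assumption.

    x, y: arrays of shape (n_samples, n_features).
    """
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    s = (s1 + s2) / 2.0

    # With more features than samples, the sample covariances are singular,
    # so the solve / log-determinants below fail or blow up numerically.
    diff = mu1 - mu2
    mahalanobis_term = 0.125 * diff @ np.linalg.solve(s, diff)
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_s1 = np.linalg.slogdet(s1)
    _, logdet_s2 = np.linalg.slogdet(s2)
    return mahalanobis_term + 0.5 * (logdet_s - 0.5 * (logdet_s1 + logdet_s2))
```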

Any suggestions for what metric I might be able to use instead?

1 Answer


My naive guess would be to construct a norm of the vectors. You then have a distribution of norms for each of the two sets of vectors, and you can apply the Bhattacharyya coefficient, defined as $BC(p,q) = \sum_{i=1}^{N} \sqrt{p_i q_i}$ (https://www.wikiwand.com/en/Bhattacharyya_distance). Here the $p_i$s and $q_i$s are the probability densities of the norm distributions corresponding to the two sets of vectors. Your BC will depend on how you construct the norm of the vectors and whether that is suitable for your problem.
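
A minimal sketch of this idea, assuming Euclidean norms and a shared histogram binning (the number of bins is an arbitrary choice):

```python
import numpy as np

def bhattacharyya_coefficient(x, y, n_bins=30):
    """Bhattacharyya coefficient between the norm distributions of two vector sets.

    x, y: arrays of shape (n_samples, n_features).
    """
    # Reduce each vector to a scalar via its Euclidean norm (any suitable norm works).
    norms_x = np.linalg.norm(x, axis=1)
    norms_y = np.linalg.norm(y, axis=1)

    # Histogram both norm samples over shared bins so p_i and q_i line up.
    lo = min(norms_x.min(), norms_y.min())
    hi = max(norms_x.max(), norms_y.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(norms_x, bins=bins)
    q, _ = np.histogram(norms_y, bins=bins)
    p = p / p.sum()
    q = q / q.sum()

    # BC(p, q) = sum_i sqrt(p_i * q_i): 1 for identical histograms, 0 for disjoint ones.
    return np.sum(np.sqrt(p * q))


# Example: two high-dimensional Gaussian clouds with slightly different means.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 500))
b = rng.normal(0.5, 1.0, size=(200, 500))
print(bhattacharyya_coefficient(a, b))
```

The counts are normalized to sum to 1, so the coefficient lies in $[0, 1]$, with lower values indicating better separability of the two norm distributions.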