Validating a formula on the relationships between the number of clusters and the maximum number of informative features

Question

I came up with a formula relating the number of clusters to the maximum number of informative features. By informative features I mean the features that are necessary to achieve perfect separation between the clusters.

Assumption: I consider Gaussian clusters only (circles or ellipses of any shape).

I considered the following cases. From these cases, I concluded that THE_MAXIMUM_NUMBER_OF_INFORMATIVE_FEATURES = NUMBER_OF_CLUSTERS - 1.

I suppose that the formula will also work for 4 clusters, 5 clusters, 6 clusters, and so on.

Question: Is my formula correct?

I would appreciate any references to relevant papers.

It is [quite obvious](https://stats.stackexchange.com/a/190821/3277). If there are g convex shapes such as ellipsoids in p-dimensional space, and they are of equal size and orientation (i.e. the covariance matrices of these data clouds are the same), then min(g-1, p') dimensions suffice to separate the clouds linearly from one another. Here p' is the rank of the total covariance matrix of the data. In your Fig1.2, p is 2 but p' is close to 1: a single dimension (the diagonal line) explains almost all of the data variation. (ttnphns, Sep 16 '18 at 07:14)
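
The min(g-1, p') bound from the comment can be checked numerically. The sketch below is only an illustration, not part of the original question or comment: it assumes clusters that share one full-rank covariance matrix, so the directions that separate them are determined by the cluster centroids, and it counts how many dimensions the centroids span around their grand mean. The helper name `discriminant_dims`, the random centroids, and the choice of p = 10 are all hypothetical.

```python
import numpy as np


def discriminant_dims(means, tol=1e-9):
    """Count the dimensions spanned by the centroids around their grand mean.

    `means` is a (g, p) array with one centroid per row. Assuming all clusters
    share the same full-rank covariance, this count is the number of linear
    dimensions that carry any between-cluster separation.
    """
    centered = means - means.mean(axis=0)  # rows now sum to zero
    return int(np.linalg.matrix_rank(centered, tol=tol))


rng = np.random.default_rng(0)

# Many features (p = 10), few clusters: the count is limited by g - 1.
p = 10
results = {}
for g in range(2, 8):
    means = rng.normal(size=(g, p))  # hypothetical random centroids
    results[g] = discriminant_dims(means)

# Few features (p = 2), more clusters: the count is limited by p instead.
low_dim = discriminant_dims(rng.normal(size=(5, 2)))

print(results)
print(low_dim)
```

Because centering removes one degree of freedom, g centroids in general position span g - 1 dimensions when p is large enough, which matches the formula in the question. When the data occupy fewer dimensions than g - 1, as in the p = 2 case, the count is capped by the dimension of the data, which is the role p' plays in the comment.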
