
I invented a formula relating the number of clusters to the maximum number of informative features. By informative features I mean the features that are necessary to achieve perfect separation between the clusters.

Assumption: I consider Gaussian clusters only (elliptical or circular in shape).

I considered the following cases: [Figure: the cases considered, including the panels Fig1.2 and Fig2.3 referred to in the comments]. From these cases, I concluded that `THE_MAXIMUM_NUMBER_OF_INFORMATIVE_FEATURES = NUMBER_OF_CLUSTERS - 1`.
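One standard way to see where g − 1 comes from (my choice of illustration, not something stated in the question) is Fisher's linear discriminant analysis. The between-class scatter matrix Sb is a sum of g rank-one terms whose weighted deviations from the overall mean sum to zero, so its rank is at most g − 1, and so is the number of non-zero eigenvalues of Sw⁻¹Sb (the discriminant directions). The sketch below assumes equal-covariance Gaussian clusters with made-up means, a made-up covariance, and a hypothetical relative tolerance `rel_tol` for deciding which eigenvalues count as zero.

```python
import numpy as np

def lda_eigenvalues(X, y):
    """Eigenvalues of Sw^{-1} Sb (Fisher LDA), sorted in descending order."""
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((p, p))  # pooled within-class scatter
    Sb = np.zeros((p, p))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # Sw^{-1} Sb is similar to a symmetric PSD matrix, so its eigenvalues are
    # real and non-negative up to rounding; discard tiny imaginary parts.
    eig = np.linalg.eigvals(np.linalg.solve(Sw, Sb)).real
    return np.sort(eig)[::-1]

def n_discriminant_directions(X, y, rel_tol=1e-8):
    """Count eigenvalues that are non-negligible relative to the largest one."""
    eig = lda_eigenvalues(X, y)
    return int(np.sum(eig > rel_tol * eig[0]))

# Hypothetical example: g = 3 Gaussian clusters in p = 5 dimensions,
# all sharing the same covariance matrix (same size and orientation).
rng = np.random.default_rng(0)
g, p, n = 3, 5, 200
cov = np.diag([1.0, 2.0, 0.5, 1.5, 1.0])
means = rng.normal(scale=5.0, size=(g, p))
X = np.vstack([rng.multivariate_normal(m, cov, size=n) for m in means])
y = np.repeat(np.arange(g), n)

k = n_discriminant_directions(X, y)
print(k)  # should equal g - 1 = 2 for generic (non-collinear) means
```

Even though the data live in 5 dimensions, only g − 1 directions carry information about which cluster a point belongs to.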

I expect the formula to hold for 4 clusters, 5 clusters, 6 clusters, and so on.
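A quick numerical check of the "and so on" part, assuming random (generic) cluster means and equal cluster sizes: the helper `max_informative_dims` below is hypothetical and simply returns the dimension of the affine hull of the cluster means, which with equal cluster sizes equals the rank of the between-class scatter matrix. It counts directions only; it says nothing about how much the clusters overlap along them.

```python
import numpy as np

def max_informative_dims(means):
    """Dimension of the affine hull of the cluster means (hypothetical helper).

    Centering the g mean vectors makes their rows sum to zero, so the rank
    is at most g - 1; it also cannot exceed the ambient dimension p.
    """
    centered = means - means.mean(axis=0)
    return int(np.linalg.matrix_rank(centered))

rng = np.random.default_rng(1)
p = 4  # ambient dimension (number of available features)
results = {}
for g in range(2, 9):
    means = rng.normal(size=(g, p))  # generic, randomly placed cluster means
    results[g] = max_informative_dims(means)
    print(f"g={g}: rank={results[g]}, min(g-1, p)={min(g - 1, p)}")
```

The count grows as g − 1 only until it reaches p, after which it stays at p, so the formula holds as an upper bound in the form min(g − 1, p).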

Question: Is my formula correct?

I would appreciate references to any relevant papers.

  • It is [quite obvious](https://stats.stackexchange.com/a/190821/3277). If there are g convex shapes, such as ellipsoids, in p-dimensional space, and they are of equal size and orientation (i.e. the covariance matrices of these data clouds are the same), then min(g-1, p') dimensions suffice to separate the clouds linearly from one another, where p' is the rank of the total covariance matrix of the data. In your Fig1.2, p is 2 but p' is close to 1: a single dimension (the diagonal line) explains almost all of the data variation. – ttnphns Sep 16 '18 at 07:14
  • In Fig2.3, g=3, p=2, p'=1, so min(g-1, p')=1. – ttnphns Sep 16 '18 at 07:21
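The comments define p' as the rank of the total covariance matrix, but when that matrix is only approximately singular ("p' approaches to be 1") some numerical cutoff is needed. The sketch below assumes a hypothetical configuration in the spirit of the comments (three tight clusters strung along a diagonal line in 2D) and an arbitrary 99% explained-variance threshold in the hypothetical helper `effective_rank`; both choices are mine, not the commenter's.

```python
import numpy as np

def effective_rank(X, var_fraction=0.99):
    """Smallest number of principal axes explaining `var_fraction` of total variance.

    Hypothetical numerical stand-in for p' when the total covariance
    matrix is nearly, but not exactly, rank-deficient.
    """
    cov = np.cov(X, rowvar=False)                  # total covariance matrix
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]   # eigenvalues, largest first
    cumulative = np.cumsum(eig) / eig.sum()
    k = int(np.searchsorted(cumulative, var_fraction) + 1)
    return min(k, len(eig))

# Hypothetical data: g = 3 equal-covariance clusters along the line y = x.
rng = np.random.default_rng(2)
means = np.array([[-6.0, -6.0], [0.0, 0.0], [6.0, 6.0]])
cov = 0.1 * np.eye(2)
X = np.vstack([rng.multivariate_normal(m, cov, size=200) for m in means])

g, p = len(means), X.shape[1]
p_eff = effective_rank(X)
bound = min(g - 1, p_eff)
print(p, p_eff, bound)  # p = 2, while p' and min(g-1, p') are both 1 here
```

Here nearly all the variance lies along the diagonal, so one direction suffices even though g − 1 = 2, which is the case the second comment describes.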
