
Background

My system tries to classify among three classes. At first, my labeling for CCA had a single dimension, {1, 2, 4}, but then I found out that to get more components I need more dimensions in Y: with dim Y = 1, I could only set n_components = 1.

So, I switched to OneHot labeling instead {[0 0 1],[0 1 0],[1 0 0]} (dim Y = 3) and the CCA works fine with n_components <= 3.
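
For concreteness, here is a minimal sketch of that setup (the feature matrix, sample size, and random labels are hypothetical), building the one-hot targets by hand and fitting scikit-learn's CCA:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 10))               # hypothetical feature matrix
labels = rng.choice([1, 2, 4], size=90)     # original single-dimension labels

# One-hot encode the three classes, so dim Y = 3
classes = np.array([1, 2, 4])
Y = (labels[:, None] == classes[None, :]).astype(float)

# With three target columns, n_components up to 3 passes the validation
# (min(n_features, n_samples, n_targets) = 3); 2 is used here for illustration.
cca = CCA(n_components=2).fit(X, Y)
X_scores, Y_scores = cca.transform(X, Y)
```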

To improve my results (pretty mediocre right now), I tried increasing the number of components to at least the number of classes + 1. So I changed dim Y to 4: {[0 0 0 1], [0 0 1 0], [0 1 0 0]}. Now I intermittently get an error pointing at this line:

y_score = next(col for col in Y.T if np.any(np.abs(col) > eps))

which probably means that Yk is full of zeros, i.e. that Yk was successively deflated down to a matrix of rank 0... which suggests that we asked for too many components, maybe?
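
For what it's worth, that line appears to come from the power-method initialization inside scikit-learn's PLS/CCA code; a paraphrased sketch of the check (not the exact source) shows why a fully deflated Y triggers it:

```python
import numpy as np

eps = np.finfo(float).eps
Yk = np.zeros((100, 4))  # hypothetical target matrix after too many deflation steps

try:
    # Pick the first column of the deflated Y that is not numerically zero
    y_score = next(col for col in Yk.T if np.any(np.abs(col) > eps))
except StopIteration:
    print("every column of the deflated Y is ~0: too many components were requested")
```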

Question

Overall, I want to know how the number of classes relates to the number of components. Do I need to have n_components <= n_classes? Can I increase the number of components without "deflating the matrix to rank 0"?

mgmussi

1 Answer


CCA can give you only as many components as the number of variables in X or Y (whichever is smaller), basically for the same reason that PCA will give you only as many components as you have variables and not more. By adding zero columns or using one-hot encoding, you are only adding collinear columns, so you are not actually increasing the rank of your matrix, and the CCA solution is not unique.
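
As a small illustration of that point (with hypothetical labels): padding the one-hot targets with an all-zero fourth column leaves the rank of Y unchanged, so it cannot buy an extra canonical component.

```python
import numpy as np

labels = np.array([1, 2, 4, 1, 2, 4, 1, 2])
classes = np.array([1, 2, 4])

Y3 = (labels[:, None] == classes[None, :]).astype(float)   # one-hot, dim Y = 3
Y4 = np.hstack([np.zeros((len(labels), 1)), Y3])            # zero-padded, dim Y = 4

print(np.linalg.matrix_rank(Y3))  # 3
print(np.linalg.matrix_rank(Y4))  # still 3: the extra column adds no new direction
```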

rep_ho
  • But how can I improve my labeling without using OneHot encoding, then? If I just use the unidimensional {1, 2, 4} it returns `FutureWarning: As of version 0.24, n_components(3) should be in [1, min(n_features, n_samples, n_targets)] = [1, 1]. n_components=1 will be used instead. In version 1.1 (renaming of 0.26), an error will be raised.` (I'm already using v0.24, so it simply crashes) – mgmussi Apr 23 '21 at 17:13