
Resources about LDA usually say that the number of components is bounded by the number of classes minus 1; e.g., in the binary case, only one component can be found.

In LDA, the first discriminant direction $\Phi_1$ is computed as the argmax of $\frac{\Phi_1^T S_b \Phi_1}{\Phi_1^T S_w \Phi_1}$, where $S_b$ and $S_w$ are the between-class and within-class covariance matrices, respectively. Why can't we continue in this way and compute the $i$th direction $\Phi_i$ as the argmax of $\frac{\Phi_i^T S_b \Phi_i}{\Phi_i^T S_w \Phi_i}$ under the constraint of orthogonality to $\Phi_1, \Phi_2, \dots, \Phi_{i-1}$, as is done in PCA?

Ostensibly, in the binary case, where each $\Phi_i$ is a vector, one could do this $n$ times if the inputs are $n$-dimensional vectors.
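
For concreteness, here is a minimal numerical sketch (my own synthetic two-class Gaussian data; the dimension $n = 5$ and the sample sizes are arbitrary choices). The stationary points of the above ratio are eigenvectors of $S_w^{-1} S_b$, so the sketch builds $S_w$ and $S_b$ from their definitions and looks at the eigenvalues of that matrix; in the binary case only one of them comes out numerically nonzero.

```python
# Minimal sketch: binary LDA in n = 5 dimensions (synthetic data, arbitrary choices).
import numpy as np

rng = np.random.default_rng(0)
n = 5                                      # input dimensionality
X1 = rng.normal(loc=0.0, size=(100, n))    # class 1 samples
X2 = rng.normal(loc=2.0, size=(120, n))    # class 2 samples, shifted mean

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)  # class means
m = np.vstack([X1, X2]).mean(axis=0)       # overall mean

# Within-class scatter: sum of the per-class scatter matrices.
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Between-class scatter: weighted outer products of (class mean - overall mean).
Sb = len(X1) * np.outer(m1 - m, m1 - m) + len(X2) * np.outer(m2 - m, m2 - m)

# Eigenvalues of Sw^{-1} Sb: only one is non-negligible, the rest are numerically zero.
eigvals = np.linalg.eigvals(np.linalg.inv(Sw) @ Sb)
print(np.sort(np.abs(eigvals))[::-1])
```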

conjectures
user1767774
  • https://stats.stackexchange.com/a/190821/3277 "Then q=g−1=2 independent dimensions _will suffice_ to predict the class membership as precisely as formerly" – ttnphns Feb 02 '20 at 12:23
  • If I understand this answer correctly, $c-1$ directions ($c$ is the number of classes) are enough to maximize accuracy under the normality and same-covariance assumption, but if one wants to perform dimensionality reduction, one can continue calculating all directions $\Phi_1, \Phi_2 \dots \Phi_n$? – user1767774 Feb 02 '20 at 12:48
  • If by dimensionality reduction you mean PCA, then yes, you might extract up to n components. But if you mean dim. reduction by means of linear discriminants, then their max. number is min(n, c-1). This is the dimensionality spanned by the $S_w^{-1}S_b$ matrix. – ttnphns Feb 02 '20 at 13:04
  • I meant dimensionality reduction via the LDA criterion. Can you please elaborate on the last sentence - what do you mean by the dimensionality spanned by $S_w^{-1}S_b$? If I have 2 classes, can't I find the direction $\Phi_2$ which is the *second best* direction in minimizing the LDA loss? – user1767774 Feb 02 '20 at 14:04
  • The number of nonzero eigenvalues of the aforementioned matrix is no greater than min(n, c-1). You could compute any number of eigenvectors, but all those corresponding to nonexistent eigenvalues (dimensions) are just numeric rubbish. – ttnphns Feb 02 '20 at 14:35
  • Algebra of LDA https://stats.stackexchange.com/a/48859/3277 – ttnphns Feb 02 '20 at 14:37
  • `which is the second best direction in minimizing the LDA loss?` Please return to my first link. If you have 2 data clouds of identical cov matrices (i.e., identical shape and space orientation), there is _no_ "LDA loss" beyond the single dimension. One dimension suffices. LDA "loss" is _separability loss_, not variability loss as in PCA. – ttnphns Feb 02 '20 at 15:08
  • OK, thanks. Can you please share a reference regarding the number of nonzero eigenvalues of $S_w^{-1}S_b$? – user1767774 Feb 02 '20 at 15:55
  • To follow your notation, n is the number of variables and c the number of groups. Then the rank of Sw (if there is no multicollinearity) is n, and the rank of Sb (when the data are centered, and LDA centers the data) is c-1. Hence the rank of $S_w^{-1}S_b$ is min(n, c-1). – ttnphns Feb 02 '20 at 16:39
  • Understood, thanks! – user1767774 Feb 02 '20 at 21:12

1 Answer


The rank of the between-class scatter matrix $S_B$ for the whole data set is at most $c-1$ ($c$ is the number of classes). The rank of the individual between-class scatter matrix $S_{Bi}$ for a single class is at most $1$, and the former matrix is the weighted sum of the latter matrices.

Since $\operatorname{rank}(AB)\le\min(\operatorname{rank}(A), \operatorname{rank}(B))$, you have $\operatorname{rank}(S_W^{-1}S_B)\le\operatorname{rank}(S_B)\le c-1$.
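
As a quick numerical check of this rank argument, here is an illustrative sketch (synthetic data; the choices of $n = 6$ variables, $c = 3$ classes, and 80 samples per class are arbitrary) that builds the scatter matrices and verifies the ranks directly:

```python
# Illustrative check: c = 3 classes in n = 6 dimensions (synthetic data, arbitrary choices).
import numpy as np

rng = np.random.default_rng(1)
n, c, per_class = 6, 3, 80
means = 3.0 * rng.normal(size=(c, n))          # one mean vector per class
X = np.vstack([rng.normal(loc=mu, size=(per_class, n)) for mu in means])
y = np.repeat(np.arange(c), per_class)         # class labels

m = X.mean(axis=0)                             # overall mean
Sw = np.zeros((n, n))
Sb = np.zeros((n, n))
for k in range(c):
    Xk = X[y == k]
    mk = Xk.mean(axis=0)
    Sw += (Xk - mk).T @ (Xk - mk)              # within-class scatter
    Sb += len(Xk) * np.outer(mk - m, mk - m)   # each term has rank at most 1

print(np.linalg.matrix_rank(Sb))                       # c - 1 = 2
print(np.linalg.matrix_rank(np.linalg.inv(Sw) @ Sb))   # also 2
```

Both ranks come out as $c-1=2$, so any further discriminant directions correspond to zero eigenvalues of $S_W^{-1}S_B$.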

seralouk