According to Bishop's Pattern Recognition and Machine Learning, the cost function for linear discriminant analysis (LDA) with $K>2$ classes is $$J(\mathbf W) = \mathrm{Tr}\left\{\left(\mathbf W \mathbf S_\mathrm W \mathbf W^T\right)^{-1} \left(\mathbf W \mathbf S_\mathrm B \mathbf W^T\right)\right\}.$$ The text states:

$\mathbf{S}_\mathrm B = \sum_{k=1}^K N_k(\mathbf m_k - \mathbf m)(\mathbf m_k - \mathbf m)^T$ is composed of the sum of $K$ matrices, each of which is an outer product of two vectors and therefore of rank 1. In addition, only $(K-1)$ of these matrices are independent as a result of the constraint $$\mathbf m = \frac{1}{N}\sum_{n=1}^N \mathbf x_n = \frac{1}{N} \sum_{k=1}^K N_k \mathbf m_k.$$ Thus, $\mathbf S_\mathrm B$ has rank at most equal to $(K-1)$ and so there are at most $(K-1)$ nonzero eigenvalues. This shows that the projection onto the $(K-1)$-dimensional subspace spanned by the eigenvectors of $\mathbf S_\mathrm B$ does not alter the value of $J(\mathbf W)$, and so we are therefore unable to find more than $(K-1)$ linear 'features' by this means (Fukunaga, 1990).
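For concreteness, here is a minimal NumPy sketch (my own check, not from the book; the class sizes, dimension, and random means are arbitrary choices) that builds $\mathbf S_\mathrm B$ from simulated data and confirms numerically that its rank never exceeds $K-1$:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 10                       # K classes in D dimensions
Ns = [30, 50, 20, 40]              # class sizes N_k (arbitrary)

# Simulate each class around its own random mean.
X = [rng.normal(loc=rng.normal(size=D), size=(n, D)) for n in Ns]

m = np.vstack(X).mean(axis=0)      # overall mean m
S_B = np.zeros((D, D))
for Nk, Xk in zip(Ns, X):
    d = (Xk.mean(axis=0) - m)[:, None]   # m_k - m as a column vector
    S_B += Nk * (d @ d.T)                # rank-1 outer product

# With generic random means the rank is exactly K - 1 (here: 3).
print(np.linalg.matrix_rank(S_B), K - 1)
```

The rank stays below $K$ because the weighted deviations are linearly dependent: the constraint above implies $\sum_{k=1}^K N_k(\mathbf m_k - \mathbf m) = \mathbf 0$, so any one of the $K$ rank-1 terms is determined by the other $K-1$.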

Question

  1. Why does the constraint mean that only $(K-1)$ of these matrices are independent?
  2. I can't see why the projection does not alter the value of $J(\mathbf W)$, and why this means we are unable to find more than $(K-1)$ linear features.
  3. What does it imply when we can't find more than $(K-1)$ linear features? I can't see why the author is telling me this.

(Can you please make the math explicit? I find the textbook rather hard to understand because many math details are left out).



