
I was experimenting with PCA and LDA and got stuck at one point. I have a feeling the answer is so simple that I just can't see it.

Within-class ($S_W$) and between-class ($S_B$) scatter matrices are defined as:

$$ S_W = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i)(x_t^i - \mu_i)^T $$

$$ S_B = \sum_{i=1}^CN(\mu_i-\mu)(\mu_i-\mu)^T $$

Total scatter matrix $S_T$ is given as:

$$ S_T = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T = S_W + S_B $$

where $C$ is the number of classes, $N$ is the number of samples in each class, $x_t^i$ is the $t$-th sample of class $i$, $\mu_i$ is the mean of class $i$, and $\mu$ is the overall mean.

While trying to derive $S_T$, I arrived at a point where I had:

$$ (x-\mu_i)(\mu_i-\mu)^T + (\mu_i-\mu)(x-\mu_i)^T $$

as a term. This needs to be zero, but why?


Indeed:

\begin{align} S_T &= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T \\ &= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i + \mu_i - \mu)(x_t^i - \mu_i + \mu_i - \mu)^T \\ &= S_W + S_B + \sum_{i=1}^C\sum_{t=1}^N\big[(x_t^i - \mu_i)(\mu_i - \mu)^T + (\mu_i - \mu)(x_t^i - \mu_i)^T\big] \end{align}

amoeba
nimcap
  • 2
    The answer is that you are summing the deviations of values around their mean and that sum is zero. But what, precisely, are $x$, $m$, and $m_i$? How are $m$ and $m_i$ related to $\mu$ and $\mu_i$? The quality of answers will depend on how accurately we guess but you're forcing us to do an awful lot of guessing! – whuber Mar 22 '11 at 17:35
  • @whuber: You are totally right, I revised my question. – nimcap Mar 23 '11 at 08:46

1 Answer


If you assume

$$\frac{1}{N}\sum_{t=1}^Nx_t^{i}=\mu_i$$

then

$$\sum_{i=1}^C\sum_{t=1}^N(x_t^i-\mu_i)(\mu_i-\mu)^T=\sum_{i=1}^C\left(\sum_{t=1}^N(x_t^i-\mu_i)\right)(\mu_i-\mu)^T=0$$

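and the formula holds. The inner sum vanishes because $\sum_{t=1}^N(x_t^i-\mu_i)=\sum_{t=1}^N x_t^i-N\mu_i=0$ by the definition of $\mu_i$. The second term is handled in the same way.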
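The identity can also be checked numerically. The following is only a minimal sketch, assuming NumPy is available and that every class has the same number of samples $N$, as in the question's formulas. The function name `scatter_matrices`, the array layout `(C, N, d)`, and the random test data are illustrative choices, not part of the original question or answer.

```python
import numpy as np

def scatter_matrices(X):
    """Compute S_W, S_B, S_T and the cross term for data X of shape (C, N, d).

    Assumed layout (hypothetical, for illustration only):
    C classes, N samples per class, d features.
    """
    C, N, d = X.shape
    mu_i = X.mean(axis=1)                 # class means, shape (C, d)
    mu = X.reshape(-1, d).mean(axis=0)    # overall mean, shape (d,)

    dev_within = X - mu_i[:, None, :]     # x_t^i - mu_i, shape (C, N, d)
    dev_between = mu_i - mu               # mu_i - mu, shape (C, d)
    dev_total = X - mu                    # x_t^i - mu, shape (C, N, d)

    # Sums of outer products over classes (c) and samples (n).
    S_W = np.einsum('cnj,cnk->jk', dev_within, dev_within)
    S_B = N * np.einsum('cj,ck->jk', dev_between, dev_between)
    S_T = np.einsum('cnj,cnk->jk', dev_total, dev_total)

    # Cross term: sum_i sum_t (x_t^i - mu_i)(mu_i - mu)^T
    cross = np.einsum('cnj,ck->jk', dev_within, dev_between)
    return S_W, S_B, S_T, cross

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50, 4))           # 3 classes, 50 samples each, 4 features
S_W, S_B, S_T, cross = scatter_matrices(X)

print(np.allclose(cross, 0.0))            # cross term vanishes up to rounding
print(np.allclose(S_T, S_W + S_B))        # total scatter decomposes
```

Because the deviations `dev_within` sum to zero within each class, the cross term is zero up to floating-point rounding, which is why the comparison uses `np.allclose` rather than exact equality.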

mpiktas
  • 3
    (+1) The second term, being the transpose of the first, must also be zero :-). – whuber Mar 23 '11 at 15:57
  • @whuber, yes, that too :) – mpiktas Mar 23 '11 at 17:16
  • Hi, I don't get why the assumption holds. Can someone explain that? – bespectacled Jan 07 '19 at 09:48
  • 1
    @Mvkt It is not so much an assumption as the definition of $\mu_i$ I suppose. That is to say: $\mu_i$ is the mean of the observations in group $i$. I expect the answer uses 'assume' because the OP doesn't explain the notation, so we have to guess that the group mean is meant by $\mu_i$. – Vincent Jan 21 '19 at 12:20