Deriving total (within class + between class) scatter matrix

Question

I was fiddling with PCA and LDA methods and I am stuck at one point. I have a feeling that it is so simple that I can't see it.

Within-class ($S_W$) and between-class ($S_B$) scatter matrices are defined as:

$$ S_W = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i)(x_t^i - \mu_i)^T $$

$$ S_B = \sum_{i=1}^CN(\mu_i-\mu)(\mu_i-\mu)^T $$

Total scatter matrix $S_T$ is given as:

$$ S_T = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T = S_W + S_B $$

where $C$ is the number of classes, $N$ is the number of samples per class, $x_t^i$ is the $t$-th sample of class $i$, $\mu_i$ is the mean of class $i$, and $\mu$ is the overall mean.

While trying to derive $S_T$, I reached a point where I had:

$$ (x-\mu_i)(\mu_i-\mu)^T + (\mu_i-\mu)(x-\mu_i)^T $$

as a term. This needs to be zero, but why?

Indeed:

\begin{align} S_T &= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T \\ &= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i + \mu_i - \mu)(x_t^i - \mu_i + \mu_i - \mu)^T \\ &= S_W + S_B + \sum_{i=1}^C\sum_{t=1}^N\big[(x_t^i - \mu_i)(\mu_i - \mu)^T + (\mu_i - \mu)(x_t^i - \mu_i)^T\big] \end{align}

The answer is that you are summing the deviations of values around their mean and that sum is zero. But what, precisely, are $x$, $m$, and $m_i$? How are $m$ and $m_i$ related to $\mu$ and $\mu_i$? The quality of answers will depend on how accurately we guess but you're forcing us to do an awful lot of guessing! — whuber, Mar 22 '11 at 17:35

score 9 · Accepted Answer · answered Mar 23 '11 at 14:12


If you assume

$$\frac{1}{N}\sum_{t=1}^Nx_t^{i}=\mu_i$$

Then

$$\sum_{i=1}^C\sum_{t=1}^N(x_t^i-\mu_i)(\mu_i-\mu)^T=\sum_{i=1}^C\left(\sum_{t=1}^N(x_t^i-\mu_i)\right)(\mu_i-\mu)^T=0$$

and the formula holds. The second term is handled in a similar way.
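A quick numerical check can make this concrete. The following is only a sketch with NumPy on synthetic data; the sizes `C`, `N`, `d`, the random seed, and the array layout `X[i, t]` (sample $t$ of class $i$) are assumptions chosen for illustration and are not part of the question. It assumes equal class sizes, as in the formulas above.

```python
import numpy as np

# Hypothetical sizes: C classes, N samples per class, d features.
C, N, d = 3, 50, 4
rng = np.random.default_rng(0)

# X[i, t] is sample t of class i; each class gets its own random offset.
X = rng.normal(size=(C, N, d)) + 3.0 * rng.normal(size=(C, 1, d))

mu_i = X.mean(axis=1)                 # class means, shape (C, d)
mu = X.reshape(-1, d).mean(axis=0)    # overall mean, shape (d,)

W = X - mu_i[:, None, :]              # x_t^i - mu_i, shape (C, N, d)
B = mu_i - mu                         # mu_i - mu,    shape (C, d)
T = X - mu                            # x_t^i - mu,   shape (C, N, d)

# Sums of outer products over classes i and samples t.
S_W = np.einsum("itj,itk->jk", W, W)
S_B = N * np.einsum("ij,ik->jk", B, B)
S_T = np.einsum("itj,itk->jk", T, T)

# First cross term: sum_i sum_t (x_t^i - mu_i)(mu_i - mu)^T.
cross = np.einsum("itj,ik->jk", W, B)

print(np.allclose(cross, 0.0))        # cross term vanishes up to rounding
print(np.allclose(S_T, S_W + S_B))    # decomposition holds
```

The cross term is zero only up to floating-point rounding, because within each class the deviations `W[i]` sum to zero by the definition of `mu_i`.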


mpiktas


(+1) The second term, being the transpose of the first, must also be zero :-). – whuber Mar 23 '11 at 15:57
@whuber, yes, that too :) – mpiktas Mar 23 '11 at 17:16
Hi, I don't get why the assumption holds. Can someone explain that? – bespectacled Jan 07 '19 at 09:48

@Mvkt It is not so much an assumption as the definition of $\mu_i$ I suppose. That is to say: $\mu_i$ is the mean of the observations in group $i$. I expect the answer uses 'assume' because the OP doesn't explain the notation, so we have to guess that the group mean is meant by $\mu_i$. – Vincent Jan 21 '19 at 12:20
