I cannot see where the normality assumption on the samples is used in LDA for dimensionality reduction, since LDA only minimizes the within-class scatter $w^T S_w w$:
$$S^i_w = \sum\limits_{x\in C_i}(x - \mu_i)(x - \mu_i)^T;\ S_w = \sum\limits_{i = 1}^kS^i_w.$$
and maximizes the between-class scatter $w^T S_b w$:
$$S_b = S_t - S_w = \sum\limits_i|C_i|(\mu_i-\mu)(\mu_i-\mu)^T, \qquad S_t = \sum\limits_{x\in C}(x - \mu)(x - \mu)^T.$$
Here $C$ denotes the set of all samples, $\mu$ the overall sample mean, $C_i$ the set of samples in the $i$-th class ($i = 1,\cdots,k$), and $\mu_i$ the mean of the $i$-th class.
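For concreteness, here is a minimal NumPy sketch of these definitions on synthetic data (the variable names and the toy data are mine, just for illustration). It also checks numerically that the identity $S_t = S_w + S_b$ holds by pure algebra, without any Gaussian assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_per, d = 3, 50, 4                    # classes, samples per class, dimension
centers = rng.normal(scale=3.0, size=(k, d))
X = np.vstack([centers[i] + rng.normal(size=(n_per, d)) for i in range(k)])
y = np.repeat(np.arange(k), n_per)

mu = X.mean(axis=0)                        # mean of all samples
S_w = np.zeros((d, d))                     # within-class scatter
S_b = np.zeros((d, d))                     # between-class scatter
for i in range(k):
    Xi = X[y == i]
    mu_i = Xi.mean(axis=0)
    S_w += (Xi - mu_i).T @ (Xi - mu_i)     # S_w^i = sum_{x in C_i} (x-mu_i)(x-mu_i)^T
    S_b += len(Xi) * np.outer(mu_i - mu, mu_i - mu)

S_t = (X - mu).T @ (X - mu)                # total scatter
assert np.allclose(S_t, S_w + S_b)         # S_b = S_t - S_w holds for any data

# Fisher directions: top eigenvectors of S_w^{-1} S_b (at most k-1 of them).
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:k - 1]].real
Z = X @ W                                  # data reduced to k-1 = 2 dimensions
print(Z.shape)                             # (150, 2)
```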
Indeed, the definitions of $S_w$ and $S_t$ seem valid for any distribution. In PCA, the normality assumption serves to guarantee that uncorrelatedness is equivalent to independence, but where is normality used here? Some sources also assume in LDA that the distributions of any two classes are independent.
I found the following wording in the Wikipedia article on LDA:
> The terms Fisher's linear discriminant and LDA are often used interchangeably, although Fisher's original article[1] actually describes a slightly different discriminant, which does not make some of the assumptions of LDA such as normally distributed classes or equal class covariances.
So is what I have described above actually Fisher's linear discriminant?