I cannot see where the normality assumption on the samples is used in LDA for dimensionality reduction, since LDA only minimizes the within-class scatter $w^T S_w w$:
$$S^i_w = \sum\limits_{x\in C_i}(x - \mu_i)(x - \mu_i)^T;\ S_w = \sum\limits_{i = 1}^kS^i_w.$$
and maximizes the between-class scatter $w^T S_b w$:
$$S_b = S_t - S_w = \sum\limits_i|C_i|(\mu_i-\mu)(\mu_i-\mu)^T, \qquad S_t = \sum\limits_{x\in C}(x - \mu)(x - \mu)^T.$$
Here $C$ denotes the set of all samples, $\mu$ the overall sample mean, $C_i$ the set of samples in the $i$-th class ($i = 1,\cdots,k$), and $\mu_i$ the mean of the $i$-th class.
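For concreteness, here is a minimal NumPy sketch of these definitions on synthetic data (the variable names and the toy data are mine, just for illustration). It also checks numerically that the identity $S_t = S_w + S_b$ holds by pure algebra, without any Gaussian assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_per, d = 3, 50, 4                    # classes, samples per class, dimension
centers = rng.normal(scale=3.0, size=(k, d))
X = np.vstack([centers[i] + rng.normal(size=(n_per, d)) for i in range(k)])
y = np.repeat(np.arange(k), n_per)

mu = X.mean(axis=0)                        # mean of all samples
S_w = np.zeros((d, d))                     # within-class scatter
S_b = np.zeros((d, d))                     # between-class scatter
for i in range(k):
    Xi = X[y == i]
    mu_i = Xi.mean(axis=0)
    S_w += (Xi - mu_i).T @ (Xi - mu_i)     # S_w^i = sum_{x in C_i} (x-mu_i)(x-mu_i)^T
    S_b += len(Xi) * np.outer(mu_i - mu, mu_i - mu)

S_t = (X - mu).T @ (X - mu)                # total scatter
assert np.allclose(S_t, S_w + S_b)         # S_b = S_t - S_w holds for any data

# Fisher directions: top eigenvectors of S_w^{-1} S_b (at most k-1 of them).
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:k - 1]].real
Z = X @ W                                  # data reduced to k-1 = 2 dimensions
print(Z.shape)                             # (150, 2)
```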
Indeed, the definitions of $S_w$ and $S_t$ seem valid for any distribution. In PCA, the normality assumption serves to guarantee that uncorrelatedness is equivalent to independence, but where is normality used here? Some sources also assume in LDA that the distributions of any two classes are independent.
I found the following wording in the Wikipedia article on LDA:
> The terms Fisher's linear discriminant and LDA are often used interchangeably, although Fisher's original article[1] actually describes a slightly different discriminant, which does not make some of the assumptions of LDA such as normally distributed classes or equal class covariances.
So is what I have described above actually Fisher's linear discriminant?