3

Does linear discriminant analysis always project the points onto a line? Most of the graphical illustrations of LDA that I see online use an example of 2-dimensional points which are projected onto a straight line y=mx+c. If the points were each a 10-dimensional vector, does LDA still project them onto a line?

Or would it project them onto a hyperplane with 9 dimensions or fewer?

Another question about projections: if I have a vector Y=[a,b,c,d], the projection of this vector onto a given line is the product of the line's direction vector V and the vector Y. This is equivalent to the dot product transpose(V).Y, which gives just one number (a scalar).

This seems to be how LDA works. So, if I may ask, does LDA map a full n-dimensional vector onto a scalar (a single number)?
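
For concreteness, here is a tiny NumPy sketch of what I mean (the numbers and the direction vector are made up):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])   # a 4-dimensional vector Y = [a, b, c, d]
v = np.array([1.0, 1.0, 0.0, 0.0])
v = v / np.linalg.norm(v)            # unit direction vector V of the line

print(v @ y)                         # transpose(V).Y -> one number: 2.1213...
```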

Apologies in advance for my newbie question.

Minaj

2 Answers

3

LDA seeks to reduce dimensionality while preserving as much of the class-discriminatory information as possible. Assume we have a set of $d$-dimensional observations $X$ belonging to $C$ different classes. The goal of LDA is to find a linear transformation (projection) matrix $L$ that converts the set of labelled observations $X$ into another coordinate system $Y$ such that the class separability is maximized. The dataset is transformed into the new subspace as:

\begin{equation} Y = XL \end{equation}

The columns of the matrix $L$ are a subset of the (in general non-orthogonal) eigenvectors of the square matrix $J$ associated with its $C-1$ largest eigenvalues, where $J$ is obtained as:

\begin{equation} J = S_{W}^{-1} S_B \end{equation}

where $S_W$ and $S_B$ are the within-class and between-class scatter matrices, respectively.

When it comes to dimensionality reduction in LDA, if some eigenvalues have a significantly larger magnitude than others, then we might be interested in keeping only those dimensions, since they contain more information about our data distribution. This becomes particularly interesting because $S_B$ is the sum of $C$ matrices of rank $\leq 1$, and the mean vectors are constrained by $\frac{1}{C}\sum_{i=1}^C \mu_i = \mu$ (Rao, 1948). Therefore, $S_B$ will be of rank $C-1$ or less, meaning that at most $C-1$ eigenvalues will be non-zero. For this reason, even if the dimensionality $k$ of the subspace $Y$ can be chosen arbitrarily, it does not make sense to keep more than $C-1$ dimensions, as they will not carry any useful information. In fact, in LDA the smallest $d - (C-1)$ eigenvalues are zero, and therefore the subspace $Y$ should have exactly $k = C-1$ dimensions.
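
To make this concrete, here is a minimal NumPy sketch of the construction above (the function name, toy data and variable names are just illustrative, not a reference implementation): it builds $S_W$ and $S_B$ from a labelled dataset, solves the eigenproblem of $J = S_{W}^{-1} S_B$, and keeps the leading $C-1$ eigenvectors as the columns of $L$.

```python
import numpy as np

def lda_projection(X, y, k=None):
    """Project X (n x d) onto the k <= C-1 leading eigenvectors of S_W^{-1} S_B."""
    classes = np.unique(y)
    d = X.shape[1]
    mu = X.mean(axis=0)                         # overall mean

    S_W = np.zeros((d, d))                      # within-class scatter
    S_B = np.zeros((d, d))                      # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        S_W += (X_c - mu_c).T @ (X_c - mu_c)
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)   # each class adds a rank-1 term

    # eigen-decomposition of J = S_W^{-1} S_B (J is not symmetric, so use eig)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

    if k is None:
        k = len(classes) - 1                    # at most C-1 useful dimensions
    L = eigvecs[:, :k]
    return X @ L, eigvals

# toy example: 3 classes in 10 dimensions -> projection has C-1 = 2 columns
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 10)) + np.repeat(rng.normal(size=(3, 10)), 30, axis=0)
y = np.repeat([0, 1, 2], 30)
Y, eigvals = lda_projection(X, y)
print(Y.shape)                 # (90, 2)
print(np.round(eigvals, 3))    # only the first C-1 = 2 are (numerically) non-zero
```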

Renthal
  • Is it correct that `if my original dataset has 2 classes, the output will have 1 dimension (2 - 1 = 1)`, and likewise, `if my original dataset has 5 classes, the output will have 4 dimensions`? – aan May 08 '20 at 16:28
  • If you choose `L` to contain only the non-zero eigenvectors (meant as those eigenvectors whose corresponding eigenvalue is non-zero), yes, correct. – Renthal May 11 '20 at 08:05
  • thanks. So I can choose any output dimension I want for LDA, but the problem is that the eigenvalues for dimensions `> Class - 1` will be `imaginary or zero` (eigenpairs), which is meaningless. Is that correct? – aan May 11 '20 at 10:37
  • 1
    Your definition is imprecise. Yes, you can choose any output dimension you want (provided it is smaller than or equal to $d$, with the notation above); however, the linear separability of a space with dimension $x$ such that $C-1 < x \leq d$ is not going to be any better than that of a space with dimension $y = C-1$. The eigenvectors with zero eigenvalue (not sure where the imaginary part comes into play?) are in the $J$ matrix, not in the final space $Y$. Hope it helps. – Renthal May 12 '20 at 11:52
  • thanks. Can you explain it in simple English? I couldn't understand it fully. But for better separation, is it best to have the output be `C-1`? Am I correct? – aan May 12 '20 at 19:13
  • For better separation it is best to have the output be $C - 1$. Having more is not harming, but not helping either. – Renthal May 14 '20 at 12:05
  • thanks. How can I explain, or what proof/reference can I cite, to say that `having more than C-1 dimensions will not help separation`? – aan May 14 '20 at 12:08
  • 1
    Because you have only $C-1$ non-zero eigenvalues in matrix $J$. – Renthal May 14 '20 at 12:27
  • thanks a lot. very helpful reply. – aan May 14 '20 at 12:33
  • Are you familiar with standardising data? https://stats.stackexchange.com/questions/466460/what-is-the-meaning-of-standardization-in-lda-fda – aan May 14 '20 at 13:49
  • Can I get the full reference for the Rao (1948) citation in your text above? I couldn't find the paper. – aan May 23 '20 at 09:18
  • 1
    The Utilization of Multiple Measurements in Problems of Biological Classification, C. Radhakrishna Rao, Journal of the Royal Statistical Society. Series B (Methodological), 1948, http://www.jstor.org/stable/2983775 – Renthal May 25 '20 at 07:53
  • thanks for the reference. Are you familiar with the small sample size problem in LDA? I would appreciate your advice here: https://stats.stackexchange.com/questions/468095/linear-discriminant-analysis-have-small-sample-size-problem-sss-is-it-nd – aan May 25 '20 at 08:39
2

LDA projects to (at most) $n_{classes} - 1$ dimensions, so binary (2-class) LDA reduces to 1D (= onto a line).
10 classes would lead to a 9D projection (as long as X is at least 9-dimensional, of course).

  does LDA map a full n-dimensional vector onto a scalar (a single number)?

Not always; see above.

For more details on what the projection step does, see e.g. https://stats.stackexchange.com/a/87509/4598

(Obviously, if you code your classes as numbers then the final class prediction will be a single number)
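
A quick sanity check with scikit-learn's `LinearDiscriminantAnalysis` (just a sketch on synthetic data, to illustrate the output shapes):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 200 samples, 10 features

y2 = rng.integers(0, 2, size=200)        # 2 classes
Z2 = LinearDiscriminantAnalysis().fit_transform(X, y2)
print(Z2.shape)                          # (200, 1): each sample becomes one scalar

y10 = rng.integers(0, 10, size=200)      # 10 classes
Z10 = LinearDiscriminantAnalysis().fit_transform(X, y10)
print(Z10.shape)                         # (200, 9)
```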

cbeleites unhappy with SX