
In a course it was stated that the boundaries determined by linear discriminant analysis are the same as the boundaries determined by a linear regression model if we separate two classes that are of the same size.

Intuitively, the statement is clear. However, I do not manage to prove or understand it formally, from a mathematical point of view.

In discriminant analysis, the boundary between two classes $Class_c$ and $Class_{c_1}$ is found by taking the log of $P(x|Y=Class_c, \theta)$ and of $P(x|Y=Class_{c_1}, \theta)$ and setting them equal, i.e. linear discriminant analysis determines all points for which $$ \{x \in \mathbb{R}^p: P(x|Y=Class_{c_1}, \theta) = P(x|Y=Class_c, \theta)\},$$ where $p$ is the number of features in the data set and $P(x|Y=Class_{c_1}, \theta) = N(x \mid \mu_{Class_{c_1}}, \Sigma)$, with the covariance matrix $\Sigma$ shared across classes (otherwise the boundary would be quadratic rather than linear).
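
As far as I got: with the common covariance $\Sigma$ and equal priors (so that equating the class-conditional densities is the same as equating the posteriors), taking logs makes the quadratic terms in $x$ cancel, leaving a linear boundary: $$\log N(x \mid \mu_{Class_{c_1}}, \Sigma) - \log N(x \mid \mu_{Class_c}, \Sigma) = (\mu_{Class_{c_1}} - \mu_{Class_c})^T \Sigma^{-1} x - \frac{1}{2}\left(\mu_{Class_{c_1}}^T \Sigma^{-1} \mu_{Class_{c_1}} - \mu_{Class_c}^T \Sigma^{-1} \mu_{Class_c}\right) = 0.$$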

In linear regression, the two class models are described by $$y_{Class_{c_1}} \mid x, \theta \sim N(x^T\beta_{Class_{c_1}},\sigma^2),$$ where $y_{Class_{c_1}}$ is the coded indicator target for $Class_{c_1}$.

But how is it now evident that these boundaries coincide? And where do the "equal-sized training sets" enter the proof? I don't really know how to put this together...
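
To convince myself numerically, here is a small sketch (entirely my own construction, not from the course): it simulates two equal-sized Gaussian classes with a shared covariance, fits the LDA boundary in closed form and a least-squares regression on $\pm 1$-coded targets, and compares the two boundary hyperplanes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2  # n points per class (equal class sizes), p features

# Two Gaussian classes with a shared covariance, as LDA assumes
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
L = np.array([[1.0, 0.0], [0.5, 1.0]])  # Cholesky factor of the shared covariance
X0 = mu0 + rng.standard_normal((n, p)) @ L.T
X1 = mu1 + rng.standard_normal((n, p)) @ L.T
X = np.vstack([X0, X1])
y = np.concatenate([-np.ones(n), np.ones(n)])  # classes coded as -1 / +1

# LDA boundary: w^T x + b = 0 with w = Sigma^{-1}(mu1 - mu0);
# equal priors put the cut at the midpoint of the two class means
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (2 * n - 2)
w_lda = np.linalg.solve(S, m1 - m0)
b_lda = -0.5 * w_lda @ (m0 + m1)

# Regression boundary: fit y ~ [1, x] by least squares, cut where the fit is 0
A = np.hstack([np.ones((2 * n, 1)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
b_reg, w_reg = beta[0], beta[1:]

# Normalize each hyperplane (b, w) by ||w|| and compare
print(np.append(b_lda, w_lda) / np.linalg.norm(w_lda))
print(np.append(b_reg, w_reg) / np.linalg.norm(w_reg))  # identical up to float error
```

Both printed hyperplanes agree to numerical precision, which matches the claimed result, but I would still like to understand the general proof.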

Thanks for your help and any thought-provoking impulses! :)

  • Do not see LDA as a "classification" (i.e., a boundary problem); see it as a prediction problem. Then these two might be helpful: https://stats.stackexchange.com/q/31459/3277. Also: https://stats.stackexchange.com/q/169436/3277. – ttnphns Jul 05 '20 at 09:42
  • The key point there is: LDA with any k classes can be seen as a particular case of canonical correlation analysis (CCA), and also of reduced-rank regression (RRR). When k=2, both CCA and RRR turn out to be the usual multiple linear regression. (See the sketch after these comments.) – ttnphns Jul 05 '20 at 09:48
  • Thank you! :-) Unfortunately, we did not address RRR, only the usual linear regression with Lasso/Ridge shrinkage and the elastic net... However, we always considered the probabilistic point of view. That is why I tried to start the proof sketch from this direction as well. –  Jul 05 '20 at 10:16
  • You see, linear regression (unlike logistic regression) is not based on a probabilistic p.o.v. Thus it makes little sense to "translate" LDA, understood as a classification, into linear regression. In my answers in the links I express twice the idea that LDA, in the general case of k classes, clearly becomes a _two-stage_ procedure: dimensionality reduction (or prediction) and subsequent classification. The affinity between LDA and regression lies in that first stage, and it comes through the notion of canonical correlations (CCA). – ttnphns Jul 05 '20 at 10:28
  • So, what I might recommend to you now: expand your study to LDA with k classes (canonical LDA). Once you know that topic, return to the question of the relation to linear regression / CCA. – ttnphns Jul 05 '20 at 10:35
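
To make the CCA remark above concrete, here is a minimal sketch (my own construction, not from the comments): with k=2 the response side of CCA is a single indicator variable, and the X-side canonical direction, the $a$ maximizing $\mathrm{corr}(a^Tx, y)$, is exactly the multiple-regression direction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.standard_normal((n, p))
y = (X @ np.array([1.0, -0.5, 0.2]) + rng.standard_normal(n) > 0).astype(float)

# CCA works on centered variables
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# With a one-dimensional response the X-side canonical direction is
# a ~ S_xx^{-1} s_xy (up to scale): the direction maximizing corr(a^T x, y)
a_cca = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
a_cca /= np.linalg.norm(a_cca)

# Multiple linear regression of the indicator on X (intercept via centering)
b_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

print(b_ols / a_cca)  # componentwise-constant ratio: the same direction
```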

0 Answers