
I just wanted to check that I understand, at a high level, what PCA and LDA set out to achieve.

So: LDA is about finding a projection that best separates different classes of data. It's supervised (you obviously need the labels saying which point belongs to which class), and while it doesn't give you a way to classify points directly, you can then apply some classification technique to the projected versions of the data. Each time you get a new data point, you project it and classify it using whatever discriminants/rules you learned on the projected training data.
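
To make concrete what I mean by "project, then classify", here is a rough sketch (assuming scikit-learn is available; the synthetic dataset and the choice of k-NN as the follow-up classifier are just for illustration):

```python
# Minimal sketch of the LDA-then-classify workflow, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

# Labelled data: LDA is supervised, so y is required to find the projection.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Find the discriminant directions (at most n_classes - 1 = 2 of them here).
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)

# Any classifier can then be trained on the projected data.
clf = KNeighborsClassifier().fit(X_proj, y)

# A new point is projected with the learned discriminants before classifying.
x_new = X[:1]
print(clf.predict(lda.transform(x_new)))
```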

PCA is not necessarily about classification at all. It's unsupervised. All it does is try to represent the data in a (potentially lower) dimension while preserving the important information in the data (which, by its criterion, is variance).
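
Again just to be concrete, a sketch of how I picture that (assuming scikit-learn; the random data is only there to have something to run on, and note that no labels appear anywhere):

```python
# Minimal sketch of PCA as an unsupervised, variance-preserving projection.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))    # any unlabelled data

pca = PCA(n_components=2)         # keep the 2 highest-variance directions
X_reduced = pca.fit_transform(X)

# Fraction of the original variance kept by the lower-dimensional representation.
print(pca.explained_variance_ratio_.sum())
```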

Under certain conditions, PCA and LDA may coincide, but that certainly doesn't have to be the case. One is geared towards helping with classification; the other is about making data more manageable.
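
One rough way I'd check this on a single dataset (again assuming scikit-learn, and reusing the same kind of synthetic data as above) is to compare the leading PCA direction with the leading LDA discriminant; a |cosine| close to 1 would mean they (nearly) coincide, which in general they don't:

```python
# Compare the first principal component with the first discriminant direction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

pc1 = PCA(n_components=1).fit(X).components_[0]
ld1 = LinearDiscriminantAnalysis(n_components=1).fit(X, y).scalings_[:, 0]

# Absolute cosine similarity between the two directions.
cos = abs(pc1 @ ld1) / (np.linalg.norm(pc1) * np.linalg.norm(ld1))
print(cos)
```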

Is all the above correct? Have I missed anything in this summary and/or do I have any misunderstandings? I really appreciate the help. Data science / Machine learning is not easy. Thanks!

  • Yes, that generally sounds correct. And most of the time the principal components and the discriminants don't coincide https://stats.stackexchange.com/q/12861/3277; https://stats.stackexchange.com/q/22884/3277 – ttnphns Dec 20 '20 at 18:56
  • @ttnphns thank you! I appreciate it. I just get confused because many different sources on the topic give different views / answers and it's hard sometimes to get the "correct" overall picture of what's going on. It's nice to have some validation. Thanks! – Riemann'sPointyNose Dec 20 '20 at 19:06
  • Some stat packages offer canonical discriminant analysis, which is similar to PCA but, as noted, supervised and post hoc with respect to the classifications. – Mike Hunter Dec 20 '20 at 20:25
