I just wanted to clarify and make sure I understand, overall, what PCA and LDA set out to achieve.
So: LDA is about trying to find a projection that best separates the different classes of data. It's supervised (since you obviously need labels saying which point belongs to which class), and while it doesn't give you a way to classify points directly, you can then apply some classification technique to the projected versions of the data. Each time you get a new data point, you project it and use the discriminants you learned (or whatever classifier you trained on the projected data) to classify it.
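To make that workflow concrete, here is a minimal sketch of the two-class case using only NumPy. It assumes Fisher's formulation, where the projection direction is w ∝ S_w⁻¹(m₁ − m₀). The helper names (`fit_fisher_lda`, `fit_nearest_mean`, `predict`) are hypothetical, the data is synthetic, and the nearest-projected-mean rule is just one simple choice of classifier for the projected data:

```python
import numpy as np

def fit_fisher_lda(X, y):
    """Two-class Fisher LDA: return a unit discriminant direction w."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of each class's scatter around its own mean
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Fisher's direction maximizes between-class vs. within-class spread
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)

def fit_nearest_mean(z, y):
    """A simple classifier on the 1-D projected data: one center per class."""
    return np.array([z[y == 0].mean(), z[y == 1].mean()])

def predict(X_new, w, centers):
    z_new = X_new @ w  # project new points with the learned direction
    # Assign each point to the class whose projected mean is closest
    return np.argmin(np.abs(z_new[:, None] - centers[None, :]), axis=1)

rng = np.random.default_rng(0)
cov = np.array([[3.0, 2.5], [2.5, 3.0]])  # hypothetical shared covariance
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, 200),
               rng.multivariate_normal([2.0, -1.0], cov, 200)])
y = np.repeat([0, 1], 200)

w = fit_fisher_lda(X, y)                 # learned from labeled data
centers = fit_nearest_mean(X @ w, y)     # classifier trained on projections
X_test = np.array([[0.0, 0.0], [2.0, -1.0]])
print(predict(X_test, w, centers))
```

The split into "learn a projection" and "classify in the projected space" mirrors the description above; any other classifier could replace the nearest-mean step.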
PCA is not necessarily about classification at all. It's unsupervised. All it does is try to represent the data in a (potentially lower-dimensional) space while preserving the important information in the data (which, by its criterion, is variance).
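For comparison, a minimal PCA sketch, assuming NumPy and the usual SVD-of-centered-data formulation. Note that no labels appear anywhere. The `pca` helper and the synthetic 3-D data are hypothetical, chosen so that almost all of the variance lies along one direction:

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (no labels used)."""
    mean = X.mean(axis=0)
    Xc = X - mean                        # PCA works on centered data
    # Rows of Vt are the principal directions, ordered by singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    explained = s**2 / np.sum(s**2)      # fraction of total variance per component
    return Xc @ components.T, components, explained[:k], mean

rng = np.random.default_rng(1)
# Hypothetical 3-D data that varies mostly along the direction (1, 2, 0.5)
t = rng.normal(size=(500, 1))
X = t @ np.array([[1.0, 2.0, 0.5]]) + 0.1 * rng.normal(size=(500, 3))

Z, components, explained, mean = pca(X, k=1)
print(Z.shape)      # (500, 1)
print(explained)    # close to 1: one component keeps almost all the variance

# A new point is projected with the same mean and components
x_new = np.array([[1.0, 2.0, 0.5]])
z_new = (x_new - mean) @ components.T
```

Here "important information" is measured purely as variance, which is why the 3-D data can be summarized by a single coordinate.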
Under certain conditions, PCA and LDA may coincide, but this definitely does not have to be the case. One is geared towards helping with classification; the other is about making data more manageable.
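A small synthetic example (hypothetical data, NumPy only) of a case where the two disagree: the largest variance is along x, but the classes differ only along y. PCA takes the top eigenvector of the pooled covariance, while Fisher LDA uses S_w⁻¹(m₁ − m₀). Per-class covariances stand in for scatter matrices here, which rescales S_w but leaves the direction unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
# Hypothetical data: large spread along x, classes separated only along y
X0 = np.column_stack([rng.normal(0, 5, n), rng.normal(-1, 0.5, n)])
X1 = np.column_stack([rng.normal(0, 5, n), rng.normal(1, 0.5, n)])
X = np.vstack([X0, X1])

# PCA direction: top eigenvector of the pooled covariance (labels ignored)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pca_dir = eigvecs[:, np.argmax(eigvals)]

# Fisher LDA direction: S_w^{-1} (m1 - m0), which does use the labels
S_w = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
lda_dir = np.linalg.solve(S_w, X1.mean(axis=0) - X0.mean(axis=0))
lda_dir /= np.linalg.norm(lda_dir)

print("PCA direction:", np.round(pca_dir, 2))   # roughly (+/-1, 0)
print("LDA direction:", np.round(lda_dir, 2))   # roughly (0, 1)
print("|cos angle|:", abs(pca_dir @ lda_dir))   # near 0: nearly orthogonal
```

Projecting onto the PCA direction here would throw away almost all of the class information, even though it keeps most of the variance.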
Is all of the above correct? Have I missed anything in this summary, or do I have any misunderstandings? I really appreciate the help. Data science / machine learning is not easy. Thanks!