I am trying to understand how linear discriminant analysis (LDA) is related to principal component analysis (PCA) and the k-means clustering method. As an example, here is a comparison between PCA and k-means:

[Image: snapshot showing the PCA and k-means formulas]

My question is: how is LDA related to PCA and k-means?

amoeba
rdorlearn
    Where is the snapshot taken from? By the way, it does not look like much of a *comparison* between k-means and PCA to me; it shows two different formulas, yes, but how do they compare? – amoeba Feb 06 '15 at 17:55
    See also: http://stats.stackexchange.com/questions/23353/pca-lda-cca-and-pls, http://stats.stackexchange.com/a/87509/4598 – cbeleites unhappy with SX Feb 07 '15 at 14:41

1 Answer

I'm by no means an expert on the topic, but it seems that K-means clustering can be viewed as a dimensionality reduction technique, a family of which LDA and PCA are direct examples. K-means replaces each observation with the index (or centroid) of its cluster, which uncovers latent structure in the data and thereby reduces its dimensionality. One difference worth keeping in mind is that PCA and K-means are unsupervised, whereas LDA requires class labels. I'm sure that other people will provide more advanced answers to this question.
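
To make the link a bit more concrete, the three methods can be written in terms of the same scatter matrices. This is only a sketch in my own notation (the symbols $x_i$, $\bar{x}$, $m_k$, $n_k$, $C_k$ and $w$ are assumptions, not taken from the question's snapshot): $x_i$ are the observations, $\bar{x}$ their overall mean, $C_k$ the $k$-th group with $n_k$ members and mean $m_k$, and $w$ a projection direction.

```latex
\[
\begin{aligned}
S_T &= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top
  && \text{(total scatter)} \\
S_W &= \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - m_k)(x_i - m_k)^\top
  && \text{(within-group scatter)} \\
S_B &= \sum_{k=1}^{K} n_k \, (m_k - \bar{x})(m_k - \bar{x})^\top
  && \text{(between-group scatter)} \\
S_T &= S_W + S_B \\[6pt]
\text{PCA:} &\quad \max_{\|w\| = 1} \; w^\top S_T \, w
  && \text{(no groups involved)} \\
\text{LDA:} &\quad \max_{w} \; \frac{w^\top S_B \, w}{w^\top S_W \, w}
  && \text{($C_k$ given by the class labels)} \\
\text{K-means:} &\quad \min_{C_1, \dots, C_K} \; \operatorname{tr}(S_W)
  && \text{($C_k$ chosen by the algorithm)}
\end{aligned}
\]
```

Because $S_T$ does not depend on the grouping, minimizing $\operatorname{tr}(S_W)$ is the same as maximizing $\operatorname{tr}(S_B)$. Read this way, K-means searches for the kind of partition that LDA takes as given, while PCA ignores the grouping altogether.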

Additionally, I would like to share two references that are relevant to the question and, IMHO, rather comprehensive. One is a highly cited research paper by Ding and He (2004) on the relationship between K-means and PCA. The other is a research paper by Martínez and Kak (2001) that presents a comparison between PCA and LDA.
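
For readers who prefer to experiment, here is a minimal NumPy sketch of the three methods side by side. It is not taken from either paper: the synthetic data, the helper names `kmeans` and `agreement`, and the restart settings are all my own assumptions. The data are deliberately constructed so that the direction of largest variance says nothing about the class, which is the situation where PCA and LDA disagree most; on other data sets the two directions can be close.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): two classes that are
# elongated along x but differ only in their y-coordinate.
n = 200
cov = np.array([[9.0, 0.0], [0.0, 0.25]])
X = np.vstack([
    rng.multivariate_normal([0.0, -1.5], cov, size=n),
    rng.multivariate_normal([0.0, 1.5], cov, size=n),
])
y = np.repeat([0, 1], n)
Xc = X - X.mean(axis=0)

# PCA: leading eigenvector of the total covariance (labels are ignored).
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

# Two-class LDA (Fisher): w proportional to S_W^{-1} (m1 - m0); needs labels.
m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
S_W = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in ((0, m0), (1, m1)))
w_lda = np.linalg.solve(S_W, m1 - m0)
w_lda /= np.linalg.norm(w_lda)


def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Plain Lloyd's algorithm with random restarts; keeps the best run."""
    best = (np.inf, None, None)
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Recompute assignments and within-cluster sum of squares for the final centers.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


def agreement(a, b):
    """Fraction of matching binary labels, up to swapping 0 and 1."""
    acc = np.mean(a == b)
    return max(acc, 1.0 - acc)


# K-means with K = 2 (labels are ignored).
km_labels, km_centers = kmeans(X, 2, rng)
km_dir = km_centers[1] - km_centers[0]
km_dir /= np.linalg.norm(km_dir)

# Two-way splits induced by the sign of the PCA and LDA projections.
pc1_split = (Xc @ pc1 > 0).astype(int)
lda_split = ((X - (m0 + m1) / 2) @ w_lda > 0).astype(int)

print("|cos(PC1, LDA direction)|     :", abs(pc1 @ w_lda))
print("|cos(PC1, K-means direction)| :", abs(pc1 @ km_dir))
print("K-means vs. sign of PC1 score :", agreement(km_labels, pc1_split))
print("K-means vs. true classes      :", agreement(km_labels, y))
print("LDA split vs. true classes    :", agreement(lda_split, y))
```

On data built like this, I would expect the K-means split to follow the sign of the first principal component score, in the spirit of Ding and He's result for K = 2, and to miss the true classes, whereas the LDA direction, which uses the labels, separates them.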

References

Ding, C., & He, X. (2004, July). K-means clustering via principal component analysis. In Proceedings of the twenty-first International Conference on Machine Learning (p. 29). ACM.

Martínez, A. M., & Kak, A. C. (2001). PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 228-233.

Aleksandr Blekh
    See stackoverflow.com/a/29731291/2056067 for an attempt to summarize Ding & He. :-) – A. Donda May 30 '15 at 16:25
  • @A.Donda: Thank you for the link. :-) Both answers are very nice (+1) and I will re-read them when I have a bit more time. However, I think that particular question belongs on either the _Cross Validated_ or the _Data Science_ SE site and, therefore, should be migrated. – Aleksandr Blekh May 30 '15 at 18:29
    Maybe so, the line is not very clear for programming-related statistics questions (or vice versa). You can flag the question for moderator attention. DS SE is still in beta though. :) – A. Donda May 30 '15 at 23:44