Based on your description, it seems a bit hackish to me and in some places even questionable.
Below are some of my comments:
1) Project the data on 3 principal components.
- Why 3? The information separating the classes might not be present within the first 3 rotated axes. At the very least, check how much variance those 3 components actually retain (first sketch after this list).
2) Use k-means clustering on these components to select the biggest group and discard the others.
- Well, all the samples within this group will be similar to each other. If there are big differences between your classes, you will end up retaining one class and discarding most of the others. In effect you will be removing the samples that might be most informative for your classifier and keeping only the ones that cannot be separated (the second sketch after this list illustrates this on toy data).
3) Perform LDA on this "ball" of 3D points.
- The issue here is that LDA is designed to separate classes modelled as normal distributions that share the same covariance matrix but differ in their means; that is the assumption LDA operates under. Imagine this were the case for your original data: your procedure (especially the k-means step) would group those two distributions into separate clusters, you would end up removing one of them, and you would then be attempting LDA on the single remaining, roughly normal, distribution. The toy simulation below illustrates both this and the previous point.
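To make the first point concrete: the explained variance ratio does not tell you whether the class-separating information survives the projection, but it at least shows how much of the data you throw away by keeping only 3 components. A minimal sketch with scikit-learn, assuming your samples sit in a NumPy array `X` (the array below is just a random placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA

# X: (n_samples, n_features) array of your data (random placeholder here)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

pca = PCA().fit(X)                      # fit all components, not just 3
ratios = pca.explained_variance_ratio_  # variance captured per component

print("variance kept by first 3 PCs:", ratios[:3].sum())
print("components needed for 95% variance:",
      np.searchsorted(np.cumsum(ratios), 0.95) + 1)
```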
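And a toy simulation of the second and third points (my own construction, not your data): two classes drawn from normal distributions with the same covariance but different means, i.e. exactly the LDA setting. The "keep the biggest k-means cluster" rule throws away essentially one whole class, while plain LDA on the untouched data separates them without trouble:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two classes: same covariance (identity), different means -- the LDA setting.
X = np.vstack([rng.normal(loc=[0, 0, 0], size=(100, 3)),
               rng.normal(loc=[5, 0, 0], size=(80, 3))])
y = np.array([0] * 100 + [1] * 80)

# The "keep the biggest cluster" filter from the proposed pipeline.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
biggest = np.argmax(np.bincount(labels))
kept = labels == biggest
print("class counts among kept samples:", np.bincount(y[kept]))
# -> almost all kept samples come from a single class

# Plain LDA on the full data has no trouble with this setting.
lda = LinearDiscriminantAnalysis().fit(X, y)
print("LDA training accuracy on full data:", lda.score(X, y))
```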
In short, it makes little sense to me, and it is hard to suggest anything without knowing how many samples and how many classes you have.
LDA and PCA are indeed often used together, but without the k-means step in the middle. One use of the PCA + LDA approach is to apply LDA in situations where there are more features than samples. You can search for the terms "Fisher faces" or "Eigen faces" for examples of applying it to face image classification.
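If that is your situation, the usual combination is simply PCA followed by LDA, with nothing in between. A rough sketch with scikit-learn (the data, labels and number of components below are placeholders, not a recommendation):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Placeholder data: 60 samples, 1000 features, 3 classes (more features than samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))
y = rng.integers(0, 3, size=60)

# PCA first reduces the dimensionality so LDA's scatter matrices are well behaved,
# then LDA finds the class-discriminating directions (the "Fisher faces" idea).
model = make_pipeline(PCA(n_components=30), LinearDiscriminantAnalysis())
print(cross_val_score(model, X, y, cv=5).mean())
```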