Based on your description, it seems a bit hackish to me and in some places even questionable.
Below are some of my comments:
1) Project the data on 3 principal components.
- Why 3? The information separating the classes might not be present within the first 3 rotated axes. At the very least, check how much variance those 3 components actually retain (first sketch after this list).
2) Use k-means clustering on these components to select the biggest group and discard the others.
- Well, all the samples within this group will be similar to each other. If there are big differences between your classes, you will end up retaining one class and discarding most of the others. In effect you will be removing the samples that might be most informative for your classifier and keeping only the ones that cannot be separated (the second sketch after this list illustrates this on toy data).
3) Perform LDA on this "ball" of 3D points.
- The issue here is that LDA is designed to separate classes modelled as normal distributions that share the same covariance matrix but differ in their means; that is the assumption LDA operates under. Imagine this were the case for your original data: your procedure (especially the k-means step) would group those two distributions into separate clusters, you would end up removing one of them, and you would then be attempting LDA on the single remaining, roughly normal, distribution. The toy simulation below illustrates both this and the previous point.
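To make the first point concrete: the explained variance ratio does not tell you whether the class-separating information survives the projection, but it at least shows how much of the data you throw away by keeping only 3 components. A minimal sketch with scikit-learn, assuming your samples sit in a NumPy array `X` (the array below is just a random placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA

# X: (n_samples, n_features) array of your data (random placeholder here)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

pca = PCA().fit(X)                      # fit all components, not just 3
ratios = pca.explained_variance_ratio_  # variance captured per component

print("variance kept by first 3 PCs:", ratios[:3].sum())
print("components needed for 95% variance:",
      np.searchsorted(np.cumsum(ratios), 0.95) + 1)
```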
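And a toy simulation of the second and third points (my own construction, not your data): two classes drawn from normal distributions with the same covariance but different means, i.e. exactly the LDA setting. The "keep the biggest k-means cluster" rule throws away essentially one whole class, while plain LDA on the untouched data separates them without trouble:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two classes: same covariance (identity), different means -- the LDA setting.
X = np.vstack([rng.normal(loc=[0, 0, 0], size=(100, 3)),
               rng.normal(loc=[5, 0, 0], size=(80, 3))])
y = np.array([0] * 100 + [1] * 80)

# The "keep the biggest cluster" filter from the proposed pipeline.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
biggest = np.argmax(np.bincount(labels))
kept = labels == biggest
print("class counts among kept samples:", np.bincount(y[kept]))
# -> almost all kept samples come from a single class

# Plain LDA on the full data has no trouble with this setting.
lda = LinearDiscriminantAnalysis().fit(X, y)
print("LDA training accuracy on full data:", lda.score(X, y))
```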
In short, it makes little sense to me, and it is hard to suggest anything without knowing how many samples and how many classes you have.
LDA and PCA are indeed often used together, but without the k-means step in the middle. One use of the PCA + LDA approach is to apply LDA in situations where there are more features than samples. You can search for the terms "Fisher faces" or "Eigen faces" for examples of applying it to face image classification.
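If that is your situation, the usual combination is simply PCA followed by LDA, with nothing in between. A rough sketch with scikit-learn (the data, labels and number of components below are placeholders, not a recommendation):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Placeholder data: 60 samples, 1000 features, 3 classes (more features than samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))
y = rng.integers(0, 3, size=60)

# PCA first reduces the dimensionality so LDA's scatter matrices are well behaved,
# then LDA finds the class-discriminating directions (the "Fisher faces" idea).
model = make_pipeline(PCA(n_components=30), LinearDiscriminantAnalysis())
print(cross_val_score(model, X, y, cv=5).mean())
```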