I remember reading a paper a while ago that demonstrated some cases in which PCA would fail to capture important features of a data set in the first few principal components, but where those features would be reproduced in lower-variance components.
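To make concrete the kind of failure I mean, here is a toy sketch of my own. It is not taken from the paper, and the data, parameters, and variable names (`x_noise`, `x_signal`, `labels`) are all made up for illustration. One variable has high variance but is unrelated to the group structure, and the other has low variance but separates the two groups almost perfectly.

```python
import numpy as np

# Hypothetical toy data: two groups that differ only along a low-variance axis.
rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, size=n)                          # group membership, 0 or 1
x_noise = rng.normal(0.0, 10.0, size=n)                      # high variance, unrelated to group
x_signal = (2 * labels - 1) + rng.normal(0.0, 0.1, size=n)   # low variance, separates the groups
X = np.column_stack([x_noise, x_signal])

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                      # principal component scores
explained = S**2 / np.sum(S**2)         # fraction of variance per component

# Absolute correlation of each component's scores with the group label.
corr_with_label = [abs(np.corrcoef(scores[:, k], labels)[0, 1]) for k in range(2)]

print("explained variance ratio:", explained)
print("|corr| with group label: ", corr_with_label)
```

In this construction the first component should carry nearly all of the variance while being almost uncorrelated with the grouping, and the second, low-variance component should carry the group separation.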
I think someone here recently mentioned the paper in a comment, and it jogged my memory.
I've searched Google, Google Scholar, and my library database, but I haven't found anything. Coming up with the right search terms for something like this is not easy.
What paper is this?