
I remember reading a paper a while ago that demonstrated some cases in which PCA would fail to capture important features of a data set in the first few principal components, but where those features would be reproduced in lower-variance components.
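
A minimal synthetic sketch of the kind of case described above, not taken from the paper in question. It assumes two made-up variables, `x1` (large variance, unrelated to a binary `labels` vector) and `x2` (small variance, carrying the class separation). All names and numbers here are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical binary feature of interest.
labels = rng.integers(0, 2, size=n)

# x1: high variance, independent of the label.
x1 = rng.normal(0.0, 10.0, size=n)
# x2: low variance, but its mean shifts by +/-1 with the label.
x2 = rng.normal(0.0, 0.3, size=n) + (2 * labels - 1) * 1.0

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)  # center before PCA

# PCA via SVD of the centered data; rows of Vt are the principal axes.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T
explained = s**2 / np.sum(s**2)  # share of variance per component

# Absolute correlation of each component's scores with the label.
corr_pc1 = abs(np.corrcoef(scores[:, 0], labels)[0, 1])
corr_pc2 = abs(np.corrcoef(scores[:, 1], labels)[0, 1])

print("explained variance ratio:", explained)
print("|corr(PC1, label)|:", corr_pc1)
print("|corr(PC2, label)|:", corr_pc2)
```

In this setup the first component takes almost all of the variance but is nearly uncorrelated with the label, while the second, low-variance component tracks it closely.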

I think someone here recently mentioned the paper in a comment, and it jogged my memory.

I've searched Google, Google Scholar, and my library database, but I haven't found anything. Coming up with the right search terms for something like this is not easy.

What paper is this?

shadowtalker
  • One mentioned [here](http://stats.stackexchange.com/questions/87198/) or [here](http://stats.stackexchange.com/questions/101485)? Jolliffe (2010), *Principal components analysis*, deals with this topic & may give more references. – Scortchi - Reinstate Monica Apr 15 '15 at 08:43
  • Here is another relevant question on this site when using PCA as a data reduction before regression, [*Principal component regression analysis using SPSS*](http://stats.stackexchange.com/q/104991/1036). In the comments to my answer I list several references (that are redundant with some of the ones Nick Stauner mentions). – Andy W Apr 15 '15 at 11:45
  • @Scortchi yes it was the second question you linked to. Post that as an answer – shadowtalker Apr 15 '15 at 12:23
  • Great references in the other questions as well. PCA is a hidden specialty here – shadowtalker Apr 15 '15 at 12:24
  • @ssdecontrol: Good. I was thinking to mark this as a duplicate rather than post a link-only answer (I've nothing to add to it). – Scortchi - Reinstate Monica Apr 15 '15 at 12:35
  • @Scortchi I don't see a problem with a terse answer if the answer is complete and correct – shadowtalker Apr 15 '15 at 12:36
  • @ssdecontrol: On reflection there's little difference between looking for "examples" & "references", so I added the `feature selection` tag & marked it as a duplicate - I think the wording in your question is nice & it'll be a useful pointer to Nick's answer (& the others). – Scortchi - Reinstate Monica Apr 15 '15 at 12:46
  • See also [this recent question](http://stats.stackexchange.com/questions/141864) where I tried to provide an answer that would serve as a bit of an overview of several CV threads on this topic, including ones mentioned by @Scortchi. – amoeba Apr 15 '15 at 15:05
  • By the way, you refer to the low-variance components as "higher" ones in the title and as "lower" ones in the first paragraph :) I find this confusing. – amoeba Apr 15 '15 at 15:07
  • @amoeba good point. I meant "lower/higher" as in "lower/higher" _index_, in that the "first" principal component is the one with the highest variance. – shadowtalker Apr 15 '15 at 16:02
  • @Scortchi I'm not sure I agree in principle, but for the purpose of actually helping users find the right information I'm fine with that. Principles are overrated anyway. – shadowtalker Apr 15 '15 at 16:04
