
I know that Principal Component Analysis (PCA) computes the eigenvectors of the covariance matrix. It is used as a tool for dimensionality reduction. What I am confused about is whether PCA assigns weights to the original features in order to find out which features explain the data the most, or whether it comes up with a new set of abstract features that explain the greatest variance in the data set.

thethakuri
  • Please check the existing threads. There are some truly excellent answers (e.g. by @amoeba) to almost everything you need to know about PCA. – Richard Hardy Aug 19 '16 at 09:38
  • @amoeba I did go through your post but I am not convinced if that answers my question. – thethakuri Aug 19 '16 at 10:22
  • It looks to me that the thread over there and the question posed here really amount to the same thing. Perhaps you should edit this question to make it clearer what aspect of the other thread was unclear to you? (I'm guessing that you might want a bit more about the "weights to original features" aspect, but that's only a supposition; as the question is currently phrased, it is hard to see that it can be given an answer that would not also fit at the proposed duplicate.) – Silverfish Aug 19 '16 at 12:48
  • It's not only "my post", thethakuri; there are 26 answers in that thread and many of them are very good. If they do not answer your question that's fine, but please make sure to look through that thread and then edit your question to make it clear what aspect you do not understand, as @Silverfish suggested above. – amoeba Aug 19 '16 at 13:14
  • @amoeba You pointed out that "PCA is not selecting some characteristics and discarding the others. Instead, it constructs some new characteristics that turn out to summarize our list of wines well." This suggests PCA is creating abstract features that best define the data. However, according to JD Long, "in a situation where you have a WHOLE BUNCH of independent variables, PCA helps you figure out which ones matter the most", i.e. PCA simply assigns weights to existing features. So which one is it? – thethakuri Aug 21 '16 at 08:13
  • Clarification question: when you ask if PCA "gives weights" to original features, you mean if it assigns "importance" to original features? – amoeba Aug 21 '16 at 09:23
  • PCA constructs new features as *linear combinations* of old features. If your old features are x and y then PCA will construct a PC1 that might look like 0.6*x + 0.8*y. Would you call it "creating a new feature" or "giving weight to old ones"? – amoeba Aug 21 '16 at 09:25 (see the code sketch after these comments)
  • Okay. That answers my question. – thethakuri Aug 21 '16 at 09:26
  • I am still wondering: Would you call it "creating new feature" or "giving weight to old ones"? Just a clarification on what you meant by "giving weights". – amoeba Aug 21 '16 at 09:28
  • I would say that it is giving weights to the original features since, in your example, variable y clearly explains more of the variation than x. I come from a computer science background with a great interest in machine learning. I have used PCA as a tool for dimensionality reduction. I don't have any extensive background in statistics other than some basic courses I took online, so I am just trying to make sense of it all. Thanks for your input! – thethakuri Aug 21 '16 at 09:45
  • @thethakuri I edited my answer in the linked thread to clarify this issue, and also edited the confusing (conflicting) sentence in the second answer. Thanks for pointing this out. – amoeba Aug 22 '16 at 14:20
  • @amoeba So we should use these components as a whole, and not use the individual weights as decision factors? – thethakuri Aug 26 '16 at 08:44
  • Yes. If you need to pick out some individual features, there are other techniques for that. PCA is not a feature selection technique. – amoeba Aug 26 '16 at 08:51
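
For reference, here is a minimal sketch in Python (assuming NumPy and scikit-learn are available; the toy data and variable names are hypothetical) illustrating the point made in the comments above: each principal component is a linear combination of all the original features, with the weights stored in `pca.components_`, so PCA constructs new abstract features rather than selecting a subset of the old ones.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical toy data: two correlated features, x and y.
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)
X = np.column_stack([x, y])

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # the new, abstract features (PC scores)

# pca.components_[0] holds the weights of PC1 on x and y,
# i.e. PC1 = w_x * x + w_y * y (computed on centered data).
w = pca.components_[0]
print("PC1 weights on (x, y):", w)

# Rebuild PC1 by hand from the centered data to confirm it really is
# just a weighted combination of the original features.
pc1_manual = (X - X.mean(axis=0)) @ w
print("max difference from sklearn's PC1:", np.abs(pc1_manual - scores[:, 0]).max())
```

Note that the PC1 weights involve both x and y; no feature is discarded, which is why, as the last comment says, PCA is not a feature selection technique.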

0 Answers