
I have a dataset with 37 independent variables and a dependent variable. In order to take care of multicollinearity among the independent variables, I conducted a PCA on them. My first principal component explains 53% of the variance in the data.

However, when I run a linear regression using the top 10 PCs (explaining ~90% of the variance) as independent variables:

lm(dependent ~ PC1 + PC2 + ...)

the regression coefficient for PC1 turns out to be statistically insignificant (all the other 9 PCs are highly significant). I expected PC1 to be highly significant.

Am I conceptually missing something?
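For reference, here is a minimal sketch of the workflow described in the question (PCA on the predictors, then `lm` on the first 10 PC scores) in base R. The data are simulated, and the names `X`, `dependent`, `latent` and `k` are hypothetical stand-ins for the real dataset. Running `prcomp` on standardized predictors is one common choice and is assumed here; the question does not say how the PCA was done.

```r
# Minimal principal component regression (PCR) sketch in base R.
# Assumption: simulated correlated predictors stand in for the real 37 variables.
set.seed(1)
n <- 200
p <- 37

# Hypothetical predictors sharing one latent factor (induces multicollinearity)
latent <- rnorm(n)
X <- sapply(seq_len(p), function(j) latent + rnorm(n, sd = 0.8))
colnames(X) <- paste0("x", seq_len(p))
dependent <- as.numeric(X %*% rnorm(p, sd = 0.1)) + rnorm(n)

# PCA on centered and standardized predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Cumulative proportion of predictor variance explained by the PCs
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
print(round(cum_var[1:10], 3))

# Regress the response on the scores of the first k PCs (columns PC1 ... PCk)
k <- 10
dat <- data.frame(dependent = dependent, pca$x[, 1:k])
fit <- lm(dependent ~ ., data = dat)
print(summary(fit))

# PC scores are centered and mutually uncorrelated, so each PC's coefficient
# equals the one from a simple regression of the response on that PC alone
fit_pc1 <- lm(dependent ~ PC1, data = dat)
print(all.equal(coef(fit)[["PC1"]], coef(fit_pc1)[["PC1"]]))
```

The last check shows why the share of variance a PC explains in the predictors does not determine its coefficient: the PC1 estimate is the same with or without the other PCs in the model, and it depends only on how PC1 covaries with the response.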

  • No, you are not missing anything. There is no guarantee that the scores on your first PC are significantly associated with another variable (e.g. your dependent variable in this case). – usεr11852 Oct 07 '15 at 22:03
  • @amoeba, I used the first 10 PCs in my regression model as together they explain about 90% of the variance in the data. PC2:PC9 came out to be highly significant. I am confused because I was expecting PC1 to be significant. – States.the.Obvious Oct 07 '15 at 22:06
  • Thanks for the clarification. Please see my answer in this thread http://stats.stackexchange.com/questions/141864/ (and many linked answers there if you want to go further). Perhaps this question can even be closed as a duplicate of that one. Let me know if that discussion does not fully address your concerns. – amoeba Oct 07 '15 at 22:12
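To illustrate the point made in the comments, here is a small simulation sketch in R. Everything is simulated and the names are hypothetical. The response is deliberately constructed to depend only on PC3 and to be exactly uncorrelated with PC1. This extreme case is chosen so that the result does not depend on the random seed: PC1 explains the largest share of the predictor variance yet gets a coefficient of about zero, while PC3 is highly significant.

```r
# Sketch: a PC that explains most predictor variance need not predict y.
# Assumption: simulated data; y is built to depend on PC3 only and to be
# exactly uncorrelated with PC1, so the outcome does not hinge on the seed.
set.seed(42)
n <- 200
p <- 37

# Predictors sharing one latent factor, so PC1 dominates the variance
latent <- rnorm(n)
X <- sapply(seq_len(p), function(j) latent + rnorm(n, sd = 0.8))
pca <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:10]

# Proportion of predictor variance explained by PC1
pc1_share <- pca$sdev[1]^2 / sum(pca$sdev^2)

# Noise residualized on PC1: mean zero and exactly uncorrelated with PC1
z <- rnorm(n)
noise <- as.numeric(residuals(lm(z ~ scores[, 1])))

# Response driven by PC3 alone
y <- 2 * scores[, 3] + noise

dat <- data.frame(y = y, scores)
fit <- lm(y ~ ., data = dat)
coefs <- summary(fit)$coefficients

print(round(pc1_share, 2))       # large share of the predictor variance
print(coefs[c("PC1", "PC3"), ])  # PC1: estimate ~0, p ~1; PC3: very small p
```

With real data the relationship is rarely this clean, but the logic is the same: PCA ranks components by how much variance they capture in the predictors, without looking at the response at all.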
