
In PCA, when I extract the principal component vectors, I take the first one, i.e. the eigenvector with the largest corresponding eigenvalue. I notice that some of the entries in this vector are close to zero. Can I reduce the size of this vector by removing the dimensions whose entries are close to zero?

My line of thinking is this: suppose a component 'explains', say, 80% of the variance and is a unit vector of 100 elements, and a single loading (an entry of the vector) accounts for as much as 90% of the vector's unit length. Then the other 99 entries contribute very little and could perhaps be dropped.
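To make the idea concrete, here is a minimal NumPy sketch of what such truncation could look like. The toy data, the 1% cutoff and all variable names are assumptions chosen for illustration; this is plain thresholding of the loadings, not an established feature-selection procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 500 samples, 10 features,
# with feature 0 given a much larger variance than the rest.
n_samples, n_features = 500, 10
X = rng.normal(size=(n_samples, n_features))
X[:, 0] *= 10.0

# PCA via eigendecomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                    # unit vector for the largest eigenvalue

# The squared loadings of a unit vector sum to 1, so each one is that
# variable's share of the vector's squared length.
share = pc1 ** 2

# Hypothetical cutoff: zero out loadings whose share is below 1%.
threshold = 0.01
keep = share >= threshold
pc1_sparse = np.where(keep, pc1, 0.0)
pc1_sparse /= np.linalg.norm(pc1_sparse)  # renormalise to unit length

# Variance of the data along each direction, as a fraction of total variance.
total_var = eigvals.sum()
var_full = pc1 @ cov @ pc1 / total_var
var_sparse = pc1_sparse @ cov @ pc1_sparse / total_var

print("variables kept:", np.flatnonzero(keep))
print("variance explained (full PC1):     ", round(var_full, 3))
print("variance explained (truncated PC1):", round(var_sparse, 3))
```

The truncated direction can never capture more variance than the full first component, since the latter is the variance-maximising unit vector. How much is lost depends entirely on the data and on the cutoff.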

I would like to use PCA in a way similar to Lasso regression, where a tuned penalty parameter constrains the total size of the coefficients and thereby limits which variables are used.

Vass
  • You are asking about using PCA for feature selection. Yes, it can be used, and people do use it, exactly along the lines you are thinking about. However, the consensus usually is that it is a pretty bad approach, with **Lasso being much more strongly recommended** instead. Lasso chooses features by looking at how they explain the dependent variable in the regression, whereas PCA knows nothing about the dependent variable. – amoeba Feb 26 '15 at 15:27
  • @amoeba, so in a way PCA and Lasso do two different things? Since Lasso considers a target variable, and PCA looks only for the projection directions of maximum variance, they perform different tasks? – Vass Feb 26 '15 at 15:54
  • Yes, that's correct. In some situations PCA feature selection can apparently produce reasonable results, but if you are doing regression, just use Lasso. – amoeba Feb 26 '15 at 16:24
  • @amoeba, and PCA gives lower-dimensional projections, which are good for clustering (if not regression)? And can you do regression on the principal component projections? – Vass Feb 26 '15 at 16:28
  • Yes, but why would you want feature selection if you do clustering? To the second question: yes, it's called principal component regression, but it's better to use ridge penalty instead. – amoeba Feb 26 '15 at 16:30
  • See e.g. [here](http://stats.stackexchange.com/questions/101485): there's no *guarantee* that components accounting for only a little of the variability between predictors explain only little of the variability of the response. It's a context-dependent *assumption*. – Scortchi - Reinstate Monica Feb 26 '15 at 16:35
  • @Scortchi, but in most cases is it not likely? – Vass Feb 26 '15 at 16:58
  • @Vass: Define your population of cases. – Scortchi - Reinstate Monica Feb 26 '15 at 17:06
  • @Scortchi, it is words from groups of users on Twitter. Different groups can produce different 'corpora' of words, and I want to pull out the words that define these 'groups'. – Vass Feb 26 '15 at 17:31
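
The point made in the comments, that a component carrying little of the predictors' variance can still carry most of the information about the response, can be illustrated with a small simulation. This is only a sketch on made-up data: the two predictors, their scales and the response model are all assumptions chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy predictors (assumed for illustration): x1 has large variance, x2 small.
n = 1000
x1 = rng.normal(scale=10.0, size=n)
x2 = rng.normal(scale=1.0, size=n)
X = np.column_stack([x1, x2])

# The response depends only on the low-variance predictor.
y = 3.0 * x2 + rng.normal(scale=0.5, size=n)

# PCA scores via SVD of the centred data (rows of Vt are the components,
# ordered from largest to smallest variance).
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T

var_ratio = s**2 / np.sum(s**2)
r_pc1 = np.corrcoef(scores[:, 0], y)[0, 1]
r_pc2 = np.corrcoef(scores[:, 1], y)[0, 1]

print("variance share of PC1, PC2:", np.round(var_ratio, 3))
print("corr(PC1, y):", round(r_pc1, 3))
print("corr(PC2, y):", round(r_pc2, 3))
```

In this constructed case the first component holds almost all of the predictor variance yet is nearly uncorrelated with the response, while the second component is strongly correlated with it. Real data need not behave this way, which is why the relationship is an assumption rather than a guarantee.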
