Usage of PCA - how to scale observations?

Question

I want to use PCA in this kind of situation. I have three variables:

the number of times an event happened for a user (a positive integer);
the total "power" of all events that happened for a user (a real number, possibly negative);
the fraction of "successful hits" (a real number with 0 < x < 1).

Wikipedia states that "PCA is sensitive to the scaling of the variables."

One problem is that "power" can be measured in various units, and the choice of units will affect the results. I do not see a natural choice of units at the moment.

Are there any suggestions on how to scale observations for PCA?

Did you read some advice for a [similar question](http://stats.stackexchange.com/q/12200/3277)? — ttnphns, Feb 26 '13 at 15:35

score 1 · Answer 1 · answered Feb 26 '13 at 12:21


You could first center the data by subtracting the respective mean from each column, and then rescale the resulting values so that they fall within the interval [-1,1].
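A minimal sketch of the centering and rescaling described above, assuming NumPy and a small made-up data matrix whose columns stand in for the three variables in the question. The function name `center_and_range_scale` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def center_and_range_scale(X):
    """Subtract each column's mean, then divide by that column's range (max - min)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    col_range = X.max(axis=0) - X.min(axis=0)
    # Guard against constant columns, whose range is zero.
    col_range[col_range == 0] = 1.0
    return centered / col_range

# Hypothetical data: columns are (event count, total "power", success fraction).
X = np.array([
    [3.0, -12.5, 0.40],
    [10.0, 40.0, 0.75],
    [1.0, 2.0, 0.10],
    [7.0, 15.5, 0.55],
])

X_scaled = center_and_range_scale(X)
print(X_scaled)
```

Every scaled column has mean zero and lies within [-1,1], because no value can be further from the mean than the full range. Note that the minimum and maximum are determined by the most extreme observations, so a single unusual value changes the scaling of the whole column.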


jpmuc


Thank you for the answer! Yes, mean subtraction is necessary, but what scaling is appropriate? Should I scale everything to [-1,1], or only some of the variables? That is not so clear. Choosing different scalings for different variables gives them different "importance". The appropriate "importance" should depend on the task, but I do not see the right one in my case. – Alexander Chervov Feb 26 '13 at 12:39

Although this recommendation might work for some datasets, it is exquisitely sensitive to any outliers that might exist, and so is not a good general procedure. – whuber Feb 26 '13 at 13:41
Very true! Otherwise, you could normalize the variance of each variable individually to one. – jpmuc Mar 01 '13 at 16:01
@AlexanderChervov I meant it as $$x \to \frac{x - \operatorname{mean}(x)}{\max(x) - \min(x)}$$, so that the variable now lies in the interval $$[-1,1]$$. – jpmuc Mar 01 '13 at 16:03
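The alternative mentioned in the comments, normalizing each variable to unit variance, can be sketched as follows. This is only an illustration under stated assumptions: the data are randomly generated stand-ins for the three variables, the helper names `standardize` and `explained_variance_ratio` are hypothetical, and PCA is computed directly from the singular values of the centered data.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit sample variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def explained_variance_ratio(Z):
    """Share of total variance captured by each principal component."""
    Zc = Z - Z.mean(axis=0)
    singular_values = np.linalg.svd(Zc, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Synthetic stand-ins for the three variables (assumed distributions).
rng = np.random.default_rng(0)
n = 200
counts = rng.poisson(5, n) + 1         # positive integer
power = rng.normal(0.0, 10.0, n)       # real, can be negative
success = rng.uniform(0.01, 0.99, n)   # strictly between 0 and 1
X = np.column_stack([counts, power, success])

# The same data with "power" expressed in a unit 1000 times smaller.
X_other_units = X.copy()
X_other_units[:, 1] *= 1000.0

raw_a = explained_variance_ratio(X)
raw_b = explained_variance_ratio(X_other_units)
std_a = explained_variance_ratio(standardize(X))
std_b = explained_variance_ratio(standardize(X_other_units))

print("raw:         ", raw_a, raw_b)   # changes with the unit of "power"
print("standardized:", std_a, std_b)   # unaffected by the unit of "power"
```

After standardization the result no longer depends on the unit chosen for "power", which addresses the concern in the question. The trade-off is that every variable is then given equal weight, which may or may not match the "importance" the task calls for.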
