Often when PCA is performed on exam results where all variables (dimensions) share the same $0$ to $100$ scale, scaling is nonetheless applied. I can see the purpose of it when the variables are on different scales, but not in this case. Why is it done?
-
This is covered in various answers to our very popular thread on this topic: http://stats.stackexchange.com/questions/53. Look through all well-upvoted answers to get a feeling of different opinions and use cases. – amoeba Apr 26 '16 at 15:12
-
@amoeba This question may be subtly different from the others: given that exam scores have a natural, common range, why would anyone ever standardize the scores? The accepted (and highly upvoted) answer to the proposed duplicate specifically suggests they would not. Other answers also suggest as much. – whuber Apr 26 '16 at 16:59
-
@whuber I agree that this question has its own spin (+1, by the way). I think the answers to the proposed duplicate *all together* (not only the accepted one) do provide some guidance and discussion of both options (standardizing or not). I am happy for this thread to stay open and alone, but I am afraid it will not get as thorough a discussion as already exists elsewhere... – amoeba Apr 26 '16 at 19:38
-
@whuber But what if there are topics that really 'discriminate' between students, so that the variance for these particular topics is much greater than for the other topics? The potential range is the same, but wouldn't you end up with a really distorted space? Wouldn't scaling be advisable in that situation? – Riff Apr 27 '16 at 06:58
-
@Nicolas I'm not sure what you mean by "topic" or "really distorted space." If your point is that different exam variables can have substantially different variances, then that would imply the fact they have natural limits of $0$ and $100$ is irrelevant--and likewise this question becomes irrelevant (or trivial). Many tests, though, are designed so that (at least in some reference population) the variances of each variable are all equal. – whuber Apr 27 '16 at 11:33
-
@whuber OK, so you do agree that scaling before PCA is not only a matter of measuring scale but also of the magnitude of the variances across variables, if I'm getting everything correctly? – Riff Apr 27 '16 at 11:35
-
@whuber What I meant by "topic" and "distorted space": suppose your variables are exam scores for Maths, Art, Sports, and Bio. If the variances of two of those topics are not comparable with those of the other two, then you would end up with a space primarily constructed out of the variables with greater variances (so distorted in the sense that it does not correctly represent the contributions of half of the dataset). – Riff Apr 27 '16 at 11:39
-
@Nicolas I am not sure what you mean by "measuring scale." Since PCA is all about variance--and covariance--then certainly you want to pay attention to any operations that could change those quantities (such as standardization of variables). The issues can get subtle: I am reminded of a related question at http://stats.stackexchange.com/a/50583/919. Your assumptions in your most recent comment may be unfounded: if those exam scores are used, say, with equal weights in developing a composite score, then arguably using the variables as-is would give the least "distortion." – whuber Apr 27 '16 at 11:40
-
@whuber I am using "measuring scale" as a synonym for "units of measure", so for example temperatures in Europe (somewhat stable all year round) and in the Sahara desert (great variance) are on the same measuring scale (°C, K, °F, etc.) but have very different variances. – Riff Apr 27 '16 at 11:44
-
@whuber Sorry for multiposting, I can't edit comments. I tried generating 4 variables with mean 50 and sd 50, 45, 20, 10, and when I compare the two PCA results I find that the contributions of the third and fourth variables to the first two dimensions are negligible when not scaling, while being more evenly allocated when scaling. That's why I'm thinking of the first PCA as being distorted: it represents almost only two of the variables. (Frustrating that I can't add code and pics to comments.) – Riff Apr 27 '16 at 12:06
-
@Nicolas We should be astonished if you obtained any other result. If you read over our posts on PCA, especially comparing PCA on covariances to those with correlations, you will find these phenomena acknowledged, described, and explained. Whether one or the other result is "distorted," though, depends on what these data mean and why you are conducting the PCA in the first place. – whuber Apr 27 '16 at 13:39
-
@whuber OK, so a poor choice of words, but the idea's there. Thanks! – Riff Apr 27 '16 at 13:43
1 Answer
Scaling (for PCA) is kind of a personal matter. Some people always do it; others never do, whatever the data.
It is not just a question of measuring scale (°C vs K, km vs miles): not scaling gives more importance to variables that have larger variance (such variables contribute more to the construction of the dimensions than other variables do). Some people want exactly that, as they reckon that variables with small variance are of little interest. On the other hand, people who do scale their variables often state that all variables are of equal interest, and a variable with small variance may even be more interesting. For example, in sensometrics, an attribute that is difficult to evaluate (umami, for European people) may be a key attribute for separating your products, while other attributes (sweet taste) will have larger variance because they are easily recognized and people give ratings on a much more individual level.
When you have variables on different scales, for example m and km, the difference in scale often leads to a difference in variance that may not be relevant (just an artifact of the units), but scaling in a PCA does much more than just get rid of that problem.
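To make the effect concrete, here is a minimal sketch of the kind of simulation described in the comments on the question: four variables with mean 50 and standard deviations 50, 45, 20, 10. It is only an illustration under my own assumptions: the variables are drawn as independent normals (so they are not clipped to $0$ to $100$), the sample size and seed are arbitrary, and `pca_summary` is a hypothetical helper that does the PCA by eigendecomposition with NumPy rather than through any particular PCA library.

```python
import numpy as np

def pca_summary(data, standardize):
    """PCA via eigendecomposition of the covariance matrix
    (or of the correlation matrix when standardize=True)."""
    centered = data - data.mean(axis=0)
    if standardize:
        # Dividing by the standard deviation turns covariance PCA
        # into correlation PCA.
        centered = centered / centered.std(axis=0, ddof=1)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    # Squared loadings on the first two components, summed per variable;
    # these sum to 2 across variables because eigenvectors have unit norm.
    contrib = (eigvecs[:, :2] ** 2).sum(axis=1)
    return explained, contrib

# Assumed setup: independent normal variables, arbitrary n and seed.
rng = np.random.default_rng(0)
sds = np.array([50.0, 45.0, 20.0, 10.0])
scores = rng.normal(loc=50.0, scale=sds, size=(5000, 4))

explained_raw, contrib_raw = pca_summary(scores, standardize=False)
explained_std, contrib_std = pca_summary(scores, standardize=True)

print("unscaled: explained variance ratios", explained_raw.round(3))
print("unscaled: contributions to PC1-PC2 ", contrib_raw.round(3))
print("scaled:   explained variance ratios", explained_std.round(3))
print("scaled:   contributions to PC1-PC2 ", contrib_std.round(3))
```

Without scaling, the first two components should be built almost entirely from the two high-variance variables. With scaling, each variable enters with unit variance; because these simulated variables are independent, the eigenvalues are then nearly equal and the exact loadings are not stable from one sample to the next, so real (correlated) exam data will give a more interpretable picture than this toy example.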

-
OK Nicolas, I thought about your points. Whilst I am not a statistician, I did study standard deviation and variance at uni many years ago, and reading your answer makes me think: – thebluephantom Apr 27 '16 at 16:06
-
1) Whilst I can see that maths is more difficult than econ imho, one can argue that it depends on the individual's abilities, which is not taken into account, I think. So, if I just want to cluster on 0 .. 100 regardless of any interpretation of the exam subject, then reading your response, and the fact that I want to consider all results as equal, makes me think I would NOT want to scale. But maybe I see that entirely wrong. – thebluephantom Apr 27 '16 at 16:17
-
2) I am getting the impression that in machine learning all the algorithms need normalization except decision trees and frequent item sets. 3) I have also read that statisticians do not like skewed data, which seems a little akin to this. I think I would like to know whether the data is skewed. It's interesting. – thebluephantom Apr 27 '16 at 16:17