
I'm doing principal component analysis on my dataset and my professor told me that I should normalize the data before doing the analysis. Why?

  • What would happen if I did PCA without normalization?
  • Why do we normalize data in general?
  • Could someone give a clear and intuitive example demonstrating the consequences of not normalizing the data before the analysis?
amoeba
jjepsuomi
    If some variables have a large variance and some small, PCA (maximizing variance) will load on the large variances. For example if you change one variable from km to cm (increasing its variance), it may go from having little impact to dominating the first principle component. If you want your PCA to be independent of such rescaling, standardizing the variables will do that. On the other hand, if the specific scale of your variables matters (in that you want your PCA to be in that scale), maybe you don't want to standardize. – Glen_b Sep 04 '13 at 09:20
    Watch out: normalize in statistics sometimes carries the meaning of transform to be closer to a normal or Gaussian distribution. As @Glen_b exemplifies, it is better to talk of standardizing when what is meant is scaling by (value - mean)/SD (or some other _specified_ standardization). – Nick Cox Sep 04 '13 at 09:37
    Ouch, that 'principle' instead of 'principal' in my comment up there is going to drive me crazy every time I look at it. – Glen_b Sep 04 '13 at 09:59
    @Glen_b In principle, you do know how to spell it. Getting it right all the time is the principal difficulty. – Nick Cox Sep 04 '13 at 10:07
    These are multiple questions so there is no one exact duplicate, but every one of them is extensively and well discussed elsewhere on this site. A good search to begin with is on [pca correl* covariance](http://stats.stackexchange.com/search?q=[pca]+correl*+covariance). – whuber Sep 04 '13 at 17:50
  • @NickCox The generally accepted definition of normalise is to transform a random variable to one with zero mean and unit standard deviation. This is also what Google gives when you search "define normalise". Therefore it is not better to use a different word for the same thing. – Robino Nov 13 '16 at 11:15
  • @Robino I agree with your conclusion but I disagree with your assertion. The problem is that there is not a generally accepted meaning across statistics and machine learning. Normalise is used with the sense I mention and with other senses too, e.g. scaling to within [0, 1]. – Nick Cox Nov 13 '16 at 15:15
  • @NickCox Should I use mean normalization, (x - mean)/std, or just use feature scaling before applying PCA? I am applying PCA to images whose pixel values vary from 0-255. – Boris May 20 '18 at 10:23
  • @Boris I can't possibly advise remotely on what is best for you beyond pointing out that (x $-$ mean) / SD is one possible method and certainly not x $-$ mean/SD. If all your variables are in [0, 255] it's conceivable that not scaling at all makes as much sense as any other approach. – Nick Cox May 20 '18 at 17:03
  • @NickCox So you mean it doesn't matter? – Boris May 20 '18 at 18:05
  • Not what I meant. Not knowing which method is best for your data and your project doesn't mean that I am implying that choice of method doesn't matter. – Nick Cox May 20 '18 at 18:49
  • @whuber: You get 0 hits with your search. – MSIS Aug 08 '19 at 21:24
    @MSIS Thank you. Somehow the system eliminated the wild card "*" after "correl". I have re-inserted it and hope it stays there this time! It now returns 316 results. – whuber Aug 09 '19 at 12:01

2 Answers


Normalization is important in PCA because PCA is a variance-maximizing exercise: it projects your original data onto the directions that maximize the variance. The first plot below shows the amount of total variance explained by the different principal components where we have not normalized the data. As you can see, component one appears to explain most of the variance in the data.

Without normalization

If you look at the second plot, where we have normalized the data first, it is clear that the other components contribute as well. The reason is that PCA seeks, for each component in turn, the direction of maximal remaining variance. And the covariance matrix of this particular dataset is:

             Murder   Assault   UrbanPop      Rape
Murder    18.970465  291.0624   4.386204  22.99141
Assault  291.062367 6945.1657 312.275102 519.26906
UrbanPop   4.386204  312.2751 209.518776  55.76808
Rape      22.991412  519.2691  55.768082  87.72916

Given this structure, PCA will choose to project as much as possible onto the direction of Assault, since its variance is far greater than that of the other variables. So if the goal is to find features usable for any kind of model, a PCA without normalization will typically perform worse than one with normalization.

With normalization
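The effect described above can be sketched numerically. The snippet below is a minimal illustration (not the original R code behind the plots): it uses synthetic data in which one column has a much larger scale than the others, playing the role that Assault plays in this dataset, and compares the variance explained by the first component with and without standardization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with wildly different scales: the second column has a
# variance several thousand times larger than the others, mimicking the
# role of Assault in the covariance matrix above.
X = rng.normal(size=(200, 4)) * np.array([1.0, 80.0, 3.0, 2.0])

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    cov = np.cov(X, rowvar=False)            # covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, descending
    return eigvals / eigvals.sum()

# PCA on the raw data: the high-variance column dominates PC1.
raw = explained_variance_ratio(X)

# PCA on standardized data (z-scores): variance spreads across components.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std = explained_variance_ratio(Z)

print(raw[0])   # close to 1: PC1 is almost entirely the big-variance column
print(std[0])   # much smaller: no single variable dominates
```

The same comparison can be run on the actual USArrests data (see the comments below for a download link) with identical conclusions.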

JoeyC
Dr. Mike
    You explain standardizing, not normalization, but anyway good stuff here :) – erogol Nov 08 '14 at 19:09
  • @Erogol that is true. – Dr. Mike Nov 18 '14 at 22:09
    Great post! Perfectly reproducible with sklearn. BTW, the USArrests dataset can be downloaded from here https://vincentarelbundock.github.io/Rdatasets/datasets.html – dohmatob Apr 27 '17 at 12:23
  • Just curious: How come the autocorrelations in your data are not 1 ? – gary Oct 31 '18 at 20:48
    @gary this is a covariance matrix, not a correlation matrix, therefore the diagonal elements are not necessarily equal to 1. – Arnaud A Aug 07 '19 at 19:45
  • For some reason normalization and standardization are used quite interchangeably. Good explanation. Short story, just subtract the mean and divide by the SD and we're good to go! – 3nomis Dec 08 '21 at 10:11

The term normalization is used in many contexts, with distinct but related meanings. Basically, normalizing means transforming so as to render something normal. When data are seen as vectors, normalizing means transforming the vector so that it has unit norm. When data are thought of as random variables, normalizing means transforming to a normal distribution. When the data are already hypothesized to be normal, normalizing means transforming to unit variance.
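As a concrete sketch of these senses (plus the min-max scaling mentioned in an earlier comment), each is a one-liner in Python; the data vector here is a made-up example:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical data

# Sense 1: normalize a vector to unit (Euclidean) norm.
unit = x / np.linalg.norm(x)

# Sense 3: standardize to zero mean and unit variance (z-scores).
z = (x - x.mean()) / x.std()

# Another common sense, mentioned in the comments: min-max scaling to [0, 1].
minmax = (x - x.min()) / (x.max() - x.min())
```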

David H
  • 311
  • 1
  • 2