I have been learning about PCA and SVD, and I know that standardizing the features before PCA is necessary. But I came across a book rating matrix, which left me confused about what standardization does to the features.
In a book rating matrix, each row holds the ratings one user gives to different books, ranging from, say, 0 to 5. All columns in this matrix therefore share the same unit of measurement (1). However, the variance of each column is not necessarily 1 (think of a book with ratings [1,4,5,1,1,1]). Still, it seems that because all columns/features share the same unit of measurement, we do not need to standardize this matrix before PCA.
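To make the example column concrete, here is a minimal sketch using NumPy. It computes the variance of the ratings [1,4,5,1,1,1] and then standardizes that column. The variable names are my own, and I am assuming the population variance (`ddof=0`) as the convention.

```python
import numpy as np

# Ratings for one book (one column of the rating matrix), taken from the example above.
ratings = np.array([1, 4, 5, 1, 1, 1], dtype=float)

# Population variance (ddof=0); it is not 1 even though the unit is the same for every book.
variance = ratings.var()

# Standardize: subtract the mean and divide by the standard deviation.
standardized = (ratings - ratings.mean()) / ratings.std()

print("variance of raw ratings:", variance)
print("variance after standardization:", standardized.var())
```

The raw column has a variance of roughly 2.81, and standardization rescales it to 1.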
This makes me wonder: when we standardize a dataset whose features are on different measurement scales (age and income, for example), we bring the variance to 1 for ALL the features. Will this make the features lose information?
Or, more generally, should unit measurement, or unit variance, be the goal of data preprocessing?
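To illustrate the age and income case, here is a rough sketch on made-up data. The numbers, the random seed, and the helper `first_pc` are all hypothetical and only for illustration. It standardizes both features and compares the variances, the correlation, and the first principal direction before and after.

```python
import numpy as np

# Synthetic data (an assumption for illustration): age in years, income in currency units.
rng = np.random.default_rng(0)
age = rng.normal(40, 10, size=500)
income = 1000 * age + rng.normal(0, 15000, size=500)
X = np.column_stack([age, income])

# Standardize each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

var_before = X.var(axis=0)
var_after = Z.var(axis=0)

# The correlation between the two features is unaffected by per-column rescaling.
corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(Z, rowvar=False)[0, 1]

def first_pc(M):
    """Hypothetical helper: unit eigenvector of the covariance matrix with the largest eigenvalue."""
    cov = np.cov(M, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

pc_raw = first_pc(X)  # dominated by income, the feature with the larger numeric scale
pc_std = first_pc(Z)  # weights both features equally in magnitude

print("variances before:", var_before)
print("variances after:", var_after)
print("correlation before/after:", corr_before, corr_after)
print("first PC (raw):", pc_raw)
print("first PC (standardized):", pc_std)
```

In this sketch the per-feature variances are lost after standardization, while the correlation between the features is preserved. On the raw data the first principal direction points almost entirely along income, simply because its numbers are larger.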