Is it correct to standardise (z-score) features within samples before PCA?

Question

Suppose we have a data set in which several features are measured in the same units for each subject: for example, the numbers of different cell types (features) in a tumour (subject), with n tumours and m features.

If we want to see which cell types (features) explain most of the variation across tumours (subjects), is it correct to z-score the feature values within each subject (i.e. centre each subject's distribution of values around 0)?

Thanks.

Yes, if you need to remove level and scale differences between the profiles (subjects) you may do that. You may then perform PCA of features (usual way, or R-way), or PCA of the transposed data, of subjects ([Q-way](https://stats.stackexchange.com/a/20103/3277)). — ttnphns, Nov 14 '18 at 10:19

score 1 · Answer 1 · answered Nov 14 '18 at 10:57

It's a reasonable choice, but it doesn't have to be the z-score.

PCA needs each column to have zero mean; otherwise the first principal component tends to point towards the mean of the data rather than along the direction of greatest variance. See this answer for a nice visualization and more details:
https://stats.stackexchange.com/a/22331/226852
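
A minimal sketch of why the column means matter, using made-up 2-D data (the data, the helper name `first_direction`, and the chosen offset are all assumptions for illustration, not taken from the linked answer). It compares the first singular direction of the raw matrix with that of the column-centred matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 points spread along the direction (1, -1),
# then shifted far from the origin along (1, 1).
t = rng.normal(size=200)
X = np.column_stack([t, -t]) + 0.05 * rng.normal(size=(200, 2)) + np.array([10.0, 10.0])

def first_direction(M):
    # First right singular vector: the direction with the largest sum of squares.
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]

pc_raw = first_direction(X)                       # no centring
pc_centred = first_direction(X - X.mean(axis=0))  # column means removed

spread = np.array([1.0, -1.0]) / np.sqrt(2)  # direction the points actually vary along
offset = np.array([1.0, 1.0]) / np.sqrt(2)   # direction of the mean offset

print(abs(pc_raw @ offset))      # near 1: uncentred result follows the offset
print(abs(pc_centred @ spread))  # near 1: centred result follows the spread
```

Note that this centring is per feature (column). Z-scoring within a subject (row) does not by itself give zero column means, so the columns would still need centring afterwards; many PCA implementations do that step internally.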

There are other scaling methods as well, although not all of them leave the data with zero mean, so check what each one does before relying on it for PCA. For a comparison, see:
https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
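
As a rough illustration of how such scalers differ, here is a NumPy-only sketch of three common ones applied column-wise. The data are made up, and the formulas are plain textbook versions rather than calls into scikit-learn:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical skewed counts: 50 subjects x 3 features.
X = rng.lognormal(mean=2.0, sigma=0.5, size=(50, 3))

# z-score: subtract the column mean, divide by the column standard deviation.
z = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max: rescale each column to the range [0, 1].
mm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Robust: subtract the column median, divide by the interquartile range.
q1, med, q3 = np.percentile(X, [25, 50, 75], axis=0)
rb = (X - med) / (q3 - q1)

# Print the column means left behind by each scaling.
for name, S in [("z-score", z), ("min-max", mm), ("robust", rb)]:
    print(name, np.round(S.mean(axis=0), 3))
```

Only the z-scored columns are guaranteed to have zero mean. Min-max scaling leaves positive means, and robust scaling centres on the median, so the columns would still need centring before the decomposition.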
