I have a small dataset of 60 observations with 6 features, and I want to perform PCA on it as a class exercise. To that end I removed outliers flagged by Grubbs' test (reducing the sample to 58), then imputed missing data using the missMDA package. As a last step before PCA I planned to check each feature for normality with Shapiro–Wilk, since the sample size is so small.
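For reference, this is roughly the preprocessing I ran (just a sketch: I'm assuming the six features sit in columns 2 to 7 of MyData, and ncp = 2 for imputePCA was a more or less arbitrary choice):

library(outliers)   # for grubbs.test
library(missMDA)    # for imputePCA

# Run Grubbs' test on each feature and inspect which values are flagged
for (j in 2:7) {
  print(grubbs.test(MyData[, j]))
}
# rows containing flagged outliers were then removed by hand (60 -> 58)

# Impute the remaining missing values with missMDA
imp <- imputePCA(MyData[, 2:7], ncp = 2)
MyData[, 2:7] <- imp$completeObs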
When I check each feature with Shapiro–Wilk I get a p-value; if it is > 0.05 I accept the feature as normal, below that as non-normal. I assumed PCA requires normality, so I thought I needed to apply a log transform to the data, or perhaps the R scale function.
But when I try either and run Shapiro–Wilk again, the p-value I get is still < 0.05. I don't suppose an expert here could give me any pointers, please?
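For completeness, this is how I'm checking every feature at once (a sketch, again assuming the features are in columns 2 to 7; the 0.05 cut-off is the one described above). The examples below show the output for the second column only:

# Shapiro–Wilk p-values per feature: raw, scaled, and log-transformed
raw_p    <- sapply(MyData[, 2:7], function(x) shapiro.test(x)$p.value)
scaled_p <- sapply(MyData[, 2:7], function(x) shapiro.test(scale(x))$p.value)
log_p    <- sapply(MyData[, 2:7], function(x) shapiro.test(log(x))$p.value)
round(rbind(raw_p, scaled_p, log_p), 4)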
Version 1 : Base Test
shapiro.test( MyData[,2] )
Shapiro-Wilk normality test
data: MyData[, 2]
W = 0.76413, p-value = 2.822e-08
Version 2 : Test scaled data
shapiro.test( scale( MyData[,2] ) )
Shapiro-Wilk normality test
data: scale(MyData[, 2])
W = 0.76413, p-value = 2.822e-08
Version 3 : Test log-transformed data
shapiro.test( log( MyData[,2] ) )
Shapiro-Wilk normality test
data: log(MyData[, 2])
W = 0.93537, p-value = 0.004087
Is it necessary to perform a normalisation step on the data before PCA at all, given, for example, that the prcomp function used to calculate PCA in R has center and scale. arguments? Is it enough just to set those to TRUE and calculate the PCA on data that would have failed Shapiro–Wilk? Any tips would be greatly appreciated, thanks for your time.
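For context, if centring and scaling inside prcomp is indeed enough, the call I had in mind is simply (again assuming the features are in columns 2 to 7):

pca <- prcomp(MyData[, 2:7], center = TRUE, scale. = TRUE)
summary(pca)    # proportion of variance explained by each component
pca$rotation    # loadings of the original features on each PC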