I need to do PCA and factor analysis. All of the authors of such studies used normally distributed datasets. But my dataset of 173 samples with 17 variables shows a non-normal distribution.

I have applied square, square-root, and log transformations to the whole dataset, but they had no noticeable effect. I then applied these methods to each variable separately, but even after that most of the variables remain non-normally distributed. The dataset clearly has a large spread between its minimum and maximum values.

Can anyone suggest what I should do in such a case? How can I normalize the dataset? Or is there another way to do this?

Thank you in advance.

  • Rather than trying to normalize the variables, I'd be more concerned with why they are non-normal in your sample when they are normal in the others. But, for an answer see [this thread](https://stats.stackexchange.com/questions/32105/pca-of-non-gaussian-data) – Peter Flom May 08 '17 at 11:15

1 Answer

When data are not normally distributed, people often use independent component analysis (ICA) for factor analysis. The setup works roughly as follows:

Assume $\bf Y$ is your (173 $\times$ 17) matrix of non-Gaussian data. You want to find some factorization:

$${\bf Y} = {\bf ZA}^\text{T} + \bf E$$

where ${\bf E} \sim {\bf N}_{n \times p}(0, \Psi)$ is Gaussian noise, $\bf Z$ holds your factors, and $\bf A$ is the mixing matrix. Ignoring the noise term, each row of your data is then:

$${\bf y} = {\bf Az}$$
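
Written out element by element, and keeping the noise term from the matrix model, each observed variable is a weighted sum of the factors. The indices $j$ and $k$ and the number of factors $q$ are notation introduced here for illustration; they do not appear in the equations above:

$$y_j = \sum_{k=1}^{q} a_{jk} z_k + e_j, \qquad j = 1, \dots, 17$$

Here $a_{jk}$ is the $(j,k)$ entry of $\bf A$, $z_k$ is the $k$-th factor for that row, and $e_j$ is the corresponding entry of the noise.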

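By the central limit theorem, the elements of $\bf y$ are "more normally distributed" than the elements of $\bf z$, because $\bf A$ makes each element of $\bf y$ a weighted sum of the independent elements of $\bf z$. ICA exploits this by looking for the unmixing under which the recovered factors are as non-Gaussian as possible. The effect is stronger when you have many variables, but it works even when you only have 17.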
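
The claim that mixing pushes variables toward normality can be checked numerically. The following is a minimal sketch using only the Python standard library, not an ICA implementation. It assumes exponential sources, a randomly drawn row of the mixing matrix, and a sample size much larger than 173 so that the kurtosis estimates are stable. All of these choices, and the helper `excess_kurtosis`, are illustrative assumptions and do not come from the question.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis: about 0 for Gaussian data, 6 for exponential."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / var ** 2 - 3.0


rng = random.Random(0)          # fixed seed so the run is repeatable
n_samples, n_sources = 20000, 17  # assumed sizes, chosen for stable estimates

# Independent, strongly non-Gaussian sources z (one row per sample).
Z = [[rng.expovariate(1.0) for _ in range(n_sources)]
     for _ in range(n_samples)]

# One row of a hypothetical mixing matrix A.
a = [rng.uniform(0.5, 1.5) for _ in range(n_sources)]

# One observed variable: a weighted sum of the sources (noise omitted).
y = [sum(w * z for w, z in zip(a, row)) for row in Z]

source_kurt = excess_kurtosis([row[0] for row in Z])
mixed_kurt = excess_kurtosis(y)

print(f"excess kurtosis of one source:  {source_kurt:.2f}")
print(f"excess kurtosis of the mixture: {mixed_kurt:.2f}")
```

The mixture's excess kurtosis should come out much closer to zero than that of a single source, which is the sense in which $\bf y$ is "more normal" than $\bf z$. ICA runs this logic in reverse, searching for directions in the data that are as far from Gaussian as possible.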

ilanman