
Possible Duplicate:
Replacement of NA values for PCA analysis

What is a good (proper) way to deal with missing values in principal component analysis (PCA)? I have data with about 500,000 observations, and because of missing values about 74,000 of those records are dropped from the analysis.

Ken
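The situation described in the question is listwise (complete-case) deletion: any row with at least one missing entry is discarded, and the principal components are estimated only from the rows that remain. Here that is 74,000 / 500,000 ≈ 14.8% of the rows. If each of p variables were independently missing with probability q, only a fraction (1 − q)^p of rows would be complete, so the loss grows quickly with the number of variables. The sketch below is a minimal illustration of that procedure with NumPy; the function name `complete_case_pca` is hypothetical, and the data are assumed to be a float array with `NaN` marking missing values.

```python
import numpy as np

def complete_case_pca(X):
    """PCA on the rows of X that contain no NaN (listwise deletion).

    Returns the number of rows used, the variances along the principal
    axes (largest first), and the principal axes as rows of Vt.
    """
    keep = ~np.isnan(X).any(axis=1)          # rows with every value observed
    Xk = X[keep]
    Xc = Xk - Xk.mean(axis=0)                # centre each column
    # SVD of the centred data: rows of Vt are the principal axes and
    # s**2 / (n - 1) are the sample variances along them.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    n_used = int(keep.sum())
    return n_used, s**2 / (n_used - 1), Vt
```

Every incomplete row is simply invisible to this computation, which is why the result can be biased unless the rows are missing completely at random.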
  • I can't imagine any realistic population that would produce 500K multivariate normal samples. There has to be some heterogeneity in your data, so PCA will suffer from unaccounted variability within and between groups. – StasK Sep 10 '12 at 16:29
  • The principal components can only be computed for the set of data that is included. How they would be affected by knowing the correct values for the missing data would depend on how the data became missing (i.e. MAR, MCAR or non-ignorable). – Michael R. Chernick Sep 10 '12 at 16:39
  • There was a similar [question](http://stats.stackexchange.com/questions/35561/replacement-of-na-values-for-pca-analysis) recently. – ttnphns Sep 10 '12 at 18:30
  • What kind of analysis you will use for your data is irrelevant; first you must solve the problem of the missing values. Then you must ask yourself how the data came to be missing and, if necessary, model the missingness process. Missing completely at random? Missing at random? With that solved, you could think about multiple imputation, for instance. – kjetil b halvorsen Sep 10 '12 at 18:30
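Since PCA is an eigendecomposition of the covariance matrix, one likelihood-based alternative to dropping rows is to estimate the mean and covariance with the EM algorithm, using every observed value, and then take the eigenvectors of that estimate. This is not the multiple imputation suggested in the comments; it is a different technique that relies on the same missing-at-random assumption plus an assumption that the variables are roughly multivariate normal (which, as StasK points out, may not hold for heterogeneous data). The sketch below assumes a float array with `NaN` for missing entries and that no column is entirely missing. The names `em_mvn` and `pca_from_cov` are hypothetical. Rows are grouped by missingness pattern so that each regression is solved once per pattern rather than once per row, which should keep it workable at around 500,000 rows.

```python
import numpy as np

def em_mvn(X, n_iter=200, tol=1e-8):
    """ML mean and covariance of X (NaN = missing) under a multivariate normal model."""
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)              # start from available-case means
    S = np.diag(np.nanvar(X, axis=0))       # and a diagonal covariance
    Xhat = np.where(miss, 0.0, X)           # observed entries never change
    # Rows sharing a missingness pattern share the same regression, so group them.
    patterns, inverse = np.unique(miss, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    for _ in range(n_iter):
        C = np.zeros((p, p))                # summed conditional covariances
        for k, m in enumerate(patterns):
            if not m.any():
                continue                    # complete rows need no E-step
            rows, o = inverse == k, ~m
            if o.any():
                # E-step: regress the missing coordinates on the observed ones.
                K = np.linalg.solve(S[np.ix_(o, o)], S[np.ix_(o, m)]).T
                Xhat[np.ix_(rows, m)] = mu[m] + (X[np.ix_(rows, o)] - mu[o]) @ K.T
                C_m = S[np.ix_(m, m)] - K @ S[np.ix_(o, m)]
            else:
                Xhat[np.ix_(rows, m)] = mu[m]   # nothing observed: use the mean
                C_m = S[np.ix_(m, m)]
            C[np.ix_(m, m)] += rows.sum() * C_m
        # M-step: ML updates from the expected sufficient statistics.
        mu_new = Xhat.mean(axis=0)
        D = Xhat - mu_new
        S_new = (D.T @ D + C) / n
        done = max(np.abs(mu_new - mu).max(), np.abs(S_new - S).max()) < tol
        mu, S = mu_new, S_new
        if done:
            break
    return mu, S

def pca_from_cov(S):
    """Eigenvalues (descending) and eigenvectors (columns) of a covariance matrix."""
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]
```

With no missing values, `em_mvn` reduces to the ordinary ML mean and covariance (dividing by n). When one variable is missing whenever another observed variable is large, complete-case means are biased, while the EM estimates use the observed variable to correct for this.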
