
I am attempting to conduct some multivariate analysis on a dataset I've been given, with a sample size of n = 23 and roughly p = 800 features. I would like to use dimensionality reduction, but after some reading I am unsure whether PCA/FA methods are appropriate given that $n \ll p$.

Running PCA on my dataset in MATLAB, for example, returns 22 principal components (n āˆ’ 1), and most of the variance is reported to be accounted for by the first 10 or so components. However, I don't believe the resulting PCs will be a good representation of my data, because the low sample-size-to-feature ratio is undesirable when running PCA.
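As a quick sanity check of the n āˆ’ 1 behavior (a sketch in Python with scikit-learn and random data as a stand-in for the real dataset), PCA on an n = 23, p = 800 matrix indeed yields at most 22 components with nonzero variance, matching what MATLAB reports:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((23, 800))  # n = 23 samples, p = 800 features

pca = PCA()  # keep all min(n, p) components
pca.fit(X)

# After mean-centering, the rank of X is at most n - 1 = 22,
# so at most 22 components carry any variance at all.
nonzero = np.sum(pca.explained_variance_ > 1e-10)
print(nonzero)  # 22
```

This is purely a mechanical consequence of centering the data, not a statement about whether those 22 components generalize.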

Am I correct in thinking this or am I still okay to use PCA on my data? What might be an alternative dimensionality reduction technique I could implement that may be more suitable?

  • Only 22 PCs are possible, when you have N=23 data (see: [Why are there only nāˆ’1 principal components for n data if the number of dimensions is ≄n?](https://stats.stackexchange.com/q/123318/7290)). – gung - Reinstate Monica May 09 '19 at 15:04
  • I don't think PCA is inappropriate for datasets with a small sample size. PCA doesn't do any parametric *estimation* really, so sample size should not be an issue. (PCA just transforms the dataset such that all new dimensions are orthogonal to each other.) If you are concerned that the sample is not truly representative of the population (due to its size or any other reason), then you have a bigger problem regarding the validity of the multivariate analysis you're planning to conduct on this dataset. – Vishal May 09 '19 at 19:03
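The point in the last comment, that PCA is an orthogonal transformation rather than a parametric estimate, can be checked directly (a sketch with scikit-learn on random stand-in data; the `n_components=10` choice mirrors the "first 10 or so" from the question):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.standard_normal((23, 800))  # same n << p shape as the question

pca = PCA(n_components=10)
W = pca.fit(X).components_  # shape (10, 800): one loading vector per PC

# The loading vectors are orthonormal regardless of sample size:
# W @ W.T is the 10x10 identity matrix.
print(np.allclose(W @ W.T, np.eye(10)))  # True
```

Orthonormality always holds by construction; the sample-size concern is about whether the *directions* found would replicate in new data, which this check cannot address.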

0 Answers