More precisely, suppose I run `cmdscale` (classical multidimensional scaling, i.e. PCoA) on a Euclidean distance matrix built from $n$ observations of $p$ variables, i.e. $D_{ij}=\sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$ with $i, j \in \{1, \dots, n\}$:

  1. Must each variable $x_k$ be normally distributed (as in a PCA)?
  2. And if $p=1$, is it meaningful to run the PCoA and then do a clustering analysis?
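To make the setup concrete, here is a minimal sketch in plain Python of the first step of classical scaling: building the Euclidean distance matrix $D$ and double-centering its squared entries. The function names and the toy data are hypothetical, chosen only for illustration; this is not the code behind `cmdscale`. It checks numerically that the double-centered matrix equals the Gram matrix of the column-centered data, for both $p=2$ and $p=1$.

```python
import math


def euclidean_distance_matrix(X):
    """D[i][j] = sqrt(sum_k (x_ik - x_jk)^2) for rows i, j of X."""
    n = len(X)
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j]))) for j in range(n)]
        for i in range(n)
    ]


def double_center(D):
    """Return B = -1/2 * J * D^2 * J, where J = I - (1/n) * ones."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in D2]
    col_mean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (D2[i][j] - row_mean[i] - col_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]


def centered_gram(X):
    """Gram matrix Xc * Xc^T of the column-centered data."""
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    Xc = [[row[k] - means[k] for k in range(p)] for row in X]
    return [
        [sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)]
        for i in range(n)
    ]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Hypothetical toy data: 4 observations of p = 2 variables, then p = 1.
X2 = [[1.0, 2.0], [3.0, 0.5], [-1.0, 4.0], [2.0, 2.0]]
X1 = [[0.0], [1.0], [4.0], [7.0]]

for X in (X2, X1):
    B = double_center(euclidean_distance_matrix(X))
    G = centered_gram(X)
    print(len(X[0]), max_abs_diff(B, G) < 1e-9)
```

Classical scaling then eigendecomposes this matrix, so with Euclidean distances the resulting coordinates should match the PCA scores of the centered data up to sign. With $p=1$ the matrix has rank at most one, so only a single nonzero axis can come out.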
Romain
  • Re 1: As far as I know, PCA does not require normality. – Richard Hardy May 04 '19 at 16:16
  • What is a "cmdscale"? – whuber May 04 '19 at 16:17
  • @RichardHardy OK, thanks. So it means that I just have to find an appropriate transformation so that outliers do not dominate the PCoA, right? – Romain May 04 '19 at 16:21
  • @RichardHardy Wait, sorry I just realized. I think a PCA supposes normality when working out the covariances, but for the PCoA I don't know... – Romain May 04 '19 at 16:23
  • I think PCA is a purely algebraic procedure with no assumptions, bar those that make PCA algebraically feasible. Assumptions like normality are needed for proving properties of statistical procedures, not algebraic ones. – Richard Hardy May 04 '19 at 17:15
  • @RichardHardy Hmm, I thought that since PCA works on variances of standardized data, it was originally designed for normal distributions. But I guess this is what you are saying in a different way: I can run the PCA, but the results will not necessarily be useful if I'm too far from a normal distribution, so I should work with reasonably unskewed distributions, right? – Romain May 04 '19 at 17:26
  • No, this is not what I mean. Read the first comment on [this post](https://stats.stackexchange.com/questions/226845/properties-of-pca-for-dependent-observations), it sums my point up pretty well. – Richard Hardy May 04 '19 at 18:00

0 Answers