I am new to principal component analysis (PCA). I performed PCA on a dataset with 54 samples. When I project them in a 3D scatterplot, I can see that samples with similar characteristics form separate groups. The X, Y and Z axes of the 3D scatterplot represent PC#1, PC#2 and PC#3 respectively. Both positive and negative values appear along the axes.

What do these values convey, especially the negative values?

If a sample has a negative value along an axis, what does that imply?

Also, the total variance explained by all 3 PCs is 40% (PC#1: 20%, PC#2: 13% and PC#3: 7%). What does that imply? Why is it not 80-90%? Is my data of good quality?

Nick Cox
Dinesh
  • You may find that the insight afforded by the answers to [Making sense of PCA...](http://stats.stackexchange.com/questions/2691) will help you. Some of the descriptions and images, especially in the highest-voted answers, really do answer your first two questions. As far as the third goes, there is no connection at all between data *quality* and variance "explained" by principal components, but there are things that can be said about your data based on the statistics you report. – whuber Aug 13 '13 at 22:05
  • @whuber So does 40% of variance explained mean that there is only 40% variance in my data, and the remaining 60% is similar? Please guide me. – Dinesh Aug 13 '13 at 22:13
  • Oh, no, nothing like that. It means that the reduction of dimensionality (going from a matrix of 54 samples described by p variables to a matrix describing the same 54 samples by only three axes [3 < p]) has been made by "ignoring" 60% of the original variability in the matrix (that is, you are keeping only 40% of the multivariate description, let's say). It has *nothing* to do with data quality. It just says that the original variables were not correlated that much, so they can't be simplified into a few dimensions without throwing away 60% of the baby with the bathwater. – FairMiles Aug 21 '13 at 22:41
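
To make the last comment concrete, here is a minimal sketch in Python with NumPy, using simulated data rather than the asker's. It compares the share of variance captured by the first three PCs when the variables are independent with the share when they are driven by a few shared factors. The number of variables (`p = 20`), the noise level, and the helper names `explained_variance_ratio` and `pca_scores` are all assumptions made for illustration; only the 54 samples come from the question. It also computes PC scores, which are centred on zero, so a negative score just means the sample lies below the average along that axis (and the sign of an axis is itself arbitrary).

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of the total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)                      # centre every variable
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    var = s ** 2 / (X.shape[0] - 1)              # variance along each PC
    return var / var.sum()


def pca_scores(X, k):
    """Coordinates of the samples on the first k principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


rng = np.random.default_rng(0)
n, p = 54, 20   # 54 samples as in the question; p = 20 is a hypothetical choice

# Case A: independent variables, so there is nothing for PCA to compress.
X_indep = rng.normal(size=(n, p))

# Case B: every variable is a mix of 3 shared latent factors plus a little noise.
latent = rng.normal(size=(n, 3))
loadings = rng.normal(size=(3, p))
X_corr = latent @ loadings + 0.3 * rng.normal(size=(n, p))

top3_indep = explained_variance_ratio(X_indep)[:3].sum()
top3_corr = explained_variance_ratio(X_corr)[:3].sum()
print(f"first 3 PCs, independent variables: {top3_indep:.0%}")
print(f"first 3 PCs, correlated variables:  {top3_corr:.0%}")

# Scores are centred, so each axis has both negative and positive values.
scores = pca_scores(X_corr, 3)
print("mean score per axis:", scores.mean(axis=0).round(12))
print("samples with a negative PC#1 score:", int((scores[:, 0] < 0).sum()))
```

In this simulation both datasets are equally "clean"; the only difference is how strongly the variables are correlated, which is what drives the percentage captured by the first few PCs.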
