Meaningful inference about data structure based on components with low variance in PCA

Question

A lot of microbiome (microbial ecology) papers that I have come across use either principal component analysis (PCA) or principal coordinate analysis (PCoA) to draw conclusions about the data. Many of these claims are based on components/coordinates with low variance, or use the higher components to show patterns that aren't visible in the lower principal components. Although I've found a few questions here that cover the interpretation of PCA/PCoA plots, I haven't found any discussion of whether meaningful inference can be made based on components that explain little variance.

The plot below is from a paper that compared the gut bacteria in pregnant women at different times. Each of the points represents a bacterial community. It does seem like the T1 samples are clustered together on the left of the figure, but is this meaningful when the component variances are only 8.9% and 4.5% respectively?

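[Figure: ordination plot of the gut bacterial communities of pregnant women, with the T1 samples clustered on the left; the two axes explain 8.9% and 4.5% of the variance]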

My second question is whether it makes sense to make inferences based on patterns visible in the higher components when these patterns aren't visible in the lower ones.

A good example of this is from the Human Microbiome Coursera course. The plot below shows how the bacterial communities from different body parts cluster. In this example, the vaginal communities (shown in purple) cluster with those from the skin (shown in green).

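[Figure: ordination plot of bacterial communities by body site; the vaginal communities (purple) cluster with the skin communities (green)]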

However, this community seems to cluster by itself when you only look at principal components four through six. Is it an acceptable practice to continue looking at the other components when you aren't getting separate clusters in the first few? To me this feels like you are fishing for the results that you want to see.

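[Figure: the same communities plotted on principal components four through six; the vaginal communities form a cluster of their own]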

I would greatly appreciate any insight about this topic! I did find some related topics here, but they didn't quite answer my question.

+1 to @ttnphns's answer, but I would like to comment on the specific images you posted. (1) In the first case you are concerned that 9% is not a lot, but this is the *first* PC! No other projection of the data can possibly have more variance. So for this dataset it is as "variable" a projection as it gets; one can certainly attempt to interpret it. (2-3) In principle, yes, one might be fishing [in the noise] for some class separability by looking at further and further PCs. But the 3rd figure really does not look like noise! I would bet that this class separation is real. - amoeba, Nov 18 '14 at 10:09

score 6 · Accepted Answer · edited Apr 13 '17 at 12:44

This sort of question has appeared several times on CV (you have to browse through the PCA clustering questions). The short answer to your question is yes, it makes sense to inspect junior dimensions in search of structure (such as clusters) in your data. And why not? Often the senior components, which explain the lion's share of the variance, are irrelevant to the distinctions in the data that currently matter. I might cut a loaf of bread lengthwise; then the 1st PC of that ellipsoid won't show the two halves, but PC2 or PC3 is likely to show it - the bimodality.
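The loaf example can be imitated with a small simulation. The following is only a rough two-dimensional sketch, not anything taken from the papers discussed: the sample size, the spreads, the 30-degree rotation and the helper names (pca_2d, project, separation) are all assumptions made for illustration. It builds an elongated cloud split into two halves across its short axis and measures how far apart the halves are on each principal component, in units of the pooled within-group standard deviation.

```python
import math
import random
import statistics


def pca_2d(points):
    """Return the mean, the two principal axes (unit vectors) and their variances."""
    n = len(points)
    mx = statistics.fmean([p[0] for p in points])
    my = statistics.fmean([p[1] for p in points])
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    var1 = sxx * c * c + 2 * sxy * c * s + syy * s * s  # variance along PC1
    var2 = sxx * s * s - 2 * sxy * c * s + syy * c * c  # variance along PC2
    return (mx, my), ((c, s), (-s, c)), (var1, var2)


def project(points, mean, axis):
    """Scores of the centered points on a unit-length axis."""
    return [(p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1] for p in points]


def separation(scores, labels):
    """Gap between the two group means in units of the pooled within-group SD."""
    a = [v for v, g in zip(scores, labels) if g == 0]
    b = [v for v, g in zip(scores, labels) if g == 1]
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return abs(statistics.fmean(a) - statistics.fmean(b)) / pooled_sd


random.seed(0)
labels = [i % 2 for i in range(2000)]           # which half of the "loaf"
cos_r, sin_r = math.cos(math.radians(30)), math.sin(math.radians(30))
points = []
for g in labels:
    length = random.gauss(0, 5.0)                # long axis: no group difference
    width = (1.5 if g else -1.5) + random.gauss(0, 0.4)  # short axis: the cut
    # Rotate so that the structure is not aligned with the raw coordinates.
    points.append((length * cos_r - width * sin_r, length * sin_r + width * cos_r))

mean, (pc1, pc2), (var1, var2) = pca_2d(points)
share_pc2 = var2 / (var1 + var2)
sep_pc1 = separation(project(points, mean, pc1), labels)
sep_pc2 = separation(project(points, mean, pc2), labels)

print(f"PC2 share of variance: {share_pc2:.1%}")
print(f"separation on PC1: {sep_pc1:.2f}, on PC2: {sep_pc2:.2f}")
```

In this toy setup the component that carries only a small share of the variance is the one on which the two halves separate, while the dominant component shows essentially no group difference. The labels are used only to measure the separation afterwards; the PCA itself never sees them.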

One should remember that dimensionality reduction methods (such as PCA and PCoA) are not intended to find clusters or to map classes in the best way. Therefore, they do not replace cluster analysis or discriminant analysis. With PCA or similar techniques, you can only hope that some dimensions will uncover the structure for you.

Just one example. Here are two scatterplots of the same 2-class data. One shows the first PC drawn on it; the other shows the discriminant function. Neither PC1 nor PC2 (the remaining component, orthogonal to it) is quite bimodal on its own. The discriminant is much better in that respect, because it was extracted for the purpose of capturing the difference between the two classes.
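The same contrast can be reproduced numerically. The sketch below is an illustration under assumed settings, not a reconstruction of the scatterplots described: the class means, the spreads and the helper names (mean_2d, cov_2d, principal_axes, fisher_direction, separation) are invented for the purpose. It projects simulated 2-class data onto PC1, PC2 and Fisher's linear discriminant direction, and reports the gap between the class means in units of the pooled within-class standard deviation. For two equally sized Gaussian classes with equal spread, a projection looks bimodal only when that gap exceeds about 2.

```python
import math
import random
import statistics


def mean_2d(points):
    return (statistics.fmean([p[0] for p in points]),
            statistics.fmean([p[1] for p in points]))


def cov_2d(points):
    """Return (sxx, sxy, syy), the sample covariance of 2-D points."""
    n = len(points)
    mx, my = mean_2d(points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    return sxx, sxy, syy


def principal_axes(points):
    """Unit vectors of PC1 and PC2, computed without using the labels."""
    sxx, sxy, syy = cov_2d(points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    return (c, s), (-s, c)


def fisher_direction(points, labels):
    """Fisher's linear discriminant direction, w = Sw^-1 (m1 - m0)."""
    g0 = [p for p, g in zip(points, labels) if g == 0]
    g1 = [p for p, g in zip(points, labels) if g == 1]
    m0, m1 = mean_2d(g0), mean_2d(g1)
    a0, b0, d0 = cov_2d(g0)
    a1, b1, d1 = cov_2d(g1)
    # Pooled within-class covariance [[a, b], [b, d]] (classes are equal-sized).
    a, b, d = (a0 + a1) / 2, (b0 + b1) / 2, (d0 + d1) / 2
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    det = a * d - b * b
    wx, wy = (d * dx - b * dy) / det, (a * dy - b * dx) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


def separation(points, labels, axis):
    """Gap between class means on an axis, in pooled within-class SD units."""
    scores = [p[0] * axis[0] + p[1] * axis[1] for p in points]
    s0 = [v for v, g in zip(scores, labels) if g == 0]
    s1 = [v for v, g in zip(scores, labels) if g == 1]
    pooled_sd = math.sqrt((statistics.variance(s0) + statistics.variance(s1)) / 2)
    return abs(statistics.fmean(s1) - statistics.fmean(s0)) / pooled_sd


random.seed(1)
labels = [i % 2 for i in range(2000)]
# Two elongated clouds whose centers differ obliquely to the long axis.
points = [(random.gauss(2.0 if g else -2.0, 3.0),
           random.gauss(0.75 if g else -0.75, 0.6)) for g in labels]

pc1, pc2 = principal_axes(points)
lda = fisher_direction(points, labels)
sep_pc1 = separation(points, labels, pc1)
sep_pc2 = separation(points, labels, pc2)
sep_lda = separation(points, labels, lda)

print(f"PC1: {sep_pc1:.2f}  PC2: {sep_pc2:.2f}  discriminant: {sep_lda:.2f}")
```

With these assumed settings each principal component captures only part of the class difference, so neither is bimodal alone, whereas the discriminant combines both coordinates and separates the classes much more clearly. The discriminant has the advantage of using the class labels, which the principal components never see.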

The analytically logical path to uncover and then plot structure would be to perform cluster analysis (or latent class analysis) to form classes, and then to use discriminant analysis (or, perhaps, INDSCAL multidimensional scaling) to plot them. However, discriminant analysis (DA) results are, naturally, dependent on the classes. PCA/PCoA results are not, since these methods are unsupervised and blind to the nonhomogeneity in the data. But that is exactly the reason (or at least one of the reasons) why many people prefer to attempt PCA instead of DA in order to visualize class distinctions.

You say, "To me this feels like you are fishing for the results that you want to see." This apprehension would be relevant in the context of multiple statistical significance testing, not in the present context of exploratory data analysis. Yes, EDA is "fishing" for revelations that might look good to you; that is what it is about. On the other hand, if you prefer to think of the junior dimensions of the data as noise (rather than as weak but substantive dimensions), then the "fishing" charge is indeed appropriate. PCA itself does not separate signal from noise. One has to test statistically whether the dimensions resemble noise or signal, but that implies assumptions about the data, so welcome to the vicious circle. Fortunately, with a sufficiently large sample size, noise dimensions are likely to blur real class differences rather than to fake them.
