Is it possible to use PCA twice, first on several subsets of data, and then again on the main components of those subsets?

Question

I would like to know whether it is possible to use PCA twice: first on several subsets of data, and then again on the principal components of those subsets. I'm not entirely sure this will give me the answer I am looking for.

For example: I have water chemistry concentrations (Ca, Mg, Na, Si) from streams located within different land types (agricultural, urban, woods, alpine). These streams all flow into one main river. I'd like to find out how much each land type contributes to the river, given its chemistry data. This is closely related to a method called End Member Mixing Analysis.

I am wondering whether I can find the principal components within each data subset (each land type) using the chemistry data (Ca, Mg, Na, Si), and then use the principal component that explains the most variance as a variable representing that land type. Ultimately, I am trying to find out how much each land type contributes to the chemistry of the downstream river.

Does this make sense? Is this the right method for the question of which land type is most responsible for the variation in river chemistry?
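To make the two-step idea concrete, here is a minimal sketch of what I have in mind, using NumPy only. The data are synthetic placeholders, and the helper name `first_pc_scores` and the 30-samples-per-land-type layout are assumptions for illustration. It also assumes the rows of each subset can be matched across land types (for example, the same sampling dates), which the second PCA needs.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_pc_scores(X):
    """Return the scores on the first principal component and the
    fraction of variance that component explains (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                        # center each chemistry variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                            # project onto PC1
    ratio = s[0] ** 2 / np.sum(s ** 2)             # share of variance in PC1
    return scores, ratio

# Synthetic stand-in data: 30 matched samples per land type, 4 solutes.
land_types = ["agricultural", "urban", "woods", "alpine"]
data = {
    name: rng.lognormal(mean=i, sigma=0.3, size=(30, 4))
    for i, name in enumerate(land_types)
}

# Stage 1: one PCA per land type, keeping only the PC1 scores.
stage1_scores = {}
stage1_ratio = {}
for name, X in data.items():
    stage1_scores[name], stage1_ratio[name] = first_pc_scores(X)

# Stage 2: PCA on the matrix whose columns are the per-land-type PC1 scores.
Z = np.column_stack([stage1_scores[name] for name in land_types])
Zc = Z - Z.mean(axis=0)
s2 = np.linalg.svd(Zc, compute_uv=False)
stage2_ratio = s2 ** 2 / np.sum(s2 ** 2)

print(stage1_ratio)
print(stage2_ratio)
```

Two things I already notice: the sign of each PC1 is arbitrary, and the scores are centered, so the average concentration level of each land type is discarded before the second PCA.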

Well, my original thought was that PCA would reduce the dimensionality of the subset data (from each land type), making it easier to represent that land type as a new variable in a second PCA, which would be run on the principal components derived from each land type's chemistry dataset. — Syd26S, Apr 23 '15 at 14:59
If I used all of the chemistry data directly, I would lose track of the variance explained by a certain land type. — Syd26S, Apr 23 '15 at 15:00
I don't see why that would happen. This is just a 4D analog of a [ternary plot](http://en.wikipedia.org/wiki/Ternary_plot), sometimes termed a "tetrahedral plot." At each monitoring station the water quality plots as a point which is represented as a weighted average of the land type points--and there you have it. — whuber, Apr 23 '15 at 15:03
Okay, that could possibly work! I suppose the problem is that I actually have about 11 chemical variables instead of 4. However, I think you understand what I'm trying to do: I want to represent each land type with one point, then find the significance of each land type in the river chemistry. — Syd26S, Apr 23 '15 at 15:24
11 variables is even better. In principle the set of all (positive) weighted averages will still span a three-dimensional simplex (a "tetrahedron"), because there are just four land types. Although you could characterize that nicely via PCA, you might not want to throw away the smaller principal components, because they provide information about the random variation of data around that tetrahedron--which occurs not just in the simplex but also extends into the surrounding 11-4 = 7 dimensions. The PCA loadings should estimate the coordinates in the tetrahedral diagram. — whuber, Apr 23 '15 at 15:30
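The mixing picture in the comments above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the end-member compositions `E`, the Dirichlet-distributed weights `W`, and the noise level are all made up, and the end members are treated as known when recovering the weights. It uses NumPy only; non-negativity of the recovered weights is not enforced here (a non-negative solver such as `scipy.optimize.nnls` could be swapped in for that).

```python
import numpy as np

rng = np.random.default_rng(1)

n_solutes, n_types, n_samples = 11, 4, 200

# Hypothetical end-member compositions: one row per land type.
E = rng.uniform(1.0, 10.0, size=(n_types, n_solutes))

# Mixing weights: non-negative and summing to 1 for each river sample.
W = rng.dirichlet(np.ones(n_types), size=n_samples)

# River chemistry = weighted average of end members + small random noise.
noise_sd = 0.02
X = W @ E + rng.normal(0.0, noise_sd, size=(n_samples, n_solutes))

# PCA via SVD of the centered data. With four end members the mixtures lie
# near a three-dimensional simplex, so three components should carry nearly
# all of the variance; the rest reflect the noise around it.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = s ** 2 / np.sum(s ** 2)
print(np.round(explained, 4))

# Recover the weights by least squares, with a heavily weighted extra row
# that pushes each sample's weights to sum to 1.
big = 1e3
A = np.vstack([E.T, big * np.ones((1, n_types))])      # (n_solutes + 1, n_types)
B = np.vstack([X.T, big * np.ones((1, n_samples))])    # (n_solutes + 1, n_samples)
W_hat = np.linalg.lstsq(A, B, rcond=None)[0].T          # (n_samples, n_types)

print(np.abs(W_hat - W).mean())
```

In this setup the land-type contribution to each river sample is read directly from the rows of `W_hat`, rather than from a second PCA.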

0 Answers