I'm trying to understand the process of statistical testing for principal component analysis (PCA) or partial least squares (PLS).
Step 1. PCA: I feel that I have a not-terrible understanding of PCA: you find the ellipsoid described by the covariance matrix of the data, and then successively take the largest axis of variation (principal component 1), then the second largest (principal component 2), and so on. If the ellipsoid is long and stretched, then the variation is mostly along the first principal component (the eigenvector corresponding to the largest eigenvalue of the covariance matrix). If the ellipsoid is a planar "disc", then the variation in the data is explained well by two principal components, etc.
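To pin down my understanding, here is a minimal sketch of that picture in numpy (the toy data and the use of `np.linalg.eigh` are my own choices; a real analysis might standardize the variables first):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy data: 100 samples, 5 variables

Xc = X - X.mean(axis=0)                  # center each variable
cov = np.cov(Xc, rowvar=False)           # 5x5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric matrix, ascending order
order = np.argsort(eigvals)[::-1]        # re-sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1, pc2 = eigvecs[:, 0], eigvecs[:, 1]  # first two principal components
explained = eigvals / eigvals.sum()      # fraction of variance per component
```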
I also understand that after choosing to use (for example) only the first two principal components, all of the data points can be plotted on a "Scores" plot that shows, for each data point $D^{(i)}$, the projection of $D^{(i)}$ onto the plane spanned by the first two principal components. Likewise, for the "Loadings" plot, (I think) you write the first and second principal components as linear combinations of the input variables, and then, for each variable, plot the coefficients it contributes to the first and second principal components.
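Here is how I would compute both plots, repeating the same toy setup so the snippet stands alone (I'm aware that some software scales loadings by the square roots of the eigenvalues, so the exact convention here is an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # same toy data as above
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]    # columns ordered by variance

scores = Xc @ eigvecs[:, :2]   # (100, 2): projection of each point onto PC1, PC2
loadings = eigvecs[:, :2]      # (5, 2): each variable's coefficient in PC1, PC2

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.scatter(scores[:, 0], scores[:, 1])
ax1.set(title="Scores", xlabel="PC1", ylabel="PC2")
ax2.scatter(loadings[:, 0], loadings[:, 1])
ax2.set(title="Loadings", xlabel="PC1", ylabel="PC2")
plt.show()
```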
Step 2. PLS or PLS-DA: If there are labels on the data (let's say binary classes), then build a linear regression model that uses the first and second principal components to discriminate class 0 (for data point $i$, that means $Y^{(i)}=0$) from class 1 ($Y^{(i)}=1$): first project all data onto the plane spanned by the first and second principal components, and then regress $Y$ on the projected input data $X_1', X_2'$. This regression could be written as (first step) the affine transformation (i.e. linear transformation + bias) that projects along $PC_1, PC_2$ (the first and second principal components), and then (second step) a second affine transformation that predicts $Y$ from $PC_1, PC_2$. Together these transformations $Y \approx \mathrm{Affine}(\mathrm{Affine}(X))$ collapse into a single affine transformation $Y \approx C (A X + B) + D = E X + F$, with $E = CA$ and $F = CB + D$.
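Here is a minimal sketch of that collapse (the toy data and labels are invented; I also realize that regressing on PCA scores like this may technically be principal component regression rather than true PLS, which computes its components using $Y$, so correct me if that distinction matters here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(float)  # toy binary labels

# First affine map: scores = A X + B, projecting onto PC1 and PC2.
eigvals, eigvecs = np.linalg.eigh(np.cov(X - X.mean(axis=0), rowvar=False))
A = eigvecs[:, np.argsort(eigvals)[::-1]][:, :2].T   # (2, 5): top-2 PCs as rows
B = -A @ X.mean(axis=0)                              # centering folded into the bias
scores = X @ A.T + B

# Second affine map: Y ~ C * scores + D.
reg = LinearRegression().fit(scores, y)
C, D = reg.coef_, reg.intercept_

# Collapse: Y ~ C(AX + B) + D = EX + F, with E = CA and F = CB + D.
E = C @ A            # (5,): one coefficient per original variable
F = C @ B + D        # scalar bias
assert np.allclose(X @ E + F, reg.predict(scores))
```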
Step 3. Testing variables from $X$ for significance in predicting the class $Y$: This is where I could use some help (unless I'm way off already, in which case tell me!). How do you take an input variable (i.e. a feature that has not yet been projected onto the principal-component (hyper)plane) and decide whether it has a statistically significant coefficient in the regression $Y \approx E X + F$? Qualitatively, a coefficient in $E$ that is further from zero (i.e. a positive or negative value with large magnitude) indicates a larger contribution from that variable.
I remember seeing linear regression t-tests for normally distributed data (to test whether the coefficients were zero). Is this the standard approach? In that case, I would guess that every variable from $X$ would need to be transformed to have a roughly normal distribution in a "Step 0" (i.e. before any of these other steps are performed).
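If that's right, I assume the test I'm remembering looks something like this in statsmodels (here applied to an OLS fit on the raw variables; whether the same t statistics would be valid for the collapsed $E$ above is exactly what I'm unsure about):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(float)

Xd = sm.add_constant(X)      # prepend an intercept column
ols = sm.OLS(y, Xd).fit()    # ordinary least squares on the raw variables

print(ols.tvalues)           # t statistic per coefficient (H0: coefficient = 0)
print(ols.pvalues)           # corresponding two-sided p-values
```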
Otherwise, I could see performing a permutation test: run this entire procedure thousands of times, each time permuting $Y$ to shuffle the labels, and then compare each coefficient in $E$ from the un-shuffled analysis to the distribution of that coefficient across the shuffled analyses.
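Concretely, I imagine something like this (a sketch under my own assumptions, reusing the toy pipeline from above; the two-sided p-value convention is my choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(float)

def fit_E(X, y):
    """Run the whole pipeline: PCA -> regression on 2 scores -> collapsed E."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    A = eigvecs[:, np.argsort(eigvals)[::-1]][:, :2].T
    reg = LinearRegression().fit(Xc @ A.T, y)
    return reg.coef_ @ A

E_obs = fit_E(X, y)                               # coefficients from real labels
E_null = np.array([fit_E(X, rng.permutation(y))   # refit with shuffled labels
                   for _ in range(1000)])

# Per-variable two-sided p-value: fraction of shuffled runs whose |coefficient|
# reaches the observed |coefficient|.
pvals = (np.abs(E_null) >= np.abs(E_obs)).mean(axis=0)
print(pvals)
```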
Can you help me see where my intuition is failing? I've been trying to look through papers that use similar procedures to see what they did, and, as is often the case, they're clear as mud. I'm preparing a tutorial for some other researchers, and I want to do a good job.