I have a dataset with many overlapping factors: groups of related factors, plus other factors whose relationships we would like to investigate. There are 57 measured factors in total, in 50 individuals (let's ignore my low n...).
For instance,
1) Blood Pressure: systolic, diastolic, mean, & pulse pressure (height of BP wave)
2) Blood Flow: min, max, mean, & height for blood flow.
3) Size: Individual's weight, length, and circumference.
Each of these groups has substantial collinearity, as you may have guessed, since they are simultaneous measurements of related features. Blood flow is also related to blood pressure, and size always matters. Of course, I also have less closely related features thrown into the mix, including diet, organ weights, mother's weight...
I can certainly select the 'most meaningful' factor from each group, but I dislike ignoring the potential importance of the removed factors.
My goal is to run multiple linear regressions (MLR) for areas of interest.
My question is: can I be selective about my groupings in PCA, and if so, how? Looking at the component plot, the groups naturally cluster together - but when they aren't as clustered as would be ideal, or when a stray variable that isn't blood pressure happens to join the blood pressure cluster, what are my options? Do I have to add all variables to one PCA, or can I choose the ones that I would like to group?
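To make "choosing my own groupings" concrete, this is roughly what I have in mind: run a separate PCA on each hand-picked group and carry the first component score forward into the regression. Below is a minimal sketch on synthetic blood pressure values (not my data). The helper names `standardize` and `first_pc` are hypothetical, and power iteration is used only to keep the example dependency-free. Whether this per-group approach is legitimate is exactly what I am asking.

```python
import random

def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in col]

def first_pc(columns, iters=500):
    """First principal component of the correlation matrix, via power iteration.

    Returns (loadings, share of variance explained, per-individual scores).
    """
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    scores = [sum(v[j] * z[j][k] for j in range(p)) for k in range(n)]
    return v, eigenvalue / p, scores

# Synthetic stand-in for the blood pressure group, 50 individuals.
random.seed(1)
systolic = [random.gauss(120, 12) for _ in range(50)]
diastolic = [0.5 * s + random.gauss(20, 5) for s in systolic]
mean_bp = [d + (s - d) / 3 + random.gauss(0, 1)
           for s, d in zip(systolic, diastolic)]
pulse = [s - d for s, d in zip(systolic, diastolic)]

loadings, explained, bp_score = first_pc([systolic, diastolic, mean_bp, pulse])
print("loadings:", [round(l, 3) for l in loadings])
print("share of variance on PC1:", round(explained, 3))
```

Here `bp_score` would replace the four blood pressure variables as a single predictor, and the same would be repeated for the blood flow and size groups.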
Additionally, can someone please explain why units matter for correlations? I was told that if I change my g measurements to kg (thus changing 167 to 0.167), it would impact the coefficients/relationships found in MLR. That doesn't seem logical to me.
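To show why this confuses me, here is a minimal sketch on synthetic data (made-up weights and a made-up outcome, with a single predictor rather than a full MLR). The helpers `pearson` and `ols_slope` are hypothetical names written for this example.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Synthetic data: weights in grams and an outcome that depends on weight.
random.seed(0)
weight_g = [random.gauss(170, 20) for _ in range(50)]
outcome = [0.05 * w + random.gauss(0, 1) for w in weight_g]
weight_kg = [w / 1000 for w in weight_g]  # same measurements, different unit

r_g, r_kg = pearson(weight_g, outcome), pearson(weight_kg, outcome)
slope_g, slope_kg = ols_slope(weight_g, outcome), ols_slope(weight_kg, outcome)

print("correlation (g, kg):", r_g, r_kg)
print("slope per g:", slope_g, " slope per kg:", slope_kg)
```

In this toy case the correlation is identical in both units, and the slope is simply 1000 times larger per kg than per g, so the fitted relationship looks unchanged to me. What I can't tell is whether the warning I was given refers to something beyond this rescaling.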
I hope this is specific enough! I see a lot of answers that have to account for a variety of interpretations of the question. The question "Use of PCA analysis to select variables for a regression analysis" gets close to mine, but I wasn't sure what was being asked there.