
I'm trying to calculate change in scores on a depression questionnaire - a very simple problem. However, what I care about is not the change in raw score, but rather the change in principal component scores for each subject. My pipeline is as follows:

  1. Conduct a PCA using the pre-treatment scores for each subject
  2. Calculate pre-treatment scores for each subject for PC1 through PC4
  3. Use the loadings for PC1-PC4 calculated in step 1 to calculate post-treatment scores for each subject for PC1-PC4
  4. Compute the difference between pre- and post-treatment scores for each subject
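The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the data are simulated, the helper names `fit_pca` and `pc_scores` are hypothetical, and it assumes a correlation-matrix PCA (items standardized by the pre-treatment mean and SD) with unscaled scores, i.e. plain projections onto the loading vectors.

```python
import numpy as np

def fit_pca(X, n_components):
    """Hypothetical helper: PCA of X (subjects x items) on standardized items.
    Returns column means, column SDs and loadings (items x n_components)."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    Z = (X - mean) / sd                      # standardize -> correlation-matrix PCA
    # SVD of the standardized data: rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, sd, Vt[:n_components].T

def pc_scores(X, mean, sd, loadings):
    """Hypothetical helper: project X onto the axes using the supplied mean/SD."""
    return ((X - mean) / sd) @ loadings

rng = np.random.default_rng(0)
n_subjects, n_items = 200, 10
# Simulated item scores for illustration only; substitute the real questionnaire data
pre = rng.normal(2.0, 1.0, size=(n_subjects, n_items))
post = pre - 0.5 + rng.normal(0.0, 0.5, size=(n_subjects, n_items))

# Step 1: PCA on the pre-treatment data only
mean_pre, sd_pre, W = fit_pca(pre, n_components=4)
# Step 2: pre-treatment scores for PC1-PC4
scores_pre = pc_scores(pre, mean_pre, sd_pre, W)
# Step 3: post-treatment scores with the *pre-treatment* mean, SD and loadings
scores_post = pc_scores(post, mean_pre, sd_pre, W)
# Step 4: per-subject change in each component score
change = scores_post - scores_pre
print(change.shape)  # (200, 4)
```

Because both time points are standardized with the same constants, the centering cancels and `change` equals `((post - pre) / sd_pre) @ W`: the item-level change projected onto the pre-treatment axes. If your software reports unit-variance scores instead, dividing each column of both score matrices by the corresponding pre-treatment score SD keeps the two time points on the same scale.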

However, PC scores are centered (and, depending on the software, scaled), whereas the post-treatment scores are not, because they are computed by applying the PC loadings from the pre-treatment data to the post-treatment data. Is this kosher?
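A quick way to see what the "non-centered" post-treatment scores mean is to check their average. In the sketch below (simulated data; a correlation-matrix PCA is assumed), the post scores are off-center by exactly the mean item change projected onto the pre-treatment axes, so the offset is the average treatment effect rather than an error. Re-centering the post data with its own mean and SD, by contrast, forces the average change to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data for illustration: 150 subjects x 8 items, scores drop after treatment
pre = rng.normal(3.0, 1.0, size=(150, 8))
post = pre - 0.8 + rng.normal(0.0, 0.4, size=(150, 8))

# PCA on standardized pre-treatment data; keep PC1-PC4
mean_pre, sd_pre = pre.mean(axis=0), pre.std(axis=0, ddof=1)
Z_pre = (pre - mean_pre) / sd_pre
_, _, Vt = np.linalg.svd(Z_pre, full_matrices=False)
W = Vt[:4].T

scores_pre = Z_pre @ W                            # centered by construction
scores_post = ((post - mean_pre) / sd_pre) @ W    # pre-treatment mean/SD reused

# The post scores are off-center by exactly the projected mean change
mean_shift = ((post.mean(axis=0) - mean_pre) / sd_pre) @ W
print(np.allclose(scores_post.mean(axis=0), mean_shift))  # True

# Re-centering post with its OWN mean/SD would wipe out the average change
mean_post, sd_post = post.mean(axis=0), post.std(axis=0, ddof=1)
scores_post_own = ((post - mean_post) / sd_post) @ W
print(np.allclose(scores_post_own.mean(axis=0), 0.0))     # True
```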

A follow-up question: is there a better way to calculate the change in principal component scores between time points? Could I instead compute the loadings from all the data (pre- and post-treatment pooled) and then calculate pre- and post-treatment scores for PC1-PC4 from those? Intuitively, that seems wrong.
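For comparison, the pooled alternative can be sketched as below (simulated data, correlation-matrix PCA assumed). Two properties are worth weighing: the pooled analysis treats each subject's two rows as independent observations, and because the centering uses the grand mean, the pooled covariance includes the pre-to-post mean shift, so a large treatment effect can itself pull an axis toward the direction of change. With equal group sizes, the pre and post score means are also mirror images of each other around zero.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated data for illustration: 120 subjects x 8 items at two time points
pre = rng.normal(3.0, 1.0, size=(120, 8))
post = pre - 0.8 + rng.normal(0.0, 0.4, size=(120, 8))

# Stack both time points (240 rows) and fit a single PCA on the pooled data
pooled = np.vstack([pre, post])
mean_all, sd_all = pooled.mean(axis=0), pooled.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd((pooled - mean_all) / sd_all, full_matrices=False)
W_pooled = Vt[:4].T

# Score both time points with the pooled mean, SD and loadings
scores_pre = ((pre - mean_all) / sd_all) @ W_pooled
scores_post = ((post - mean_all) / sd_all) @ W_pooled
change = scores_post - scores_pre

# Neither time point is centered at zero; with equal n their means cancel
print(scores_pre.mean(axis=0).round(3))
print(scores_post.mean(axis=0).round(3))
```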

Any suggestions would be much appreciated!

  • `is there a better way`? Nobody can decide that for you. Your decision should reflect what makes sense _to you_, given the task at hand. Your current decision, to obtain the PC structure from the pre-treatment data and then impose it on the post-treatment data, sounds not unreasonable. – ttnphns Apr 13 '16 at 08:10
  • What you describe in points 1-3 is simply "do PCA on the dataset and then compute PC scores both for its data points and for `new` data points". Many PCA programs let you do this in one step: you enter both datasets in the PCA but mark the second one as "passive". Alternatively, you can compute PC scores for the "new" dataset points yourself ([see](http://stats.stackexchange.com/q/126885/3277) how to compute PC scores). Think about how best to center/standardize the points of that second dataset. Usually people center/standardize them by the mean/st.dev. of the _first_ dataset. – ttnphns Apr 13 '16 at 08:20
  • Perhaps a longer example will interest you. We studied the rehabilitation potential of elderly people after a stroke. One question was whether age, sex, and type of stroke made a difference to rehabilitation as measured by ADL (activities of daily living); another was how the structure of the ADL changed from before to after the stroke. I experimented with my self-written PCA/FA program and documented a set of rotations. The key is to *partial out* the anamnestic ADL variance and treat the remainder as "variance-in-change". See http://go.helms-net.de/stat/pca/tab2ver2_word_2.html – Gottfried Helms Apr 16 '16 at 08:34
  • Of course, such analyses can also be done with factor scores and regression in the usual way; I just want to point toward the analysis of factor/component structures in partial variance. – Gottfried Helms Apr 16 '16 at 08:56
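One possible reading of "partial out the baseline variance, then analyze what remains", sketched with simulated data: regress each post-treatment item on all pre-treatment items, then run a PCA on the residuals. This is an illustrative interpretation only, not the self-written PCA/FA program or the rotations referenced in the comment above. Note that the residuals are post-treatment scores adjusted for baseline (ANCOVA-style), not raw differences.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 6
# Simulated item scores at baseline and follow-up, for illustration only
pre = rng.normal(3.0, 1.0, size=(n, p))
post = 0.6 * pre + rng.normal(0.0, 0.7, size=(n, p))

# Regress every post-treatment item on all pre-treatment items plus an intercept
X = np.column_stack([np.ones(n), pre])
B, *_ = np.linalg.lstsq(X, post, rcond=None)
resid = post - X @ B      # post-treatment variance not explained by baseline

# PCA of the residual ("variance-in-change") covariance
resid_c = resid - resid.mean(axis=0)
_, s, Vt = np.linalg.svd(resid_c, full_matrices=False)
explained = s**2 / np.sum(s**2)   # share of residual variance per component
change_loadings = Vt[:2].T        # first two axes of residual variation
change_scores = resid_c @ change_loadings
print(explained.round(3))
```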
