I am running an analysis with a Cox model in which I use PCA features extracted from medical images to predict survival. However, when I examine the coefficients on these features, they are very small. I implemented the model with the lifelines package in Python.

covariate   coef   exp(coef)      z        p   -log2(p)
0           0.01        1.01   3.41   <0.005      10.59
1           0.02        1.02   7.09   <0.005      39.42
Grade_II   -0.39        0.68  -2.97   <0.005       8.4
Grade_III  -0.27        0.77  -2.1      0.04       4.82
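When reading the table, it may help to recall what each coefficient means in a Cox model. The formula below is the standard proportional-hazards form with generic symbols, not something taken from the question. exp(coef) is the hazard ratio for a one-unit increase in that covariate with the others held fixed, so the PC rows and the grade rows describe changes of different size.

$$
h(t \mid \mathbf{x}) = h_0(t)\exp\Big(\sum_j \beta_j x_j\Big),
\qquad
\frac{h(t \mid x_j + \Delta)}{h(t \mid x_j)} = \exp(\beta_j \Delta).
$$

For Grade_II and Grade_III, $\Delta = 1$ is the switch from the reference grade to that grade. For feature 1, with $\beta \approx 0.02$ per score unit, a shift of $\Delta = 50$ units already gives $\exp(1) \approx 2.72$. Whether a shift of that size is typical depends on the spread of the PC scores, which the table does not show.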

These two PCA features have large z-scores, so I expected them to have large coefficients. I suspect this is because the features are standardized before running PCA, so an exp(coef) of 1.01 per unit is actually a large change. I want to be able to look at the coefficients and tell directly how these PCA features compare to the categorical variables Grade_II and Grade_III. From the coefficients alone, the PCA features seem to have very little effect compared to grade, but in fact they have a large effect. For readability, I would like the coefficients to reflect that, but I am not sure what the correct approach is. Should I "unstandardize" the PCA features right before running the Cox regression? Should I divide them all by their standard deviation? I don't think I can just divide by an arbitrary number, because then the coefficients would be hard to compare.
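Dividing a covariate by its standard deviation is not arbitrary in the sense the question worries about: because $\beta x = (\beta s)(x / s)$, the fitted coefficient on $x/s$ is $\beta s$, its standard error scales by the same factor, and the z-score and p-value stay the same. The sketch below shows that conversion in plain Python. The PC scores are hypothetical numbers chosen for illustration, and `standardize` and `per_sd_coef` are illustrative helper names, not lifelines functions.

```python
import math
import statistics


def standardize(values):
    """Center a covariate and divide it by its sample standard deviation.

    Returns the rescaled values and the SD that was divided out.
    Centering only shifts the linear predictor by a constant, which a Cox
    model absorbs into the baseline hazard, so it does not change the slope.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values], sd


def per_sd_coef(beta_per_unit, sd):
    """Convert a per-unit Cox coefficient to a per-SD coefficient.

    Since beta * x == (beta * sd) * (x / sd), the coefficient on the
    rescaled covariate x / sd is beta * sd.
    """
    return beta_per_unit * sd


# Hypothetical PC scores, for illustration only (not the question's data).
pc_scores = [-40.0, -12.5, 3.0, 18.0, 31.5]
scaled, sd = standardize(pc_scores)

beta_unit = 0.01                      # per-unit coef, as reported for feature 0
beta_sd = per_sd_coef(beta_unit, sd)  # coef per one SD of the PC score
hr_sd = math.exp(beta_sd)             # hazard ratio per one-SD increase

print(f"SD of PC scores: {sd:.2f}")
print(f"coef per SD: {beta_sd:.3f}, hazard ratio per SD: {hr_sd:.2f}")
```

With lifelines, the equivalent step would be to divide the PC columns of the DataFrame by their standard deviations before calling `CoxPHFitter.fit`, so the reported coef column is already per SD. A one-SD shift in a PC score and a switch from the reference grade to Grade_II are still different kinds of contrast, so the side-by-side comparison remains a modelling choice rather than a purely numerical one.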

Mattreex
  • A priori, there is no reason to assume that a PCA's output will produce a more accurate regression; in fact, it could hurt: https://stats.stackexchange.com/questions/52773/what-can-cause-pca-to-worsen-results-of-a-classifier – Cam.Davidson.Pilon May 14 '20 at 14:17
