
I am trying to model an ordinal dependent variable (three levels) using a set of independent variables (5 ordinal and 10 numeric). I am using the lrm function in the "rms" package for R, and I am conducting principal component regression. S1, C5, C2, C3, S7 and S4 are the independent variables selected using PCA.

              Coef    S.E.   Wald Z Pr(>|Z|)
    y>=2      -1.0469 0.6092 -1.72  0.0857
    y>=3      -8.5826 1.0354 -8.29  <0.0001
    S1=Simple -2.9091 0.6112 -4.76  <0.0001
    C5         0.8389 0.1475  5.69  <0.0001
    C2         1.4904 0.1889  7.89  <0.0001
    C3         1.2139 0.1908  6.36  <0.0001
    S7         0.8803 0.2701  3.26  0.0011
    S4=TN     -1.2460 0.4659 -2.67  0.0075

As I understand it, the ordinal regression model is given by

$$\ln\left(\frac{F_{ij}}{1 - F_{ij}}\right) = \beta_{0j} + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$

where $F_{i1}$ is the probability that $Y=1$,
$F_{i2}$ is the probability that $Y=2$,
$F_{i3}$ is the probability that $Y=3$,
$\beta_{0j}, \beta_1, \dots, \beta_k$ are the coefficients, and
$X_1, \dots, X_k$ are the independent variables.

My question is, how do we interpret negative coefficients here? Also, does ranking the values of Wald statistics from largest to smallest indicate descending strength of evidence of an association with the dependent variable?

mdewey
Shilpi
  • Perhaps http://stats.stackexchange.com/questions/38087/negative-coefficient-in-ordered-logistic-regression?rq=1 will help? – mdewey Apr 21 '16 at 11:09

1 Answer


The purpose of incomplete principal components regression is to accomplish data reduction in an unbiased way, that is, without using $Y$ in an ad hoc way to select the model's independent variables. That means that the model must include principal components $1, 2, \dots, k$, where $k$ can be chosen using AIC. It is not appropriate to pick and choose which PCs to include in a purely stepwise way.

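Also, the output of the lrm function defines how to interpret the intercepts, and you have misinterpreted them. The labels y>=2 and y>=3 show that they correspond to the cumulative probabilities $P(Y \ge 2)$ and $P(Y \ge 3)$, not to the probabilities of the individual levels.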
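For reference, here is a sketch of the proportional odds model in the form that the intercept labels in the lrm output imply. The symbols $\alpha_j$ (intercepts) and $\beta$ (slopes) are notation introduced here for illustration; they do not appear in the question.

$$P(Y \ge j \mid X) = \frac{1}{1 + \exp\left[-(\alpha_j + X\beta)\right]}, \qquad j = 2, 3$$

$$P(Y = 1 \mid X) = 1 - P(Y \ge 2 \mid X), \quad P(Y = 2 \mid X) = P(Y \ge 2 \mid X) - P(Y \ge 3 \mid X), \quad P(Y = 3 \mid X) = P(Y \ge 3 \mid X)$$

In the printed output, the row labelled y>=2 would be $\alpha_2 = -1.0469$ and the row labelled y>=3 would be $\alpha_3 = -8.5826$. The slopes $\beta$ are shared across both cumulative logits.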

Once you take care of those problems you can interpret the signs of coefficients in the usual way. lrm states the model such that if $\beta$ is positive, increasing $X$ is associated with increasing $Y$. Conversely, a negative $\beta$ means that increasing $X$ is associated with lower levels of $Y$.
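To make the sign convention concrete, here is a rough arithmetic sketch in plain Python (not rms). It treats the coefficients printed in the question purely as example numbers, even though that model should be refit as described above. The covariate patterns (all predictors at zero, C2 raised by one unit, the S1=Simple indicator switched on) are hypothetical, and the helper names `expit` and `category_probs` are made up for this illustration.

```python
import math


def expit(z):
    """Inverse logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Numbers copied from the lrm output in the question (illustration only).
INTERCEPTS = {2: -1.0469, 3: -8.5826}  # rows labelled y>=2 and y>=3
BETA = {
    "S1=Simple": -2.9091,
    "C5": 0.8389,
    "C2": 1.4904,
    "C3": 1.2139,
    "S7": 0.8803,
    "S4=TN": -1.2460,
}


def category_probs(x):
    """Return P(Y=1), P(Y=2), P(Y=3) for a dict of predictor values."""
    lp = sum(BETA[name] * value for name, value in x.items())
    p_ge2 = expit(INTERCEPTS[2] + lp)  # cumulative P(Y >= 2)
    p_ge3 = expit(INTERCEPTS[3] + lp)  # cumulative P(Y >= 3)
    return {1: 1.0 - p_ge2, 2: p_ge2 - p_ge3, 3: p_ge3}


# Hypothetical covariate patterns, chosen only to show the direction of effects.
baseline = {name: 0.0 for name in BETA}
higher_c2 = {**baseline, "C2": 1.0}         # positive coefficient
simple = {**baseline, "S1=Simple": 1.0}     # negative coefficient

for label, x in [("baseline", baseline), ("C2 + 1", higher_c2), ("S1=Simple", simple)]:
    probs = category_probs(x)
    print(label, {level: round(p, 4) for level, p in probs.items()})

# A negative coefficient is an odds ratio below 1 for Y >= j (same for every j).
print("odds ratio for S1=Simple:", round(math.exp(BETA["S1=Simple"]), 4))
```

Raising C2 moves probability away from $Y=1$ toward the higher levels, while switching on S1=Simple moves it toward $Y=1$. The odds ratio printed at the end applies to both $Y \ge 2$ and $Y \ge 3$, which is the proportional odds assumption.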

You can run plot(anova(fit)) to get a rough idea of ranking of predictive discrimination. But the data are not capable of reliably telling you the ranking of variable importance, which would be exposed by bootstrap confidence intervals on importance ranks (type ?anova.rms for more information on this).

Frank Harrell