I tried to plot the results of an ordered logistic regression analysis by calculating the probabilities of endorsing every answer category of the dependent variable (6-point Likert scale, ranging from "1" to "6"). However, I've received strange probabilities when I calculated the probabilities based on this formula: $\rm{Pr}(y_i \le k|X_i) = \rm{logit}^{-1}(X_i\beta)$.
Below you see how exactly I tried to calculate the probabilities and plot the results of the ordered logistic regression model (m2
) that I fitted using the polr
function (MASS
package). The probabilities (probLALR
) that I calculated and used to plot an "expected mean score" are puzzling as the expcected mean score in the plot increases along the RIV.st continuum while the coefficient for RIV.st
is negative (-0.1636). I would have expected that the expected mean score decreases due to the negative main effect of RIV.st
and the irrelevance of the interaction terms for the low admiration and low rivalry condition (LALR) of the current 2 by 2 design (first factor = f.adm
; second factor = f.riv
; dummy coding 0 and 1).
Any idea of how to make sense of the found pattern? Is this the right way to calculate the probabilities? The way I used the intercepts in the formula to calculate the probabilities might be problematic (cf., Negative coefficient in ordered logistic regression).
m2 <- polr(short.f ~ 1 + f.adm*f.riv + f.adm*RIV.st + f.riv*RIV.st, data=sampleNS)
# f.adm = dummy (first factor of 2 by 2 design);
# f.riv = dummy (second factor of 2 by 2 design);
# RIV.st = continuous predictor (standardized)
summary(m2)
Coefficients:
Value Std. Error t value
f.adm1 1.0203 0.14959 6.8203
f.riv1 -0.8611 0.14535 -5.9240
RIV.st -0.1636 0.09398 -1.7403
f.adm1:f.riv1 -1.2793 0.20759 -6.1625
f.adm1:RIV.st 0.0390 0.10584 0.3685
f.riv1:RIV.st 0.6989 0.10759 6.4953
Intercepts:
Value Std. Error t value
1|2 -2.6563 0.1389 -19.1278
2|3 -1.2139 0.1136 -10.6898
3|4 -0.3598 0.1069 -3.3660
4|5 0.9861 0.1121 8.7967
5|6 3.1997 0.1720 18.6008
Here you see how I tried to calculate the probabilities (probLALR
) for 1 of the 4 conditions of the 2 by 2 design:
inv.logit <- function(x){ return(exp(x)/(1+exp(x))) }
Pred <- seq(-3, 3, by=0.01)
b = c(-2.6563,-1.2139,-0.3598,0.9861,3.1997) # intercepts of model m2
a = c(1.0203,-0.8611,-0.1636,-1.2793,0.0390,0.6989) # coefficients of m2
probLALR <- data.frame(matrix(NA,601,5))
for (k in 1:5){
probLALR[,k] <- inv.logit(b[k] + a[1]*0 + a[2]*0 +
a[3]*Pred + a[4]*0*0 +
a[5]*Pred*0 + a[6]*Pred*0)
}
plot(Pred,probLALR[,1],type="l",ylim=c(0,1)) # p(1)
lines(Pred,probLALR[,2],col="red") # p(1 or 2)
lines(Pred,probLALR[,3],col="green") # P(1 or 2 or 3)
lines(Pred,probLALR[,4],col="orange") # P(1 or 2 or 3 or 4)
lines(Pred,probLALR[,5],col="orange") # P(1 or 2 or 3 or 4 or 5)
# option response functions:
orc = matrix(NA,601,6)
orc[,6] = 1-probLALR[,5] # prob of 6
orc[,5]= probLALR[,5]-probLALR[,4] # prob of 5
orc[,4]= probLALR[,4]-probLALR[,3] # prob of 4
orc[,3]= probLALR[,3]-probLALR[,2] # prob of 3
orc[,2]= probLALR[,2]-probLALR[,1] # prob of 2
orc[,1]= probLALR[,1] # prob of 1
plot(Pred,orc[,1],type="l",ylim=c(0,1)) # p(1)
lines(Pred,orc[,2],col="red") # p(2)
lines(Pred,orc[,3],col="green") # P(3)
lines(Pred,orc[,4],col="orange") # P(4)
lines(Pred,orc[,5],col="purple") # P(5)
lines(Pred,orc[,6],col="purple") # P(6)
# mean score
mean = orc[,1]*1+orc[,2]*2+orc[,3]*3+orc[,4]*4+orc[,5]*5+orc[,6]*6
plot(Pred,mean,type="l",xlab="RIV.st",ylab="expected mean score",ylim=c(1,6))