I am looking for a good way to estimate confidence intervals for mean and median differences when a categorical variable has more than two levels, in the setting of the Kruskal-Wallis test. Here Dr. Frank Harrell (@FrankHarrell) said this is possible using the proportional odds (PO) model, so I went to his biostatistics book, where he introduces a general approach based on the PO model.

Before using that approach, I ran a quick test: I computed the median difference confidence interval for one numeric variable and one categorical variable with two levels, and compared it with the result from the wilcox.test function, which is the two-group special case of the Kruskal-Wallis test (wilcox.test gives a confidence interval but kruskal.test doesn't). I obtained a big difference, as you can see below. What mistake did I make? Thanks in advance.
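For context on the "special case" remark, here is a minimal sketch in base R. It assumes the same simulated data as the example below, with no ties in y. With two groups, the Kruskal-Wallis test should give the same p-value as the Wilcoxon rank-sum test when the latter uses the normal approximation without continuity correction.

```r
# Same simulated data as in the example below
set.seed(1234)
group <- rep(c("A", "B"), 100)
y <- rnorm(200, 100, 15) + 10 * (group == "B")

# Kruskal-Wallis test with two groups
kw <- kruskal.test(y ~ factor(group))

# Wilcoxon rank-sum test, normal approximation, no continuity correction
wt <- wilcox.test(y ~ group, exact = FALSE, correct = FALSE)

# The two p-values should agree up to floating-point error
c(kruskal = kw$p.value, wilcoxon = wt$p.value)
```

The exact Wilcoxon test used further down will give a slightly different p-value, because it does not rely on the normal approximation.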
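rm(list = objects())
set.seed(1234)
## similar to the example on page 228, but with two levels
group = rep(c('A', 'B'), 100)
y = rnorm(200, 100, 15) + 10 * (group == 'B')
library(rms)
dd = datadist(group, y); options(datadist = 'dd')
## fit the ordinal (proportional odds) model
f = orm(y ~ group)
## A vs B contrast on the log-odds scale (computed but not used below)
k = contrast(f, list(group = 'A'), list(group = 'B'))
## estimated median of y as a function of the linear predictor
yquant = Quantile(f)
ymed = function(lp) yquant(0.5, lp = lp)
## estimated median per group, with confidence limits
Predict(f, group, fun = ymed)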
# The output was like this
  group      yhat     lower    upper
1     A  98.63239  95.24502 102.4621
2     B 107.70816 103.67949 110.8213

Response variable (y):

Limits are 0.95 confidence limits
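The Predict output above gives one estimated median per group, each with its own limits, not an interval for the difference between them. As a rough cross-check that does not depend on rms, here is a minimal sketch of a percentile bootstrap for the difference in sample medians (A minus B). This is a swapped-in technique, not the PO-model approach from the book. The number of resamples (`n_boot = 2000`) and the variable names are assumptions made for illustration.

```r
# Same simulated data as above
set.seed(1234)
group <- rep(c("A", "B"), 100)
y <- rnorm(200, 100, 15) + 10 * (group == "B")
yA <- y[group == "A"]
yB <- y[group == "B"]

# Observed difference of the two sample medians (A minus B)
obs_diff <- median(yA) - median(yB)

# Percentile bootstrap: resample each group with replacement
n_boot <- 2000  # arbitrary choice for this sketch
boot_diff <- replicate(n_boot, {
  median(sample(yA, replace = TRUE)) - median(sample(yB, replace = TRUE))
})

# Approximate 95% interval for the difference in medians
ci <- quantile(boot_diff, c(0.025, 0.975))
obs_diff
ci
```

Resampling within each group keeps the two sample sizes fixed at 100, matching the design of the simulated data.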
## using the wilcox.test function in R
wilcox.test(y ~ group, conf.int = TRUE, paired = FALSE, exact = TRUE, mu = 0, correct = FALSE)
# The output was like this
Wilcoxon rank-sum exact test
data: y by group
W = 3506, p-value = 0.0002345
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-12.407601 -3.964255
sample estimates:
difference in location
-8.159511
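One thing worth checking when comparing the two outputs: as far as I understand from the wilcox.test documentation, the "difference in location" it reports in the two-sample case is not the difference of the two medians. It is the median of all pairwise differences between the two samples (the Hodges-Lehmann estimate). A minimal sketch in base R, assuming the same simulated data and no ties:

```r
# Same simulated data as above
set.seed(1234)
group <- rep(c("A", "B"), 100)
y <- rnorm(200, 100, 15) + 10 * (group == "B")
yA <- y[group == "A"]
yB <- y[group == "B"]

# Hodges-Lehmann estimate: median of all 100 x 100 pairwise differences (A minus B)
hl <- median(outer(yA, yB, "-"))

# Difference of the two sample medians, which is a different quantity
med_diff <- median(yA) - median(yB)

# Estimate reported by the exact Wilcoxon rank-sum test
wt <- wilcox.test(y ~ group, conf.int = TRUE, exact = TRUE)

c(hodges_lehmann = hl,
  wilcox_estimate = unname(wt$estimate),
  median_difference = med_diff)
```

If the first two values agree and the third does not, then part of the gap between the PO-model medians and the wilcox.test interval comes from the two methods targeting different quantities.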