Problem in finding a non-parametric confidence interval for median and mean using Frank Harrell approach when we have more than two categories

Question

These days I am looking for a good estimation for the mean and median difference confidence interval when I have categorical variables with more than two levels using the Kruskal test, Here Dr. Frank Harrell @FrankHarrell said it is possible using PO model, I went then to his book of biostatistics. He introduced there a general approach using the PO model, before using that, I did a quick test to compute the median difference confidence interval for one categorical variable with two levels and one numeric variable and compare it with results from <wilcox.test> function that is a special case of Kruskal test (Wilcox function gives the confidence interval but Kruskal function doesn't), and I obtained a big difference as you see below. What kind of mistake I did, please. and Thanks in advance.

rm(list = objects())

set.seed (1234)
## similar to example on page 228 but for two levels
group =  rep(c('A','B'), 100) 
y  = rnorm (200 , 100 , 15) + 10*( group == 'B')
require (rms)
dd =   datadist(group , y); options( datadist ='dd')
f  = orm(y ~ group)
k   = contrast (f, list ( group ='A'), list ( group ='B'))
yquant  = Quantile(f)
ymed  = function(lp) yquant (0.5 , lp=lp)
Predict(f, group , fun=ymed)

# the output was like this 
  group      yhat     lower    upper
1     A  98.63239  95.24502 102.4621
2     B 107.70816 103.67949 110.8213

Response variable (y):  

Limits are 0.95 confidence limits 

## using wilcox function in R
wilcox.test( y~group, conf.int = TRUE,paired = FALSE, exact = T, mu=0, correct=F)

# The output was like this

Wilcoxon rank-sum exact test

data:  y by group
W = 3506, p-value = 0.0002345
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -12.407601  -3.964255
sample estimates:
difference in location 
             -8.159511

`wilcox.test` is using different (better) approach for CLs of differences: the Hodges-Lehmann estimator. This if for continuous Y (minimal ties) and is completely consistent with the WIlcoxon test. You'll have to run it in pairs since Kruskal-Wallis function doesn't do this. Which version of the `rms` package are you using? Also note that the `rmsb` package `blrm` function along with `contrast` and `Quantile` can provide exact (to within simulation error) Bayesian uncertainty intervals for a series of difference in means or quantiles using the proportional odds model. — Frank Harrell, Jul 29 '21 at 11:49
I use version 6.2-0, I want to learn this approach to do the CI for difference mean and difference median, not only for OR. I saw that I think in your book bbr. Could you please post more details with R for more clarity. — Rani, Jul 29 '21 at 13:13
The next release implements the delta method for getting better confidence intervals for means an quantiles. But that doesn't help with differences in means or quantiles. For now you'd need to put everything in a bootstrap loop to get bootstrap CLs, or use the Bayesian `rms` package `rmsb`. — Frank Harrell, Jul 29 '21 at 13:53
aha, so no one does the difference CI f? I used your suggestion in the textbook to do the CI for difference two mean on two group:>> diffs = numeric(2000) for(i in 1 : 20000){ diffs[i] = mean(sample(xsub1 , replace = TRUE )) - mean(sample(xsub2 , replace = TRUE )) } does this corerct? — Rani, Jul 29 '21 at 14:01

score 0 · Answer 1 · answered Sep 06 '21 at 02:41

@Rani, as suggested by Prof Harrell, consider using rmsb to derive Bayesian 95% uncertainty interval for the between-group difference in median y. Below, I've provided codes to run the Bayesian Wilcoxon test on your example. Finally, a delightful treasure trove of information on the proportional odds model can be found here

library(rmsb)
d <- tibble (group =  rep(c('A','B'), 100),
             y  = rnorm (200 , 100 , 15) + 10*( group == 'B'))

mod_blrm <- blrm(y ~ group,
                  keepsep=('group'),
                  priorsd = c(1.5), ## specify a weakly informative skeptical prior
                  data=d)

med_con <- rms::contrast(mod_blrm,  
              list(group ="B"), 
              list(group ="A"), fun=function(lp, ...) Quantile(mod_blrm)(lp=lp,...) )


Posterior Summaries for First X Settings

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1          109.1            109.1          105.1          112.5

Posterior Summaries for Second X Settings

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1          98.49            98.18           95.3          101.5

Posterior Summaries of First - Second

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1          10.58            10.71          5.218          15.54

# visualize
plot(med_con, which='diff') +
  facet_grid(~"Group B vs Group A")  
```

many thanks, it's great, but I got error: Error in blrm(y ~ group, keepsep = ("group"), priorsd = c(1.5), data = d) : could not find function "blrm". however I installed ,library(rmsb), library(ggplot2), library(tibble), also, do you think I can apply this for three groups? — Rani, Sep 15 '21 at 21:02
@Rani please ensure that all package dependencies of `rmsb` are successfully installed. PO regression generalizes the Wilcoxon and Kruskal-Wallis tests so you simply (i) replace the binary group with the 3-level group variable and (ii) expand the `priorsd` argument to specify the priors for the various group coefficients. — Yonghao, Sep 16 '21 at 22:01
Yes, it's work, but at least if I compare the result of your approach with the one obtained directly by the Wilcox function: wilcox.test( y~group, conf.int = TRUE, paired = FALSE, exact = T, mu=0, correct=F) I will get big difference CI (-12.407601, -3.964255), although you used non-informative prior. This is normal? — Rani, Sep 20 '21 at 10:36

Problem in finding a non-parametric confidence interval for median and mean using Frank Harrell approach when we have more than two categories

1 Answers1