How to combine/pool binomial confidence intervals after multiple imputation?

Question

After multiply imputing my dataset m times, I want to calculate a binomial proportion confidence interval. How can I combine the confidence interval estimates from the imputed datasets while taking Rubin's rules into account?

For a general idea of Rubin's combining rules, you can read http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2727536/. The original book is http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471655740.html — Qaswed, Jun 22 '16 at 18:29
@Qaswed I rephrased the question as it might have been too broad. Thanks for the article, can you have a look at my updated question, I hope I managed to do it correctly now. — sluijs, Jun 23 '16 at 13:50
@Roger Although the score interval formula is a good way of calculating confidence intervals around a proportion, you could also try the easier-to-apply Agresti-Coull adjusted Wald confidence intervals (see [link](http://www.stat.ufl.edu/~aa/articles/agresti_coull_1998.pdf)). The problem, however, lies in accounting for the variance between imputed datasets. Neither the score nor the Agresti-Coull CIs take this into consideration. I have upvoted your question as I'd also like an answer to this, but as to whether using **only** score CIs is correct, I'd say **no**. — IWS, Jan 11 '17 at 13:16

score 5 · Accepted Answer · answered Jul 31 '17 at 16:40

This is indeed an interesting problem. Standard errors based on the central limit theorem are often undesirable for proportions, because a proportion is a computed quantity whose sampling uncertainty is skewed. The Wilson score interval, which you mentioned, gets around the skewness by estimating a different quantity than the standard proportion $k/n$. To use Rubin's rules you need two things for each imputed dataset: the transformed proportion itself, and an estimate of its within-imputation variance, which is just the variance (squared standard error) estimated on that single dataset.

So for the Wilson score interval, you first need to calculate the transformed estimate $\hat{p} + \frac{1}{2n}z^2$ and then, separately, the variance, which from your formula is $\left(z\sqrt{\frac{1}{n}\hat{p}(1-\hat{p}) + \frac{1}{4n^2}z^2}\right)^2$.
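As a rough illustration, the two expressions above can be computed for a single dataset as in the Python sketch below. The function name `wilson_pieces` and the counts (58 successes out of 200) are made up for the example. Note that the textbook Wilson interval also divides both pieces by $1 + z^2/n$, a factor the expressions above do not show; the sketch applies it only when forming the interval itself.

```python
import math
from statistics import NormalDist

def wilson_pieces(k, n, level=0.95):
    """Return the shifted estimate, the squared spread term, and the
    standard Wilson score interval for k successes out of n trials."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal quantile, about 1.96 for 95%
    p_hat = k / n
    # Transformed estimate as written in the text: p_hat + z^2 / (2n)
    shifted = p_hat + z**2 / (2 * n)
    # Squared term as written in the text: (z * sqrt(p(1-p)/n + z^2/(4n^2)))^2
    spread_sq = z**2 * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    # The standard Wilson interval divides both pieces by 1 + z^2/n
    denom = 1 + z**2 / n
    lower = (shifted - math.sqrt(spread_sq)) / denom
    upper = (shifted + math.sqrt(spread_sq)) / denom
    return shifted, spread_sq, (lower, upper)

# Hypothetical single imputed dataset: 58 successes out of 200
shifted, spread_sq, (lo, hi) = wilson_pieces(58, 200)
print(f"shifted estimate = {shifted:.4f}, squared term = {spread_sq:.6f}")
print(f"Wilson 95% interval = ({lo:.4f}, {hi:.4f})")
```

Unlike the Wald interval, the resulting interval is not symmetric around $k/n$, which is the skewness the Wilson construction is meant to capture.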

That will give you estimates of the transformed parameter and the transformed parameter's variance for each of $m$ datasets.
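For reference, Rubin's rules for a scalar quantity can be written as follows. The symbols are notation assumed here rather than taken from the question: $Q_i$ is the estimate and $U_i$ its within-imputation variance in imputed dataset $i = 1, \dots, m$.

$$
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m} Q_i, &
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m} U_i, \\
B &= \frac{1}{m-1}\sum_{i=1}^{m} \left(Q_i - \bar{Q}\right)^2, &
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B, \\
\nu &= (m-1)\left(1 + \frac{\bar{U}}{\left(1 + \frac{1}{m}\right) B}\right)^2
\end{aligned}
$$

The pooled interval is then $\bar{Q} \pm t_{\nu,\,1-\alpha/2}\sqrt{T}$, where $B$ is the between-imputation variance that a single-dataset interval ignores.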

You can then combine these estimates using one of the available R tools, such as mi.meld from Amelia, mice (which you mentioned), or the R package mitools. Once you have the pooled transformed estimate and its pooled variance, you can compute the confidence interval from them.
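To make the pooling step concrete without depending on any particular R package, here is a minimal Python sketch of Rubin's rules for a scalar estimate. Everything in it is illustrative: `pool_rubin` is a hypothetical helper, the success counts are invented, and the plain proportion with its Wald variance $\hat{p}(1-\hat{p})/n$ is swapped in as the simplest per-imputation estimate/variance pair. The interval uses a normal quantile to stay within the standard library; Rubin's rules proper use a $t$ quantile with the degrees of freedom returned by the function.

```python
import math
from statistics import NormalDist, mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and within-imputation variances
    with Rubin's rules. Returns (pooled estimate, total variance, df)."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b    # total variance
    # Rubin's degrees of freedom; infinite if the estimates do not vary at all
    df = math.inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df

# Hypothetical example: m = 5 imputed datasets, n = 200 observations each
n = 200
successes = [58, 61, 55, 63, 60]
p_hats = [k / n for k in successes]
within = [p * (1 - p) / n for p in p_hats]   # Wald variance in each dataset

q_bar, total, df = pool_rubin(p_hats, within)

# Simplification: normal quantile instead of the t quantile with df degrees of freedom
z = NormalDist().inv_cdf(0.975)
half_width = z * math.sqrt(total)
print(f"pooled estimate = {q_bar:.4f}, total variance = {total:.6f}, df = {df:.1f}")
print(f"approximate 95% interval = ({q_bar - half_width:.4f}, {q_bar + half_width:.4f})")
```

The total variance is never smaller than the average within-imputation variance, so the pooled interval widens to reflect the uncertainty the imputations add.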

This would be easier if these R packages supplied the transformed estimates instead of just the confidence intervals, but you can probably dig them out of the associated R code.

[Lott & Reiter (2017)](https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2018.1473796?journalCode=utas20) wrote a paper on this subject recently. I finally ended up using the Agresti-Coull CI. — sluijs, Aug 06 '19 at 10:36
