After I multiply imputed my dataset m times, I wanted to calculate a binomial proportion confidence interval. How can I combine the various estimates of the confidence intervals while taking Rubin's rules into account?
-
For a general idea of Rubin's combining rules, you can read http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2727536/. The original book is http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471655740.html – Qaswed Jun 22 '16 at 18:29
-
@Qaswed I rephrased the question as it might have been too broad. Thanks for the article. Could you have a look at my updated question? I hope I managed to do it correctly now. – sluijs Jun 23 '16 at 13:50
-
@Roger Although the score interval formula is a good way of calculating confidence intervals around a proportion, you could also try the easier-to-apply Agresti-Coull adjusted Wald confidence intervals (see [link](http://www.stat.ufl.edu/~aa/articles/agresti_coull_1998.pdf)). The problem, however, lies in taking into account the variance between imputed datasets. Neither the score nor the Agresti-Coull CIs take this into consideration. I have upvoted your question as I'd also like an answer to this, but as to whether or not using **only** score CIs is correct, I'd say **no**. – IWS Jan 11 '17 at 13:16
1 Answer
This is indeed an interesting problem. The issue is that standard errors based on the central limit theorem are often undesirable for proportions: a proportion is a computed quantity bounded between 0 and 1, so its sampling distribution is skewed. The Wilson score interval, which you mentioned, gets around the skewness by estimating a different quantity than the standard proportion $k/n$. To use Rubin's rules you need, for each dataset, the transformed proportion itself and an estimate of its within-imputation variance, which is just the variance (squared standard error) estimated on that single dataset.
So for the Wilson score interval, you first need to calculate the transformed estimate $\hat{p} + \frac{z^2}{2n}$ and then, separately, the variance, which from your formula is $\left(z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}\right)^2$.
That will give you estimates of the transformed parameter and the transformed parameter's variance for each of the $m$ datasets.
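As a rough illustration of these two per-dataset quantities, here is a minimal Python sketch that follows the expressions above literally. The function name `wilson_components` and the counts are hypothetical. Note that the full Wilson interval additionally divides both terms by $1 + z^2/n$; that factor is left out here to match the formulas as written.

```python
from statistics import NormalDist


def wilson_components(k, n, conf_level=0.95):
    """Return (estimate, variance) for one imputed dataset.

    Follows the two expressions in this answer literally (assumption:
    no division by 1 + z^2/n, unlike the full Wilson interval).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p_hat = k / n
    estimate = p_hat + z**2 / (2 * n)
    variance = z**2 * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return estimate, variance


# Hypothetical success counts in m = 3 imputed datasets of size n = 100.
counts = [42, 45, 40]
components = [wilson_components(k, 100) for k in counts]
```

Each element of `components` is an (estimate, variance) pair for one imputed dataset, which is the input shape that pooling functions typically expect.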
You can then combine these estimates using some of the available R tools, such as `mi.meld` from `Amelia`, or `mice` as you mentioned, or the R package `mitools`. Then once you have the transformed parameters, you can compute the confidence interval based on the newly derived variance/parameter estimate.
This would be easier if these R packages supplied the transformed estimates instead of just the confidence intervals, but you can probably dig them out of the associated R code.
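If you would rather see what the pooling step does than rely on a package, Rubin's rules can be written out by hand. The following is a minimal Python sketch, not the code of any of the packages mentioned. The name `pool_rubin` and the input numbers are hypothetical, and the interval uses a normal quantile as a simplification where Rubin's rules would use a $t$ quantile with the returned degrees of freedom.

```python
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, variances, conf_level=0.95):
    """Pool m per-dataset estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # within-imputation variance
    b = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b
    # Rubin's degrees of freedom; infinite when the estimates do not vary.
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = float("inf")
    # Simplification (assumption): normal quantile instead of a t quantile with df.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * total ** 0.5
    return q_bar, total, df, (q_bar - half_width, q_bar + half_width)


# Hypothetical estimates and variances from m = 3 imputed datasets.
q_bar, total, df, ci = pool_rubin([0.40, 0.50, 0.60], [0.01, 0.01, 0.01])
```

The total variance is the average within-imputation variance plus the between-imputation variance inflated by $1 + 1/m$, which is how the variation across imputed datasets enters the final interval.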

-
[Lott & Reiter (2017)](https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2018.1473796?journalCode=utas20) wrote a paper on this subject recently. I finally ended up using the Agresti-Coull CI. – sluijs Aug 06 '19 at 10:36