Calculating confidence interval for cross-sectional study with dichotomous variables

Question

I am conducting an epidemiological study whose results will tell me what proportion of my population is immune to a certain disease. The results will be immune / not immune, with no gray area. A positive ELISA will indicate "immune".

I have the values for the specificity (99.81%) and sensitivity (86%) of the ELISA I am using, but I don't know how to get the 95% confidence interval for my "X% of the population is immune".

Thank you for your help!

To be clear about the connections: Does a positive ELISA indicate 'immune'? And does sensitivity 86% mean P(Pos ELISA | Immune) = 0.86, and specificity 99.81% mean P(Neg ELISA | Not Immune) = 0.9981? I deal with some of these issues in my [Q&A](https://stats.stackexchange.com/questions/455129/trying-to-estimate-disease-prevalence-from-fragmentary-test-results) on testing for Covid-19, but don't claim that link exactly answers your Question. — BruceET, May 26 '20 at 04:42
Thank you for pointing that out. Exactly, (1-specificity) will give me the chance of false positives and (1-sensitivity) the chance of false negatives. — P.Weyh, May 26 '20 at 06:51

BruceET · Accepted Answer · 2020-05-26T06:36:18.383

I'm assuming that the statements in my Comment are true, and that you have results from $n$ tests, of which $a$ are positive. Then you have an estimate of $\tau,$ the proportion testing positive: $\hat \tau = t = a/n.$

Confidence interval for proportion testing positive. Also assuming that the sample size $n$ is large enough for a Wald confidence interval to be valid, you would have the 95% CI for $\tau$ as follows:

$$t \pm 1.96\sqrt{\frac{t(1-t)}{n}}.$$
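
As a minimal sketch in R, the interval above can be computed as follows. The helper name `wald_ci` is hypothetical, and the counts (700 positives among 1000 tests) are assumed purely for illustration.

```r
# Hypothetical helper: Wald confidence interval for a binomial proportion.
wald_ci <- function(a, n, conf = 0.95) {
  t_hat <- a / n                        # observed proportion testing positive
  z <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for a 95% interval
  se <- sqrt(t_hat * (1 - t_hat) / n)   # estimated standard error of t_hat
  c(lower = t_hat - z * se, upper = t_hat + z * se)
}

# Illustrative counts (assumed): 700 positives out of 1000 tests
round(wald_ci(700, 1000), 3)
```

With these assumed counts the interval is roughly $(0.672, 0.728).$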

But the proportion testing positive is not the prevalence (or in your terminology the percentage immune).

Confidence interval for proportion immune. Letting $\pi = P(\mathsf{Immune})$ and $\eta =$ Sensitivity, $\theta =$ Specificity (both as in my Comment), one has from the Law of Total Probability that

$$\tau = \pi\eta + (1-\pi)(1-\theta).$$

Solving for $\pi$ one has

$$\pi = \frac{\tau + \theta - 1}{\eta + \theta - 1}.$$
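
The algebra behind this step, spelled out (it assumes $\eta + \theta \ne 1,$ so that the division is allowed):

$$\tau = \pi\eta + (1-\theta) - \pi(1-\theta) = \pi(\eta + \theta - 1) + (1 - \theta),$$

$$\tau + \theta - 1 = \pi(\eta + \theta - 1).$$

Dividing both sides by $\eta + \theta - 1$ gives the expression for $\pi.$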

So you can get a point estimate $p$ for $\pi$ from the point estimate $t$ for $\tau$ by substitution:

$$p = \frac{t + \theta - 1}{\eta + \theta - 1}.$$
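
A short R sketch of this substitution, using the sensitivity and specificity from the Question. The function name `adjust_prevalence` is hypothetical and $t = 0.7$ is an assumed value for illustration.

```r
# Hypothetical helper: convert the proportion testing positive (t_hat) into an
# estimated proportion immune, given sensitivity (eta) and specificity (theta).
adjust_prevalence <- function(t_hat, sens, spec) {
  # The formula is undefined when sens + spec = 1; as a simplifying
  # assumption, this sketch also rejects sums below 1.
  stopifnot(sens + spec > 1)
  (t_hat + spec - 1) / (sens + spec - 1)
}

# Sensitivity and specificity from the Question; t = 0.7 is assumed
adjust_prevalence(0.7, sens = 0.86, spec = 0.9981)
```

For these inputs the estimate is about $0.814,$ noticeably higher than $t = 0.7$ because the test misses some truly immune subjects.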

However, the estimate $p$ does not arise from binomial sampling, so one cannot use $p$ to make a Wald interval directly. Instead, treating $\eta$ and $\theta$ as known, $\pi$ is an increasing linear function of $\tau$ (provided $\eta + \theta > 1$), so a 95% confidence interval for $\pi$ results from using the displayed equation just above to transform the endpoints of the Wald interval for $\tau.$

For example, suppose $t = a/n = 700/1000 = 0.7$ and the Wald interval is $(0.672, 0.728).$ Then the corresponding 95% CI for $\pi$ is $(0.781, 0.846).$

```r
# Transform the Wald endpoints for tau into endpoints for pi
(c(0.672, 0.728) + .9981 - 1)/(.86 + .9981 - 1)
# [1] 0.7809113 0.8461718
```

Notes: (1) In cases where $t$ is near $0$ or $1$ and sensitivity and/or specificity are poor, the 95% CI for $\pi$ found by this method may not be contained in $(0,1).$ Then the Gibbs Sampler in the link of my Comment can provide a way to get a reasonable Bayesian posterior probability interval ('credible interval'). If a beta prior distribution [which has support $(0,1)$] is used for the parameter $\pi,$ then the posterior distribution must also have support $(0,1)$ and the Bayesian interval estimate must be contained in the unit interval.

(2) If your $n$ is in the low hundreds or below, then use the Agresti-Coull confidence interval for $\tau$ instead of the Wald interval.

(3) Reference: Suess & Trumbo (2010) Introduction to probability simulation and Gibbs sampling with R, Ch. 5.
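
For Note (2), here is a minimal R sketch of the Agresti-Coull interval for $\tau,$ followed by the same endpoint transformation as in the main text. The helper name `agresti_coull_ci` is hypothetical, and the counts (13 positives among 130 participants) are assumed only to illustrate a sample in the low hundreds.

```r
# Hypothetical helper: Agresti-Coull interval for a binomial proportion.
agresti_coull_ci <- function(a, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  n_adj <- n + z^2                     # adjusted sample size
  t_adj <- (a + z^2 / 2) / n_adj       # adjusted proportion
  half <- z * sqrt(t_adj * (1 - t_adj) / n_adj)
  c(lower = t_adj - half, upper = t_adj + half)
}

# Assumed counts for illustration only: 13 positives among 130 participants
ci_tau <- agresti_coull_ci(13, 130)

# Transform the endpoints using sensitivity 0.86 and specificity 0.9981
ci_pi <- (ci_tau + 0.9981 - 1) / (0.86 + 0.9981 - 1)
ci_pi
```

If a transformed endpoint falls outside $(0,1),$ that is the situation described in Note (1).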

Thank you so much for the explanation and also for the helpful link. One thing I am still a bit confused about though: in my test sample, I will test for Anti-SARS-CoV-2 antibodies as you correctly guessed. But if I test my study participants for antibodies, testing "positive" would indicate immunity. Why do I need to discriminate between proportion positive and proportion immune here? My sample size is probably going to be 120-140 participants. Does that justify Wald CI? And can I also use these formulas to calculate the CI for a PCR test for SARS-CoV-2 antigen? Thanks a lot! — P.Weyh, May 26 '20 at 06:47
$\pi$ is not the same as $\tau.$ The proportion $\tau$ includes true and false positives. (You seem to have very few false positives, but you are not detecting quite all the true positives.) If $\eta = \theta = 1,$ then you have a 'gold standard test' and $\pi = \tau,$ exactly. Your specificity is really, really good, but your sensitivity is just good. Maybe (just speculating) because not everybody who has been infected has detectable antibodies right away. // With your sample sizes I'd use Agresti-Coull or Jeffreys intervals: See Wikipedia article on binomial CIs. — BruceET, May 26 '20 at 07:00
Thanks for the clarification, this was really very helpful for me. — P.Weyh, May 26 '20 at 07:13
