Optimization of pool size and number of tests for prevalence estimation via group testing

Question

I'm trying to devise a protocol for pooling lab tests from a cohort in order to get prevalence estimates using as few reagents as possible.

Assuming perfect sensitivity and specificity (including them in the answer would be a plus), if I group the test material into pools of size $s$, and given an underlying (I don't like the term "real") mean probability $p$ of the disease, the probability of a pool being positive is:

$$p_w = 1 - (1 - p)^s$$

if I run $w$ such pools the probability of having $k$ positive wells given a certain prevalence is:

$$p(k | w, p) = \binom{w}{k} (1 - (1 - p)^s)^k(1 - p)^{s(w-k)}$$

that is $k \sim Binom(w, 1 - (1 - p)^s)$.
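A minimal sketch of the two formulas above, using only the Python standard library. The function names and the numeric values ($p = 0.05$, $s = 10$, $w = 30$) are illustrative assumptions, not values from a real study.

```python
from math import comb

def pool_positive_prob(p: float, s: int) -> float:
    """Probability that a pool of s samples contains at least one positive."""
    return 1.0 - (1.0 - p) ** s

def positive_pools_pmf(k: int, w: int, p: float, s: int) -> float:
    """P(k positive pools out of w), i.e. Binomial(w, 1 - (1 - p)^s)."""
    p_w = pool_positive_prob(p, s)
    return comb(w, k) * p_w ** k * (1.0 - p_w) ** (w - k)

# Assumed illustrative design: 5% prevalence, pools of 10, 30 pools.
p, s, w = 0.05, 10, 30
probs = [positive_pools_pmf(k, w, p, s) for k in range(w + 1)]

print("P(pool positive):", pool_positive_prob(p, s))
print("pmf sums to:", sum(probs))
print("most likely number of positive pools:", max(range(w + 1), key=lambda k: probs[k]))
```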

To get $p$ I just need to maximize the likelihood $p(k | w, p)$ or use the formula $1 - \sqrt[s]{1 - k/w}$ (not really sure about this second one...).
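Since $p_w = 1 - (1 - p)^s$ is a monotone function of $p$, the value of $p$ that maximizes the binomial likelihood is the back-transformed observed fraction $k/w$, which is exactly the second formula. The sketch below checks this numerically on a grid; the counts ($k = 12$, $w = 30$, $s = 10$) and function names are assumptions chosen for illustration.

```python
from math import expm1, log, log1p

def mle_prevalence(k: int, w: int, s: int) -> float:
    """Closed-form estimate 1 - (1 - k/w)^(1/s)."""
    return 1.0 - (1.0 - k / w) ** (1.0 / s)

def log_likelihood(p: float, k: int, w: int, s: int) -> float:
    """Binomial log-likelihood in p, dropping the constant binomial coefficient."""
    log_neg = s * log1p(-p)      # log P(pool negative) = s * log(1 - p)
    p_w = -expm1(log_neg)        # P(pool positive) = 1 - (1 - p)^s
    return k * log(p_w) + (w - k) * log_neg

# Assumed illustrative data: 12 positive pools out of 30, pool size 10.
k, w, s = 12, 30, 10
grid = [i / 10000 for i in range(1, 10000)]   # p from 0.0001 to 0.9999
p_grid = max(grid, key=lambda p: log_likelihood(p, k, w, s))
p_closed = mle_prevalence(k, w, s)

print("grid maximizer:", p_grid)
print("closed form:   ", p_closed)
```

Note that when every pool is positive ($k = w$) the formula returns 1, which signals that the pools were too large for the prevalence at hand.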

My question is: how do I optimize $s$ (maximize) and $w$ (minimize), given a prior $p$, in order to have the most precise estimates, with the error below a certain level?

For a start: https://medicalsciences.stackexchange.com/questions/21558/can-the-capacity-for-covid-19-tests-be-amplified-by-testing-multiple-samples-mix/21562#21562 Do you have data on sens & spec of the tests? I've so far only concluded limits from the FDA's EUA requirements and the EUA instructions. — cbeleites unhappy with SX, Apr 09 '20 at 20:37
Why do you need wheels (or would that be wells?)? In the foreseeable future, wouldn't you wait until the next wheel (batch/lot) is full? And I'd assume that once sample numbers are so low again that this means too long waiting times, $p$ may be so different from the situation now that you'd anyways want to re-calculate pool size. — cbeleites unhappy with SX, Apr 09 '20 at 20:40
I saw your answer to the other question and is very interesting thanks. How did you compute the two plot you presented, about the pool size and number of tests saved by prevalence? I need exactly that, or even better a way to estimate them based on acceptable error rate. I didn't understand the second comment. In what sense I need to wait until the well is full? the idea is to run periodic prevalence studies and save reagent when possible. Yep the pool size would need to be recomputed according to results. — Bakaburg, Apr 10 '20 at 08:46

score 0 · Accepted Answer · answered Apr 22 '20 at 16:07

I may have found a solution:

I can estimate the uncertainty around $p$ in two ways, given $w$ and $s$.

First I get the expected results of a pooled test through:

$$E[p_w] = 1 - (1 - p)^s$$

Then, through maximum likelihood and logit transformation, I get the Confidence Intervals:

$$CI_{p_{\alpha/2}} = 1 - \sqrt[s]{1 - logit^{-1}\left(logit(E[p_w]) \pm Z_{\alpha/2} \frac{1}{\sqrt{w E[p_w] (1-E[p_w])}}\right)}$$
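The interval above can be sketched as follows: build a Wald interval for $p_w$ on the logit scale, then map both ends back to the prevalence scale. The function name `wald_logit_ci` and the example values are assumptions for illustration only.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def logit(x: float) -> float:
    return log(x / (1.0 - x))

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def wald_logit_ci(p: float, s: int, w: int, alpha: float = 0.05):
    """Interval for p expected from w pools of size s, given an assumed p."""
    p_w = 1.0 - (1.0 - p) ** s                   # E[p_w]
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{alpha/2}
    se = 1.0 / sqrt(w * p_w * (1.0 - p_w))       # standard error on the logit scale
    lo_w = inv_logit(logit(p_w) - z * se)
    hi_w = inv_logit(logit(p_w) + z * se)
    # Back-transform from pool level to individual level: 1 - (1 - x)^(1/s)
    return (1.0 - (1.0 - lo_w) ** (1.0 / s),
            1.0 - (1.0 - hi_w) ** (1.0 / s))

# Assumed illustrative design: 5% prevalence, pools of 10, 30 pools.
print(wald_logit_ci(0.05, 10, 30))
```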

Alternatively, I can exploit the Beta distribution as the conjugate prior of the binomial to get posterior credible intervals for $p$ at the given quantiles $q$:

$$CrI_{p_{\alpha/2}} = 1 - \sqrt[s]{1 - Beta(q, 1 + w E[p_w], 1 + w (1 - E[p_w]))}$$

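This second solution even allows the specification of priors: the two leading 1s above correspond to a uniform $Beta(1, 1)$ prior on $p_w$ and can be replaced by other prior parameters.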
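A rough sketch of the Beta-based interval using only the standard library. Because Python's standard library has no Beta quantile function, this version approximates the quantiles by sampling with `random.betavariate`, which is a substitution for the exact quantile in the formula; with SciPy installed, `scipy.stats.beta.ppf` would give the quantiles directly. The function name, prior parameters, and example values are assumptions.

```python
import random

def beta_credible_interval(p, s, w, alpha=0.05, prior_a=1.0, prior_b=1.0,
                           draws=20000, seed=0):
    """Monte Carlo approximation of the credible interval for p."""
    rng = random.Random(seed)
    p_w = 1.0 - (1.0 - p) ** s             # E[p_w]
    a = prior_a + w * p_w                  # prior + expected positive pools
    b = prior_b + w * (1.0 - p_w)          # prior + expected negative pools
    # The back-transform is monotone, so quantiles of the transformed
    # draws equal the transformed quantiles of the Beta distribution.
    samples = sorted(1.0 - (1.0 - rng.betavariate(a, b)) ** (1.0 / s)
                     for _ in range(draws))
    lo = samples[int((alpha / 2.0) * draws)]
    hi = samples[int((1.0 - alpha / 2.0) * draws) - 1]
    return lo, hi

# Assumed illustrative design: 5% prevalence, pools of 10, 30 pools.
print(beta_credible_interval(0.05, 10, 30))
```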

I was afraid that these solutions would underestimate variability, since they evaluate the variance at the test level (on $p_w$), not at the level of the underlying prevalence $p$. But comparing the results with a full MCMC hierarchical estimation of the posterior of $p$, using the model:

$$p \sim Beta(\alpha,\beta)$$ $$p_w = 1 - Binom(0 \mid s, p) = 1 - (1 - p)^s$$ $$k \mid w, p_w \sim Binom(w, p_w)$$

it can be shown that there is no relevant difference with the intervals of the other two methods (which are of course faster to compute).
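For a one-parameter model like this, a grid approximation of the posterior of $p$ is a cheap stand-in for MCMC when checking the faster intervals. The sketch below is that substitute, not the MCMC run described above; the function name, grid size, prior parameters, and counts are all assumptions.

```python
from math import exp, expm1, log, log1p

def grid_posterior_interval(k, w, s, alpha=0.05, prior_a=1.0, prior_b=1.0, n=20000):
    """Equal-tailed interval for p from a grid approximation of its posterior."""
    grid = [(i + 0.5) / n for i in range(n)]       # midpoints in (0, 1)
    log_post = []
    for p in grid:
        log_neg = s * log1p(-p)                    # log (1 - p)^s
        p_w = -expm1(log_neg)                      # 1 - (1 - p)^s
        lp = (prior_a - 1.0) * log(p) + (prior_b - 1.0) * log1p(-p)  # Beta prior on p
        lp += k * log(p_w) + (w - k) * log_neg     # binomial likelihood of k
        log_post.append(lp)
    top = max(log_post)
    weights = [exp(lp - top) for lp in log_post]   # unnormalized posterior
    total = sum(weights)
    lo = hi = None
    acc = 0.0
    for p, wt in zip(grid, weights):
        acc += wt / total
        if lo is None and acc >= alpha / 2.0:
            lo = p
        if acc >= 1.0 - alpha / 2.0:
            hi = p
            break
    return lo, hi

# Assumed illustrative data: 12 positive pools out of 30, pool size 10.
print(grid_posterior_interval(12, 30, 10))
```

Note that a flat prior on $p$ is not the same as a flat prior on $p_w$, so small differences between this interval and the Beta-based one are expected.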

Finally, I numerically search for the largest $s$ and the smallest $w$ that keep the uncertainty (the interval width) below a specified threshold. I'm postulating that as the uncertainty goes down, so will the estimation bias due to the loss of information in the pooling. I still haven't found an analytical way to get this error directly.
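One possible reading of this search, sketched with the logit-based interval: for each candidate pool size, find the smallest number of pools whose interval width meets the threshold, then keep the pool size that needs the fewest pools (ties go to the larger pool). The search bounds `s_max` and `w_max`, the width threshold, and all function names are assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def ci_width(p, s, w, z):
    """Width of the logit-based interval for p with w pools of size s."""
    p_w = 1.0 - (1.0 - p) ** s
    if not 0.0 < p_w < 1.0:
        return float("inf")                # degenerate design, never acceptable
    se = 1.0 / sqrt(w * p_w * (1.0 - p_w))
    centre = log(p_w / (1.0 - p_w))
    # Map a logit-scale value back to the prevalence scale.
    back = lambda x: 1.0 - (1.0 - 1.0 / (1.0 + exp(-x))) ** (1.0 / s)
    return back(centre + z * se) - back(centre - z * se)

def min_pools(p, s, max_width, z, w_max=2000):
    """Smallest w whose interval width is within max_width, or None."""
    for w in range(1, w_max + 1):          # width shrinks as w grows
        if ci_width(p, s, w, z) <= max_width:
            return w
    return None

def search_design(p, max_width, alpha=0.05, s_max=50, w_max=2000):
    """Return (s, w) needing the fewest pools; ties go to the larger s."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    best = None
    for s in range(1, s_max + 1):
        w = min_pools(p, s, max_width, z, w_max)
        if w is not None and (best is None or w <= best[1]):
            best = (s, w)
    return best

# Assumed inputs: prior guess p = 5%, target interval width of 0.04.
print(search_design(0.05, 0.04))
```

Minimizing $w$ minimizes reagent use, but the total number of individuals sampled is $s \cdot w$, so a cap on cohort size may need to be added as a further constraint.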
