
I am struggling to understand the concept of pooled sampling and how prevalence is estimated from pooled results. I would really appreciate some help with this.

Example: 0.5% of the population has a specific disease. Blood samples are taken on the same day from 3000 individuals and combined into 100 pools of 30 samples each. Assume the test has perfect sensitivity and specificity.

I assume a binomial distribution, as all 3000 samples are independent, the probability p=0.005 is constant, and there are only two possible outcomes for each pool (it contains at least one sample with disease or it contains none).

When these pools were tested, 21/100 pooled samples contained at least one sample with disease.

Questions:

1) Let Y be the number of pools with disease (observed Y=21). Explain why E(Y) is equivalent to 30 * prevalence.

Each individual sample Ij has only two possible outcomes, disease (1) or no disease (0), so E(Ij)=0*(1-p)+1*p=p and Var(Ij)=(0-p)^2*(1-p)+(1-p)^2*p=p(1-p).

So, for one pool, with X the number of diseased samples among its n=30 samples, the expectation is E(X)=E(I1)+E(I2)+...+E(In)=np, and the variance is Var(X)=Var(I1)+Var(I2)+...+Var(In)=np(1-p).

But then how is E(Y)=30 * p??
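A quick numerical check of the quantities above may help separate the two levels (samples within a pool, and pools). This is only a sketch under the stated assumptions (independent samples, perfect test); the variable names are made up for illustration.

```python
p = 0.005   # prevalence from the example
k = 30      # samples per pool
m = 100     # number of pools

# X = number of diseased samples in one pool, X ~ Binomial(k, p)
mean_x = k * p
var_x = k * p * (1 - p)

# A pool tests positive when it holds at least one diseased sample
prob_pool_positive = 1 - (1 - p) ** k

# Y = number of positive pools, Y ~ Binomial(m, prob_pool_positive)
mean_y = m * prob_pool_positive

print(f"E(X) = {mean_x:.4f}, Var(X) = {var_x:.5f}")
print(f"P(pool positive) = {prob_pool_positive:.4f}")
print(f"E(Y) = {mean_y:.2f}")
```

Under these assumptions 30 * p is the expected number of diseased samples in a single pool. For small p it is also close to the probability that one pool tests positive, since 1-(1-p)^30 is approximately 30 * p, whereas E(Y) itself is 100 times that probability.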

2) How do I estimate the prevalence and its 95% CI, given that 21/100 pools contained disease? I have found a formula which I think will give me the answer I need:

p=1-(1-x/m)^(1/k)
Var(p)=(x/m)*(1-x/m)^(2/k-1)/(m*k^2)

With m=100 pools, x=21 positive pools and pool size k=30, the estimated prevalence p and its variance are:

p=1-(1-(21/100))^(1/30)
Var(p)=(21/100)*(1-21/100)^(2/30-1)/(100*30^2)
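To see what these expressions give numerically, here is a minimal sketch. The function name `pooled_prevalence` is hypothetical, and the interval is a simple normal-approximation (Wald) interval, which is an assumption and not necessarily the best choice for pooled data. Note that the variance here uses m * k^2 in the denominator, which is what a delta-method expansion of the estimator around x/m gives.

```python
from math import sqrt


def pooled_prevalence(x, m, k, z=1.96):
    """Estimate prevalence from pooled tests (hypothetical helper).

    x: number of positive pools, m: number of pools, k: pool size.
    Returns (estimate, standard error, (lower, upper)) using a Wald interval.
    """
    if not 0 <= x < m:
        raise ValueError("need 0 <= x < m (estimator breaks down if all pools are positive)")
    frac = x / m                                   # observed fraction of positive pools
    p_hat = 1 - (1 - frac) ** (1 / k)              # invert 1 - (1 - p)^k
    var = frac * (1 - frac) ** (2 / k - 1) / (m * k ** 2)  # delta-method variance
    se = sqrt(var)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


p_hat, se, (lower, upper) = pooled_prevalence(x=21, m=100, k=30)
print(f"estimate = {p_hat:.5f}, SE = {se:.5f}")
print(f"approx. 95% CI = ({lower:.5f}, {upper:.5f})")
```

With the numbers from the example this gives an estimate of roughly 0.8%, with an approximate interval that includes the assumed true prevalence of 0.5%.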

But I don't really understand the formula for the prevalence estimate. I know that the probability of a pool of n samples containing no disease is (1-p)^n, and so the probability of at least one sample with disease in a pool is 1-(1-p)^n. The formula above seems like it is perhaps somehow based on that?

Can someone please explain p=1-(1-x/m)^(1/k)? (And where does 1/k come from?)
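One way to read the formula, as a sketch: set the observed fraction of positive pools, x/m, equal to the probability that a pool of size k is positive, then solve for p. The hat on p is notation introduced here to mark the estimate.

```latex
\begin{align*}
P(\text{pool positive}) &= 1 - (1-p)^k \\
\frac{x}{m} &= 1 - (1-\hat{p})^k \\
(1-\hat{p})^k &= 1 - \frac{x}{m} \\
\hat{p} &= 1 - \left(1 - \frac{x}{m}\right)^{1/k}
\end{align*}
```

On this reading the exponent 1/k is just the k-th root needed to undo the power k in (1-p)^k.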

TScott