We have two methods, A and B, for detecting a disease. Out of 1000 samples with unknown prevalence, A detects 100 (10%). On the remaining 900 negatives we apply the additional method B and detect a further 20. As far as we can tell, all of these are true positives.
We don't really know the specificity or sensitivity of either method, because estimating them would require knowing the prevalence, and we don't know that.
However, we would like some idea of how much we can trust the observed increase of 20 in future applications. Is it real? Does method B significantly improve detection? Would it be a good idea to add B to the process? If we had detected 100 samples with the first method and only 2 with the second, we would not trust it as much, would we?
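To make that intuition concrete, here is a minimal sketch (my own illustration in Python with scipy, not part of the original study) that compares an exact binomial confidence interval for the observed 20/900 with one for the hypothetical 2/900 case:

```python
# Rough sketch: how uncertain is the "extra detections by B among
# A-negatives" proportion for 20/900 versus a hypothetical 2/900?
from scipy.stats import binomtest

n_neg = 900                          # samples left negative by method A
for extra in (20, 2):
    res = binomtest(extra, n_neg)    # exact binomial model for extra/n_neg
    ci = res.proportion_ci(confidence_level=0.95, method="exact")
    print(f"{extra}/{n_neg} detected by B: "
          f"proportion = {extra / n_neg:.4f}, "
          f"95% CI = ({ci.low:.4f}, {ci.high:.4f})")
```

The only point of the sketch is that the interval for 2/900 reaches almost down to zero, while the interval for 20/900 stays clearly away from it, which matches the intuition above.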
We could produce a contingency table
|          | method A | method A+B |
|----------|----------|------------|
| positive | 100      | 120        |
| negative | 900      | 880        |
and test it with Chi^2^ or something similar, but I think this would be incorrect: the data in the second column include the data from the first column. Also, we are not really interested in comparing the methods; after all, method B is a "second line" test and is never applied to samples that A already detects.
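For reference, this is the test I mean (a sketch in Python with scipy, using the table above); the comments mark the overlap that makes me doubt it:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Columns are "A" and "A+B"; the A+B column re-counts the 100 positives
# already in the A column, so the two columns are not independent samples.
table = np.array([[100, 120],   # positives under A, under A+B
                  [900, 880]])  # negatives under A, under A+B
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # computes, but the independence
                                          # assumption behind it is violated
```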
I am at a loss as to how to tackle this question. The problem is real, the disease is deadly, and the numbers are close to the real ones. I had no influence on the study design; we have what we have, but it is important enough to try to find an answer to the question above.