
We have two methods, A and B, for detecting a disease. Out of 1000 samples with unknown prevalence, A detects 100 samples (10%). On the remaining 900 negatives, we apply the additional method B and detect a further 20. As far as we can tell, all of these are true positives.

We don't really know the specificity or sensitivity of either method, because estimating them depends on the prevalence, and we don't know that.

However, we would like to have an idea of how much we can trust the observed increase of 20 in future applications. Is it real? Does method B significantly improve the detection? Would it be a good idea to add B to the process? If we had detected 100 samples with the first method and only 2 with the second, we would not trust it so much, would we?

We could produce a contingency table

           method A    method A+B
 positive      100         120
 negative      900         880

and test it with a χ² test or the like, but I think that this would be incorrect: the data in the second column include the data from the first column. Also, we are not interested in comparing the methods; after all, method B is a "second line" and is not directly tested on samples that can be detected with A.

I am at a loss as to how to tackle this question. The problem is real, the disease is deadly, and the numbers are close to the real ones. I had no influence on the study design; we have what we have, but it is important enough to try to find an answer to the above question.

January
  • Without further substantive knowledge about the disease and how procedures A and B work, it's hard to see how this question could be answered definitively: perhaps A detects some forms of the disease and B detects other forms, so possibly B alone would detect only 20 cases, period. I would therefore hesitate to formulate a purely statistical answer. – whuber Jun 07 '18 at 13:32
  • If you posit that these are indeed all true positives, then there is no uncertainty. Method B detects additional cases of a disease you say is deadly, period. The same logic applies to the case if we detected only 2 more cases. There is no question of "how much we can trust the observed increase". The only tradeoff would be the cost of method B. So it seems like we do need some assumptions about the true sensitivity/specificity of method B, or the prevalence. – Stephan Kolassa Jun 07 '18 at 13:32
  • @Stephan You seem to be reading the question as saying B would have detected the 100 cases A detected, but I don't see any information in the question that implies this. – whuber Jun 07 '18 at 13:33
  • @whuber: no, I'm looking at the OP's comparison of method A alone against A+B. He does not seem to be interested in A against B. And A+B detects 20 more cases than A alone, with (as posited) zero uncertainty. – Stephan Kolassa Jun 07 '18 at 13:35
  • @whuber: we don't know this, in the sense that both methods detect patients with the same subsequent pathology. – January Jun 07 '18 at 13:41
  • How about simply calculating confidence interval for the proportion of individuals detected by A+B; that way, we can at least give an idea about the certainty of our estimate of how many additional cases we can detect? – January Jun 07 '18 at 13:48
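
To make the confidence-interval suggestion in the last comment concrete, here is a minimal sketch in Python, assuming statsmodels is available. The Wilson interval is one reasonable choice among several (Clopper-Pearson etc. would also do); nothing in the question dictates it.

    # Wilson 95% CIs for the observed detection proportions
    from statsmodels.stats.proportion import proportion_confint

    # proportion detected by A+B out of all 1000 samples
    lo, hi = proportion_confint(count=120, nobs=1000, alpha=0.05, method="wilson")
    print(f"A+B detection rate: {120/1000:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")

    # additional yield of B among the 900 samples negative under A
    lo, hi = proportion_confint(count=20, nobs=900, alpha=0.05, method="wilson")
    print(f"B's added yield:    {20/900:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")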

2 Answers


So it seems that, with the current structure, we don't have any information about the negatives. If I understand the problem correctly, what we have is the following:

  • Method A had 100 true positives, 0 false positives, at least 20 false negatives, and no information about the true negatives in 1000 cases tested.
  • Method B had 20 true positives, 0 false positives and no information about the negatives in 900 cases tested.
  • Method A + B had 120 true positives, 0 false positives and no information about the negatives in 1000 cases tested.

That being said, from these statistics you can calculate some relevant diagnostic test metrics; however, without any other information (e.g. prevalence, the additional "cost" of applying method B, classification of the negatives, etc.), this will not add much value.
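
For what it is worth, here is a short sketch (plain Python) of the only quantities these counts pin down without extra assumptions; the bounds follow directly from the numbers in the question:

    # what the observed counts bound, assuming (as posited) that all
    # detections are true positives
    tp_a = 100   # true positives found by method A
    tp_b = 20    # additional true positives found by B among A's negatives
    n = 1000     # total samples screened

    known_positives = tp_a + tp_b           # 120 confirmed cases
    # A missed at least tp_b cases, so its sensitivity is at most:
    sens_a_upper = tp_a / known_positives   # 0.833 (lower if cases were missed by both)
    # prevalence is at least the confirmed fraction:
    prevalence_lower = known_positives / n  # 0.120

    print(f"upper bound on A's sensitivity: {sens_a_upper:.3f}")
    print(f"lower bound on prevalence:      {prevalence_lower:.3f}")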

Finally, as Stephan said, if the disease is deadly then the cost of a false negative is "extremely high", and without any additional information we can't really infer much; I don't see any justification for not including method B.

PS: Can we at least test method B alone on the 1000 cases? Maybe method A is not needed at all.

Vasilis Vasileiou
  • I like your response. Unfortunately, the data is what it is; it has been collected, and there is no way of going back. We decided to simply put confidence intervals on the additional proportion of detected cases as a very rough measure of the observed effect. If the paper appears, I will post it here. I am fairly sure you are in for a small surprise as to what method B is. – January Jun 08 '18 at 21:40
  • P.S. Method A is cheaper and faster and an established procedure, so it is going to stay. – January Jun 08 '18 at 21:41

We're having to take a lot on good faith here, but accepting your restrictions, I would suggest that this is a valid scenario for deploying McNemar's test.

Whether this gives the information you want is another matter, but if I'm interpreting your post correctly, it would. It does not test accuracy, so it will not allow you to claim accuracy; it only tells you whether the difference caused by B is significant. The test is designed to assess whether there is a significant imbalance in how two tests disagree, i.e. whether there is significant movement of cases between two conditions. It ignores the main diagonal and looks only at the off-diagonal elements, where the conditions disagree.

First, you need the contingency table updated:

+-------+-----+------+
|       | A:P | A:N  |
+-------+-----+------+
| A+B:P | 100 |   20 |
| A+B:N |   0 |  880 |   
+-------+-----+------+

In this case you are comparing agreement/disagreement between two related protocols rather than between independent tests. My understanding of the logic of A+B is: if A is positive, report positive; else if B is positive, report positive; else report negative. I believe this is still valid for McNemar's test, as it is intended for comparing related conditions.
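
For illustration, a minimal sketch of the test on the table above, assuming Python with statsmodels:

    import numpy as np
    from statsmodels.stats.contingency_tables import mcnemar

    #                  A:P   A:N
    table = np.array([[100,  20],   # A+B : P
                      [  0, 880]])  # A+B : N

    # exact=True runs a two-sided binomial test on the 20 discordant pairs,
    # appropriate here because one off-diagonal cell is 0
    result = mcnemar(table, exact=True)
    print(f"p-value: {result.pvalue:.2e}")   # 2 * 0.5**20 ≈ 1.9e-06

With all 20 discordant pairs on one side, the test rejects symmetry decisively; with only 2 discordant pairs (the sceptical scenario in the question) the exact p-value would be 0.5, matching the intuition there.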

Rather than go into the details here, I'll link to various resources:

  • https://en.m.wikipedia.org/wiki/McNemar%27s_test
  • https://statistics.laerd.com/spss-tutorials/mcnemars-test-using-spss-statistics.php
  • What is the difference between McNemar's test and the chi-squared test, and how do you know when to use each?
  • www.researchgate.net/post/How_to_use_McNemars_test_to_compare_accuracy_of_classifications

ReneBt