My question is about any possible relationship between the significance of a comparison test (Mann-Whitney U) and the ROC curve.
If the comparison test is strongly significant, should we expect a better ROC curve, and consequently better NPV, PPV etc.?
In detail: I'm comparing blood results between 2 groups for a predictive model. My pilot data, from 50 samples, showed a statistically significant difference (Mann-Whitney U test), and the ROC curve gave good specificity, sensitivity, PPV and NPV.
Afterwards, I ran the same tests with 200 samples. The Mann-Whitney U test showed far stronger statistical significance. However, on the ROC curve only the specificity was still useful, and even that was not as good as in the pilot results. The sensitivity, PPV and NPV were all dramatically worse!
With a larger sample and stronger statistical significance, I was expecting a better ROC curve.
Is there an explanation for that?
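For concreteness, here is a minimal sketch of the one link between the two that I am aware of: the Mann-Whitney U statistic divided by the product of the two group sizes equals the area under the ROC curve (AUC), while the p-value also depends on the sample size. The data are synthetic (evenly spaced values with a made-up shift of 0.305, and 25 and 100 per group as an assumed even split of my 50 and 200 samples), not my blood results. The function names are illustrative, and the p-value uses the normal approximation without a tie correction.

```python
import math


def mann_whitney_u(cases, controls):
    """Count (case, control) pairs where the case value is higher; ties count 0.5."""
    u = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def auc_and_p(cases, controls):
    """Return (AUC, two-sided p-value) derived from the same U statistic."""
    n1, n2 = len(cases), len(controls)
    u = mann_whitney_u(cases, controls)
    auc = u / (n1 * n2)  # AUC is U rescaled to the 0..1 range
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return auc, p


def synthetic_groups(n, shift=0.305):
    """Hypothetical data: n evenly spaced controls, cases shifted upward by `shift`."""
    controls = [(i + 0.5) / n for i in range(n)]
    cases = [shift + c for c in controls]
    return cases, controls


for n in (25, 100):  # per-group sizes, i.e. 50 and 200 samples in total
    auc, p = auc_and_p(*synthetic_groups(n))
    print(f"n per group = {n:3d}  AUC = {auc:.3f}  p = {p:.2e}")
```

On this synthetic data the AUC stays near 0.76 at both sample sizes while the p-value drops by several orders of magnitude, so a smaller p-value on its own does not seem to imply a better curve. It does not explain why the sensitivity, PPV and NPV fell in my real data.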