Suppose I have observational data and want to predict some disease based on patient visits. For each visit I know whether the disease occurred. The disease is preventable (which is why I want to predict it: so that it can be prevented). I want to use past visits to predict risk during future visits. There is no data on what happens between visits, and the outcome for the visit whose risk we hope to predict is simply a binary variable indicating whether the disease was present. The disease can occur at any time during the visit.
However, consider a patient who leaves a visit at high risk for the disease but, between visits, decides to change their lifestyle and dramatically reduces that risk. At the next visit, the patient does not get the disease. However, all the model sees is the previous visit. Hence the model learns that the previous visit, which left the patient at high risk, is actually associated with a negative outcome!
Another scenario: say that the patient leaves a visit at high risk, but this time does nothing. At the next visit, the doctor sees the high risk and immediately gives a medication to prevent the disease. Now we again get a negative outcome for this very high-risk patient.
Ideally, for the model to learn that this patient had a high-risk visit, we would need to see them get the disease.
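To illustrate the two scenarios, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration: the function name `simulate_visits`, the uniform distribution of true risk, the 0.7 threshold for "high risk", the 80% chance that a high-risk patient gets some preventive action, and the 5% risk after that action. None of these numbers come from real data.

```python
import numpy as np

def simulate_visits(n=100_000, threshold=0.7, p_intervene=0.8,
                    treated_risk=0.05, seed=0):
    """Hypothetical simulation: preventive action hides true risk in the labels."""
    rng = np.random.default_rng(seed)
    # True (untreated) disease risk at the next visit, unknown to the analyst.
    true_risk = rng.uniform(0.0, 1.0, size=n)
    high = true_risk > threshold
    # Assumption: most high-risk patients change lifestyle or get medication.
    intervened = high & (rng.uniform(size=n) < p_intervene)
    effective_risk = np.where(intervened, treated_risk, true_risk)
    # Use the same random draw for both worlds so they are directly comparable.
    u = rng.uniform(size=n)
    natural = u < true_risk        # labels if nobody ever intervened
    observed = u < effective_risk  # labels that actually end up in the data
    return true_risk, high, natural, observed

true_risk, high, natural, observed = simulate_visits()
print("high-risk group, no intervention:", natural[high].mean())
print("high-risk group, as observed:    ", observed[high].mean())
print("lower-risk group, as observed:   ", observed[~high].mean())
```

Under these assumed parameters, the expected disease rate in the high-risk group is about 0.85 without intervention but only about 0.21 in the observed labels (0.2 × 0.85 + 0.8 × 0.05), which is below the roughly 0.35 expected in the lower-risk group. A model trained on the observed labels would therefore rank the truly high-risk visits as safer than the rest.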
Hence, there is a catch-22. If you can prevent a disease, you cannot develop a predictive model; if you cannot prevent it, why try to predict it?
This seems equivalent to being presented with a pristine dataset (where the responses really do reflect the risk) and then having some malevolent analyst secretly, and systematically, change an unknown number of the labels to the opposite outcome. The resulting model ultimately predicts something (and it might do so well), but it is not predicting risk.
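One way to write down what the model ends up predicting, using notation that is my own assumption and not something recorded in the data: let $X$ be the information from past visits, $Y$ the disease indicator at the next visit, and $A$ an unobserved indicator of whether any preventive action (lifestyle change or medication) took place. By the law of total probability:

$$
P(Y=1 \mid X) = P(A=0 \mid X)\,P(Y=1 \mid X, A=0) + P(A=1 \mid X)\,P(Y=1 \mid X, A=1)
$$

A model fit to the observed labels targets the left-hand side, whereas the risk I care about is closer to $P(Y=1 \mid X, A=0)$. The two coincide only when $P(A=1 \mid X) = 0$ or when the action has no effect on the outcome, and since $A$ is not recorded, the mixture cannot be separated from the observed data alone.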