I have come across a situation in my research, involving heavily overfit non-linear probabilistic classifiers, where the optimism bootstrap appears to underestimate the optimism even when using a proper scoring rule (log-likelihood/cross-entropy loss).
A simplified hypothetical case appears to demonstrate this. Consider a binary classification problem with an overfit classifier. In this example, for any element of the training data, the classifier predicts the correct class with probability 0.99. However, the classifier has fit purely to noise that has no true correlation with the output class, so when it is used to predict out-of-training-sample data, it predicts the correct class with probability 0.5. This could occur for an arbitrary non-linear classifier.
Using the notation from Frank Harrell's 1996 paper, the apparent per-sample cross-entropy loss ($D_{app}$) of the classifier is $-\ln(0.99) \approx 0.010$. However, for out-of-sample, unseen data the per-sample cross-entropy loss $D_{true}$ is $-\ln(0.5) \approx 0.693$. As such, $$D_{boot} = D_{app} \approx 0.010,$$ and $$D_{orig} \approx 0.632 \, D_{app} + 0.368 \, D_{true} \approx 0.26.$$
(Edit: note that I have used the weights 0.632 and 0.368 in the above equation to give the analytical expectation of $D_{orig}$; in practice this quantity would be measured directly by fitting the model to the bootstrap sample and using it to predict the original data. I am using the standard optimism bootstrap, not the .632 or .632+ bootstrap methods.)
This gives an optimism $O$ of $$O = D_{boot} - D_{orig} \approx -0.25,$$
which in turn gives an optimism-corrected performance $D_{est}$ of $$D_{est} = D_{app} - O \approx 0.26.$$ This is clearly much lower than the true value $D_{true} \approx 0.693$.
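For concreteness, here is a minimal Python sketch that reproduces these numbers by simulating the optimism bootstrap directly. The "classifier" is just the hypothetical 0.99/0.5 predictor described above (not a real fitted model), and the sample size and replicate count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_boot = 1000, 2000

# The hypothetical overfit classifier from the example: it predicts the
# correct class with probability 0.99 for any sample it was trained on,
# and 0.5 for any sample it has not seen.
P_SEEN, P_UNSEEN = 0.99, 0.5

def mean_loss(seen):
    """Per-sample cross-entropy loss, given which samples the model saw."""
    return -np.log(np.where(seen, P_SEEN, P_UNSEEN)).mean()

D_app = -np.log(P_SEEN)  # apparent loss of the original fit, ~0.010

optimism = np.empty(n_boot)
for b in range(n_boot):
    boot_idx = rng.integers(0, n, size=n)   # bootstrap resample (with replacement)
    seen = np.zeros(n, dtype=bool)
    seen[boot_idx] = True                   # samples the bootstrap model was fit to

    D_boot = mean_loss(np.ones(n, dtype=bool))  # bootstrap model on bootstrap sample
    D_orig = mean_loss(seen)                    # bootstrap model on original data
    optimism[b] = D_boot - D_orig

O = optimism.mean()         # ~ -0.25
D_est = D_app - O           # ~ 0.26
D_true = -np.log(P_UNSEEN)  # ~ 0.693
print(f"O = {O:.3f}, D_est = {D_est:.3f}, D_true = {D_true:.3f}")
```

Since each bootstrap sample contains on average a fraction $1-(1-1/n)^n \approx 0.632$ of the original observations, this should give approximately $O \approx -0.25$ and $D_{est} \approx 0.26$, matching the analytical expectation above.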
Obviously this is an extreme and overly simplified example, but it reproduces the behaviour I have observed in my actual results.
Have I misunderstood the application of the optimism bootstrap, or is there another explanation? (With specific reference to @FrankHarrell's answers here and here.)