I have come across a situation in my research, involving heavily overfit non-linear probabilistic classifiers, where the optimism bootstrap appears to underestimate the optimism even when using a proper scoring rule (log-likelihood/cross-entropy loss).
A simplified hypothetical case appears to demonstrate this. Consider a binary classification problem with an overfit classifier. In this example, for any element of the training data, the classifier predicts the correct class with probability 0.99. However, the classifier has fit purely to noise that has no true correlation with the output class, so when it is used to predict out-of-training-sample data, it predicts the correct class with probability 0.5. This could occur for an arbitrary non-linear classifier.
Using the notation from Frank Harrell's 1996 paper, the apparent per-sample cross-entropy loss ($D_{app}$) of the classifier is $-\ln(0.99) \approx 0.010$. However, for out-of-sample, unseen data the per-sample cross-entropy loss $D_{true}$ is $-\ln(0.5) \approx 0.693$. As such, $$D_{boot} = D_{app} \approx 0.010,$$ and $$D_{orig} \approx 0.632 \, D_{app} + 0.368 \, D_{true} \approx 0.26.$$
(Edit: note that I have used the weights 0.632 and 0.368 in the above equation to give the analytical expectation of $D_{orig}$; in practice this quantity would be measured directly by fitting the model to the bootstrap sample and using it to predict the original data. I am using the standard optimism bootstrap, not the .632 or .632+ bootstrap methods.)
This gives an optimism $O$ of $$O = D_{boot} - D_{orig} \approx -0.25,$$
which in turn gives an optimism-corrected performance $D_{est}$ of $$D_{est} = D_{app} - O \approx 0.26.$$ This is clearly much lower than the true value $D_{true} \approx 0.693$.
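For concreteness, here is a minimal Python sketch that reproduces these numbers by simulating the optimism bootstrap directly. The "classifier" is just the hypothetical 0.99/0.5 predictor described above (not a real fitted model), and the sample size and replicate count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_boot = 1000, 2000

# The hypothetical overfit classifier from the example: it predicts the
# correct class with probability 0.99 for any sample it was trained on,
# and 0.5 for any sample it has not seen.
P_SEEN, P_UNSEEN = 0.99, 0.5

def mean_loss(seen):
    """Per-sample cross-entropy loss, given which samples the model saw."""
    return -np.log(np.where(seen, P_SEEN, P_UNSEEN)).mean()

D_app = -np.log(P_SEEN)  # apparent loss of the original fit, ~0.010

optimism = np.empty(n_boot)
for b in range(n_boot):
    boot_idx = rng.integers(0, n, size=n)   # bootstrap resample (with replacement)
    seen = np.zeros(n, dtype=bool)
    seen[boot_idx] = True                   # samples the bootstrap model was fit to

    D_boot = mean_loss(np.ones(n, dtype=bool))  # bootstrap model on bootstrap sample
    D_orig = mean_loss(seen)                    # bootstrap model on original data
    optimism[b] = D_boot - D_orig

O = optimism.mean()         # ~ -0.25
D_est = D_app - O           # ~ 0.26
D_true = -np.log(P_UNSEEN)  # ~ 0.693
print(f"O = {O:.3f}, D_est = {D_est:.3f}, D_true = {D_true:.3f}")
```

Since each bootstrap sample contains on average a fraction $1-(1-1/n)^n \approx 0.632$ of the original observations, this should give approximately $O \approx -0.25$ and $D_{est} \approx 0.26$, matching the analytical expectation above.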
Obviously this is an extreme and overly simplified example, but it reproduces the behaviour I have observed in my actual results.
Have I misunderstood the application of the optimism bootstrap, or is there another explanation? (With specific reference to @FrankHarrell's answers here and here.)