In Section 7.11 of *The Elements of Statistical Learning*, an alternative expression for estimating $\gamma$, the no-information error rate, is given:
$$\hat{\gamma} = \frac{1}{N^2}\sum_{i=1}^N\sum_{i'=1}^NL(y_i,\hat{f}(x_{i'})),$$
where $L(y,\hat{f}(x))$ is an arbitrary cost function.
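As a minimal sketch of this estimator (assuming the predictions $\hat{f}(x_{i'})$ have been computed up front and `loss` is any callable taking a single $(y, \hat{f}(x))$ pair; the function and argument names are my own), the double sum is just a plain double loop:

```python
def gamma_hat(y, preds, loss):
    """Estimate gamma by averaging the loss over all (y_i, f_hat(x_i')) pairs."""
    n = len(y)
    total = 0.0
    for i in range(n):            # every ground-truth observation y_i ...
        for i_prime in range(n):  # ... scored against every prediction f_hat(x_i')
            total += loss(y[i], preds[i_prime])
    return total / n ** 2
```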
For example, consider multiclass classification with $C$ classes, where $y$ is a one-hot vector encoding the ground-truth class membership and $\hat{p} = \hat{f}(x)$ is the corresponding vector of predicted class probabilities. For cross-entropy loss
$$L(y,\hat{f}(x)) = -\sum_{c=1}^C y_c \ln(\hat{p}_c) $$
and similarly for a squared-error loss on the predicted probability of the true class
$$L(y,\hat{f}(x)) = \sum_{c=1}^C y_c(1-\hat{p}_c)^2$$.
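Both losses can be written down directly from these formulas; a sketch, with `y` a one-hot NumPy array and `p` the corresponding vector of predicted probabilities (again, the function names are my own):

```python
import numpy as np

def cross_entropy(y, p):
    # y: one-hot ground-truth vector, p: predicted class probabilities
    return -np.sum(y * np.log(p))

def squared_error(y, p):
    # squared error on the predicted probability of the true class, as defined above
    return np.sum(y * (1.0 - p) ** 2)
```

Either function can be passed as the `loss` argument of the `gamma_hat` sketch above.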
Substituting these losses into the estimator above gives
$$\hat{\gamma} = -\frac{1}{N^2}\sum_{i=1}^N\sum_{i'=1}^N\sum_{c=1}^C y_{ic}\ln(\hat{p}_{i'c})$$
and
$$\hat{\gamma} = \frac{1}{N^2}\sum_{i=1}^N\sum_{i'=1}^N\sum_{c=1}^C y_{ic}(1-\hat{p}_{i'c})^2$$
for cross-entropy and squared-error loss respectively.
That is, the loss is evaluated for the prediction at every feature vector $x_{i'}$ in the dataset against every ground-truth observation $y_i$ in the dataset, and then averaged over all $N^2$ pairs.
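In vectorized form (a sketch assuming `Y` is the $N \times C$ one-hot label matrix and `P` is the $N \times C$ matrix of predicted probabilities), the entry $(i, i')$ of `Y @ np.log(P).T` is exactly $\sum_c y_{ic}\ln(\hat{p}_{i'c})$, so the two double sums reduce to:

```python
import numpy as np

def gamma_hat_cross_entropy(Y, P):
    # (Y @ log(P).T)[i, i'] = sum_c y_ic * ln(p_i'c); average over all N^2 pairs
    n = Y.shape[0]
    return -np.sum(Y @ np.log(P).T) / n ** 2

def gamma_hat_squared_error(Y, P):
    # (Y @ ((1 - P)**2).T)[i, i'] = sum_c y_ic * (1 - p_i'c)^2
    n = Y.shape[0]
    return np.sum(Y @ ((1.0 - P) ** 2).T) / n ** 2
```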
The double sum is also equivalent to weighting the per-class loss of each prediction by the empirical frequency of that class in the dataset, e.g. for cross-entropy loss
$$\hat{\gamma} = -\frac{1}{N}\sum_{c=1}^C\left(\frac{1}{N}\sum_{i=1}^N y_{ic}\right)\sum_{i'=1}^N \ln(\hat{p}_{i'c})$$
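The weighted form can be checked numerically against the double-sum version above; a sketch for the cross-entropy case, reusing the same `Y`/`P` convention:

```python
import numpy as np

def gamma_hat_cross_entropy_weighted(Y, P):
    n = Y.shape[0]
    class_freq = Y.mean(axis=0)  # (1/N) * sum_i y_ic, one weight per class
    # weight the summed log-probabilities of each class by its empirical frequency
    return -np.sum(class_freq * np.log(P).sum(axis=0)) / n
```

For any valid `Y` and `P`, `np.allclose(gamma_hat_cross_entropy(Y, P), gamma_hat_cross_entropy_weighted(Y, P))` should hold.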
Furthermore, I have found an example in my research where the standard optimism bootstrap underestimates optimism for heavily overfit classifiers, even when using a proper scoring rule (log-likelihood loss). See here.