I want to know whether the process described below is valid/acceptable, and whether there is any justification for it.
The idea: Supervised learning algorithms don't assume underlying structures/distributions for the data; at the end of the day they output point estimates. I'd like to quantify the uncertainty of those estimates somehow. Now, the ML model-building process is inherently random (e.g., in the sampling for cross-validation during hyperparameter tuning and in the subsampling of stochastic GBM), so a modeling pipeline will give me a different output for the same predictors under each different seed. My (naive) idea is to run this process over and over again to build up a distribution of the prediction, from which I can hopefully make statements about the uncertainty of the predictions.
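Concretely, something like the sketch below (this is just an illustration, not my actual pipeline; the estimator, parameter grid, and synthetic data are placeholders):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV, KFold

    # Placeholder data; my real dataset is similarly small (~200 rows).
    X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
    x_new = X[:1]  # a single observation to predict

    predictions = []
    for seed in range(100):
        # Only the seeds change between iterations; the data stay the same.
        cv = KFold(n_splits=5, shuffle=True, random_state=seed)
        gbm = GradientBoostingRegressor(subsample=0.5, random_state=seed)
        search = GridSearchCV(
            gbm,
            param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
            cv=cv,
        )
        search.fit(X, y)
        predictions.append(search.best_estimator_.predict(x_new)[0])

    predictions = np.array(predictions)
    print(f"mean = {predictions.mean():.2f}, sd = {predictions.std():.2f}")
    print("2.5th / 97.5th percentiles:", np.percentile(predictions, [2.5, 97.5]))

The question is whether the spread of `predictions` can be interpreted as a meaningful measure of uncertainty about the prediction for `x_new`.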
If it matters, the datasets I work with are typically very small (~200 rows).
Does this make sense?
To clarify, I'm not actually bootstrapping the data in the traditional sense (i.e., I'm not re-sampling the data). The same dataset is used in every iteration; I'm just exploiting the randomness in cross-validation and in stochastic GBM.