Using bootstrap confidence intervals for partial dependence plots

Question

Core Question

Bootstrapping could be used to calculate confidence intervals (CIs) for Partial Dependence Plots (PDPs) from a random forest. But would this be appropriate?

Example

I have a dataset with 5000 observations, a (count) dependent variable and 5 independent variables which are categorical, ordinal and continuous.

My aim to is to produce an explanatory model rather than a predictive model. I want to learn how these independent variables are associated with the outcome (e.g. through variable importance, quantifying interactions, PDPs and Accumulated Local Effects plots).

I would like to use a random forest as I expect the independent variables to be correlated and have interactions. I am treating it as a regression problem.

Once fitting the random forest, I am creating PDPs of continuous and categorical variables, using R's iml package. These don't have CIs and so I wanted to use bootstrapping to create CIs.

Current usage and opinions

Some authors have suggested using bootstrap CIs for PDPs and they have been used in some applied papers (e.g. here; here; and here). However, some people seem more skeptical (e.g. here).

I haven't found any strong evidence that this is a legitimate method (e.g. through an analytical or simulation study). But my intuition is that it is appropriate, as long as there the sample size and number of resamples are large enough for the bootstrap's asymptotic properties to be sensible.

Would bootstrapping be an appropriate method to determine CIs of PDPs?

I am mostly interested in the CI coverage. Other important considerations would also be of interest.

Side question: would it be interesting to perform a simulation study to test this method or is this a obvious problem?

What is your output variable? Is it a classification or regression problem? Please try to provide some more explanations. — Spätzle, Jul 15 '21 at 08:17
Thanks, @Spätzle. I have read about "asking good questions" and have tried to improve this one. I would appreciate feedback if this is better or worse. — TheBoro, Jul 15 '21 at 23:29

Using bootstrap confidence intervals for partial dependence plots

0 Answers0