Core Question
Bootstrapping could be used to calculate confidence intervals (CIs) for Partial Dependence Plots (PDPs) from a random forest. But would this be appropriate?
Example
I have a dataset with 5000 observations, a (count) dependent variable and 5 independent variables which are categorical, ordinal and continuous.
My aim to is to produce an explanatory model rather than a predictive model. I want to learn how these independent variables are associated with the outcome (e.g. through variable importance, quantifying interactions, PDPs and Accumulated Local Effects plots).
I would like to use a random forest as I expect the independent variables to be correlated and have interactions. I am treating it as a regression problem.
Once fitting the random forest, I am creating PDPs of continuous and categorical variables, using R's iml
package. These don't have CIs and so I wanted to use bootstrapping to create CIs.
Current usage and opinions
Some authors have suggested using bootstrap CIs for PDPs and they have been used in some applied papers (e.g. here; here; and here). However, some people seem more skeptical (e.g. here).
I haven't found any strong evidence that this is a legitimate method (e.g. through an analytical or simulation study). But my intuition is that it is appropriate, as long as there the sample size and number of resamples are large enough for the bootstrap's asymptotic properties to be sensible.
Would bootstrapping be an appropriate method to determine CIs of PDPs?
I am mostly interested in the CI coverage. Other important considerations would also be of interest.
Side question: would it be interesting to perform a simulation study to test this method or is this a obvious problem?