Question: I have estimated the partial dependence of $y$ on one predictor from a fitted random forest (RF), and I now want to fit a parametric model to this partial dependence. How can I estimate the uncertainty in the fitted parameters when this statistical model is fitted to the partial dependence estimated from the RF?
To flesh this out with an example: suppose that plant height is influenced by light, rainfall and pH, all in a nonlinear manner. I fit an RF (or other machine learning model) with height being predicted by all three. If I want to understand how light alone affects height, I can estimate its partial effect (or equivalently, the partial dependence of height on light) from the fitted RF.
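To make the quantity concrete, here is a minimal sketch of how a one-dimensional partial dependence can be computed by hand: set light to each grid value for every row, predict, and average. The synthetic data, the functional forms used to generate it, and the helper name `partial_dependence_1d` are all illustrative assumptions, not part of any real dataset or library.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    pd_values = np.empty(len(grid))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value          # overwrite light for every row
        pd_values[i] = model.predict(X_mod).mean()
    return pd_values


# Synthetic stand-in data (assumed): columns are light, rainfall, pH.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(0, 10, n),   # light
    rng.uniform(0, 10, n),   # rainfall
    rng.uniform(4, 9, n),    # pH
])
height = (
    5 * X[:, 0] / (2 + X[:, 0])      # saturating light effect (assumed)
    + np.sin(X[:, 1])                # nonlinear rainfall effect (assumed)
    - 0.5 * (X[:, 2] - 6.5) ** 2     # humped pH effect (assumed)
    + rng.normal(0, 0.3, n)
)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, height)

light_grid = np.linspace(0.5, 9.5, 20)
pd_light = partial_dependence_1d(rf, X, feature=0, grid=light_grid)
```

Note that the resulting curve includes the average contribution of rainfall and pH as a vertical offset, so it estimates the light effect only up to an additive constant.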
Suppose that I know what this shape should look like and have an equation to describe it. I would like to fit this equation to the partial dependence estimated from the RF. Loosely speaking, I am trying to 'filter' the RF's estimated partial dependence through the equation, which represents our prior understanding based on many earlier studies. I am using an RF instead of a fully parametric model because I do not know precisely how the other variables (rainfall, pH) affect height.
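To state the two steps a little more formally (the notation here is my own assumption, and least squares is only one possible fitting criterion): let $\hat f$ be the fitted RF, $x_S$ the value of light, $x_{i,C}$ the observed values of the other predictors for observation $i$, and $g(\cdot\,;\theta)$ the known equation with parameters $\theta$. Then, over a grid of light values $x_S^{(1)}, \dots, x_S^{(m)}$,

$$
\widehat{\mathrm{PD}}(x_S) = \frac{1}{n}\sum_{i=1}^{n} \hat f\left(x_S, x_{i,C}\right),
\qquad
\hat\theta = \arg\min_{\theta} \sum_{j=1}^{m} \left[\widehat{\mathrm{PD}}\left(x_S^{(j)}\right) - g\left(x_S^{(j)};\theta\right)\right]^2 .
$$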
How can I go about estimating these parameter values in a way that captures the uncertainty in (i) the data and (ii) the fitted random forest?
I encountered a version of this idea in a post on Andrew Gelman's stats blog. According to Gelman, who focusses on predictions from the whole model rather than on partial effects/dependencies, the idea has not really been developed.
I suspect that there is a bootstrap-based solution to this, but I am unsure. There may be simpler solutions that work more directly from the fitted random forest, but I am unaware of them because of an incomplete understanding of how partial effects are calculated. I'd appreciate any suggestions.
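As a minimal sketch of what such a bootstrap might look like: resample the rows, refit the RF, recompute the partial dependence, fit the equation to it, and collect the fitted parameters across resamples. Everything specific here is an assumption for illustration: the synthetic data, the saturating form `saturating` (with an intercept `c` to absorb the average effect of the other predictors), and the helper names. Whether the resulting percentile intervals have the right coverage is not something this sketch establishes.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import RandomForestRegressor


def saturating(light, c, a, b):
    """Hypothetical equation: offset c plus a saturating response a*x/(b+x)."""
    return c + a * light / (b + light)


def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    out = np.empty(len(grid))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        out[i] = model.predict(X_mod).mean()
    return out


def bootstrap_curve_params(X, y, feature, grid, n_boot=20, seed=0):
    """Refit RF and parametric curve on row-resampled data; return fitted thetas."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fitted = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        rf = RandomForestRegressor(
            n_estimators=50, random_state=int(rng.integers(2**31 - 1))
        ).fit(Xb, yb)
        pd_curve = partial_dependence_1d(rf, Xb, feature, grid)
        try:
            theta, _ = curve_fit(
                saturating, grid, pd_curve, p0=[0.0, 1.0, 1.0],
                bounds=([-np.inf, -np.inf, 1e-3], [np.inf, np.inf, np.inf]),
            )
        except RuntimeError:
            continue                                # skip non-converged fits
        fitted.append(theta)
    return np.array(fitted)


# Synthetic stand-in data (assumed): columns are light, rainfall, pH.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([rng.uniform(0, 10, n), rng.uniform(0, 10, n), rng.uniform(4, 9, n)])
height = (5 * X[:, 0] / (2 + X[:, 0]) + np.sin(X[:, 1])
          - 0.5 * (X[:, 2] - 6.5) ** 2 + rng.normal(0, 0.3, n))

light_grid = np.linspace(0.5, 9.5, 15)
params = bootstrap_curve_params(X, height, feature=0, grid=light_grid)

# Percentile intervals for (c, a, b): row 0 is the lower bound, row 1 the upper.
ci = np.percentile(params, [2.5, 97.5], axis=0)
```

Because each bootstrap replicate refits the forest with a new seed, the spread of `params` mixes sampling variability in the data with the randomness of the RF fit. The covariance matrix returned by `curve_fit` is deliberately discarded: the grid points of a single partial dependence curve come from the same forest and the same rows, so they are unlikely to behave like independent observations.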