partial_dependence with a GBT estimator has a mean response shift between curves computed by different methods ('brute' or 'recursion')

Question

The new version of scikit-learn's partial_dependence function has an additional 'kind' option. With kind='average' one can compute the values for the partial dependence plot (PDP), with kind='individual' the curves for the individual conditional expectation (ICE) plot, and with kind='both' both the ICE curves and the PDP are computed. The PDP should be the average of the ICE curves. Unfortunately, with the automatic settings, the average of the ICE curves does not equal (or even approximate) the PDP curve.

When using scikit-learn's gradient boosted trees without setting any method, the PDP is automatically computed by the 'recursion' method; that is, in my case, the average response is computed directly from the trees of the GBT model. With this option the computation seems to ignore the mean response term. The ICE curves, however, and their average (if kind='both'), are computed from the model's predictions. So the two methods differ by a shift, which I think is the mean response (the mean of the training labels).

Am I right, or did I misunderstand something? If I am right, I think this is inconsistent and should be corrected.
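A minimal sketch that might help reproduce the shift described above. The synthetic data, the constant added to the labels, and the choice of max_depth=1 (stumps, chosen here so that the two methods should agree apart from the offset) are assumptions for illustration and are not taken from the original setup. As far as I know, the scikit-learn documentation notes that for gradient boosting the 'recursion' method does not account for the init predictor, which by default is a constant; the sketch checks whether the difference between the two curves matches the mean of the training labels.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Synthetic data (assumption): labels are shifted so their mean is clearly non-zero.
X, y = make_regression(n_samples=200, n_features=4, noise=1.0, random_state=0)
y = y + 50.0

# Stumps (max_depth=1) are an illustrative choice, not part of the question.
model = GradientBoostingRegressor(
    n_estimators=50, max_depth=1, random_state=0
).fit(X, y)

# 'brute' predicts on modified copies of X, so ICE curves are available.
pd_brute = partial_dependence(
    model, X, features=[0], method="brute", kind="both", grid_resolution=20
)
# 'recursion' traverses the trees directly and only supports kind='average'.
pd_rec = partial_dependence(
    model, X, features=[0], method="recursion", kind="average", grid_resolution=20
)

ice_mean = pd_brute["individual"][0].mean(axis=0)  # average of the ICE curves
brute_avg = pd_brute["average"][0]                 # PDP from predictions
rec_avg = pd_rec["average"][0]                     # PDP from the trees

offset = brute_avg - rec_avg
print("ICE mean equals brute PDP:", np.allclose(ice_mean, brute_avg))
print("offset min/max:", offset.min(), offset.max())
print("mean of training labels:", y.mean())
```

If the offset turns out to be constant and equal to y.mean(), passing method='brute' explicitly, or adding the mean of the training labels to the 'recursion' curve, should make the PDP line up with the averaged ICE curves.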

0 Answers