I have trained an xgboost model in R using the package 'xgboost' to predict a binary outcome. I'm looking at a partial dependence plot for one of the important features, and I'm not sure how to interpret it. This is the plot: (image not available)

The variable is binary and coded as '0' or '1', yet the plot suggests that its importance comes from cases where the value is '2'. I am not sure what this means.

After looking at descriptive statistics, it does look like missing values of this variable are strongly associated with absence of the outcome of interest. Could the 2 in this plot represent missing values of this feature?

That would confuse me, as I thought xgboost dealt with missing values by sending them down one side of each split (a default direction), not by giving them a branch of their own.
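One way to narrow this down might be to check which numeric values the feature actually takes in the matrix passed to xgboost. The sketch below uses only base R and a made-up vector `x` standing in for the real feature (the name and the data are assumptions, not taken from the actual model). It shows how a factor with labels "0" and "1" can end up as codes 1 and 2 when converted directly to numeric, while missing values stay `NA`.

```r
# Hypothetical stand-in for the binary feature, with some missing values.
x <- factor(c("0", "1", NA, "1", "0", NA))

# Tabulate the raw values, including missing ones.
print(table(x, useNA = "ifany"))

# Converting a factor directly to numeric returns the internal level codes,
# not the labels "0" and "1"; NA stays NA.
codes <- as.numeric(x)
print(sort(unique(codes)))

# Going through character first preserves the original 0/1 coding.
values <- as.numeric(as.character(x))
print(sort(unique(values)))
```

If the real feature shows the same pattern, the '2' on the plot axis would correspond to the original '1' level rather than to missing values, though that depends on how the data were prepared before training.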

Thanks.

Ben