I recently built a gradient boosting model to predict a binary (Y/N) event. I have many features and a very large dataset.
After tuning it with a cross-validated grid search, I managed to get a model that performs well enough, as my held-out verification dataset confirms.
My problem now is that I struggle to interpret the results accurately, in plain English. The model contains far too many trees to go through one by one. Suggestions for a good visualisation would also be welcome.
Where I am:
I saw that the caret implementation has a function to visualise a tree. Really good, but still too messy.
The default plot that comes with the gbm implementation is really nice: it shows a bar chart of the variables ranked by importance. But it is still too univariate for my taste.
Since I have a verification dataset, I profiled the Y cases against the N cases in my training dataset, and the true positives against the true negatives in the verification dataset. This gives good insight.
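For reference, the profiling step amounts to something like the following minimal sketch. It is written in plain Python rather than R so it runs on its own; the feature names (`age`, `visits`), the data and the Y/N predictions are all made up, with `verif_pred` standing in for the gbm's output on the verification set. It computes per-feature means for each group: first Y against N in training, then true positives against true negatives in verification.

```python
from statistics import mean

def profile(rows, groups, features):
    """Mean of each feature within each group label, e.g. {"N": {...}, "Y": {...}}."""
    result = {}
    for g in sorted(set(groups)):
        members = [r for r, lab in zip(rows, groups) if lab == g]
        result[g] = {f: mean(r[f] for r in members) for f in features}
    return result

def outcome_group(actual, predicted):
    """Classify one verification case as TP, TN, FP or FN."""
    if actual == "Y":
        return "TP" if predicted == "Y" else "FN"
    return "FP" if predicted == "Y" else "TN"

features = ["age", "visits"]  # hypothetical feature names

# Training set: profile the Y cases against the N cases.
train = [{"age": 30, "visits": 5}, {"age": 50, "visits": 1},
         {"age": 35, "visits": 4}, {"age": 55, "visits": 2}]
train_y = ["Y", "N", "Y", "N"]
train_profile = profile(train, train_y, features)

# Verification set: keep only the true positives and true negatives.
verif = [{"age": 32, "visits": 6}, {"age": 48, "visits": 1},
         {"age": 40, "visits": 3}, {"age": 60, "visits": 0}]
verif_y = ["Y", "N", "Y", "N"]
verif_pred = ["Y", "N", "N", "N"]  # made-up model output
labels = [outcome_group(a, p) for a, p in zip(verif_y, verif_pred)]
kept = [(r, g) for r, g in zip(verif, labels) if g in ("TP", "TN")]
verif_profile = profile([r for r, _ in kept], [g for _, g in kept], features)

print(train_profile)  # per-feature means for the N and Y groups
print(verif_profile)  # per-feature means for the TN and TP groups
```

Comparing the two profiles roughly shows which features separate the classes in training, and whether the same differences show up in the cases the model gets right.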
Open questions:
Is there a way to simplify the ensemble of trees into a kind of "summary" tree?
Is there an easy way to plot a single variable against the predicted outcome?
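To make these two questions concrete, here is a rough sketch of what I have in mind, again in plain Python with made-up data. `toy_model` is a hypothetical stand-in for the fitted gbm's predicted probability of Y. `fit_surrogate_stump` is my reading of a "summary" tree: a single split fitted to the ensemble's own predictions (not to the true labels), with R² against those predictions as a fidelity check. `partial_dependence` forces one variable to each grid value for every row and averages the predictions, which as far as I understand is what a partial dependence plot shows.

```python
from statistics import mean

def toy_model(r):
    # Hypothetical stand-in for the fitted ensemble's predicted P(Y).
    return 0.125 + (0.5 if r["visits"] > 3 else 0.0) + (0.25 if r["age"] < 40 else 0.0)

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row."""
    return [mean(predict({**r, feature: v}) for r in rows) for v in grid]

def fit_surrogate_stump(predict, rows, features):
    """Fit a one-split regression tree to the model's own predictions.

    Returns (feature, threshold, left_mean, right_mean, r_squared), where
    r_squared measures how faithfully the stump reproduces the model
    (assumes the model's predictions are not all identical).
    """
    targets = [predict(r) for r in rows]
    best = None
    for f in features:
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    sse, f, t, lm, rm = best
    overall = mean(targets)
    sst = sum((y - overall) ** 2 for y in targets)
    return f, t, lm, rm, 1 - sse / sst

# Made-up data with two hypothetical features.
rows = [{"age": 30, "visits": 5}, {"age": 50, "visits": 1}, {"age": 35, "visits": 4},
        {"age": 55, "visits": 2}, {"age": 45, "visits": 6}, {"age": 25, "visits": 0}]

print(fit_surrogate_stump(toy_model, rows, ["age", "visits"]))
print(partial_dependence(toy_model, rows, "visits", [0, 4, 8]))
```

On this toy data the stump splits on `visits` at 2 and reproduces about 86% of the variance of the model's predictions, and the partial dependence curve for `visits` jumps from 0.25 to 0.75 between 0 and 4. With a real ensemble I would presumably need a deeper surrogate tree, and the R² would tell me how far to trust the summary.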