Using the following R code I obtain a decision tree using the agaricus dataset:
library(xgboost)
data(agaricus.train, package = 'xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 3,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
# plot all the trees
xgb.plot.tree(model = bst)
# plot only the first tree and display the node ID:
xgb.plot.tree(model = bst, trees = 0, show_node_id = TRUE)
I want to understand more clearly the "value" output of the tree (the 3rd line in the oval-shaped object). Here we can see that tree 0, leaf 7 gives a value of 1.90174532 (that is the first terminal node in the image). I want to know if this value is the same as the log-odds score. If so, all observations which follow the upper path of the decision tree will obtain a log-odds score of 1.90174532.
Then, in a new decision tree, the observations will fall into a different split depending on each observation's characteristics and will obtain a "new" value. Then we sum up all these values across all trees to obtain a final log-odds score, which can then be converted to a predicted probability using the logistic function.
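One way this could be checked, as a rough sketch rather than a definitive test: ask predict() for the raw margin (outputmargin = TRUE), which should be the sum of the leaf values over all trees plus any constant offset from base_score, and compare its logistic transform with the default probability output. This assumes the same xgboost R interface as the code above; the variable names margin, prob and leaves are my own.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 3, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")

# Raw (untransformed) score per observation: the summed leaf values
# across all trees, plus any base_score offset.
margin <- predict(bst, agaricus.train$data, outputmargin = TRUE)

# Default output for binary:logistic is a predicted probability.
prob <- predict(bst, agaricus.train$data)

# If the margin is a log-odds score, plogis() (the logistic function)
# should reproduce the probabilities up to floating-point error.
all.equal(plogis(margin), prob, tolerance = 1e-5)

# Index of the leaf each observation lands in, one column per tree.
leaves <- predict(bst, agaricus.train$data, predleaf = TRUE)
head(leaves)
```

If the all.equal() check passes, that would support reading each leaf "value" as a per-tree contribution to the log-odds.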
Is my intuition correct? Does value = log-odds?