I see now that the XGBoost documentation only considers trees as weak learners, but I remember well that linear models were an option too; I wonder if they are still supported.
Anyway, I always assumed that some differentiable non-linear transformation, such as a sigmoid, was applied to the linear combination of the predictors, because it is well known that the sum of any number of linear combinations is itself a linear combination. To my great surprise, I have recently been told that no non-linear transformation was ever part of the XGBoost algorithm. This highly upvoted Q&A confirms that.
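To spell out why I expected a non-linearity: if each weak learner is $f_m(x) = w_m^\top x + b_m$, then the boosted sum

$$\sum_{m=1}^{M} f_m(x) = \left(\sum_{m=1}^{M} w_m\right)^{\!\top} x + \sum_{m=1}^{M} b_m$$

is again a single linear function of $x$, so stacking linear learners adds no expressive power.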
But, in my understanding, XGBoost with linear weak learners is then just a fancy implementation of Newton-type descent for generalized linear models (which is essentially what R's `glm` function does via IRLS, except for the regularization).
Is it so?
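To make the question concrete, here is a minimal sketch of the check I have in mind. It assumes the linear booster is still exposed as `booster="gblinear"` through the scikit-learn wrapper, and that zeroing `reg_lambda` and `reg_alpha` with enough boosting rounds lets it converge to the unpenalized solution; the data set and all parameter values are arbitrary choices of mine. If my understanding is right, the two coefficient vectors should come out roughly the same:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary-classification problem
X, y = make_classification(n_samples=2000, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)

# XGBoost with linear weak learners; reg_lambda/reg_alpha set to 0 to switch
# off the regularization, and many rounds so the updates can converge.
xgb_lin = xgb.XGBClassifier(booster="gblinear", n_estimators=500,
                            learning_rate=0.5, reg_lambda=0.0, reg_alpha=0.0)
xgb_lin.fit(X, y)

# Unpenalized logistic regression fitted with a Newton-type solver
# (use penalty="none" on scikit-learn versions older than 1.2).
glm = LogisticRegression(penalty=None, solver="newton-cg", max_iter=1000)
glm.fit(X, y)

print("gblinear coefficients:", xgb_lin.coef_.ravel())
print("glm coefficients:     ", glm.coef_.ravel())
```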