
I see now that the XGBoost documentation only considers trees as weak learners, but I remember well that linear models were an option too; I wonder if they are still supported.

Anyway, I always assumed that some differentiable non-linear transformation, such as the sigmoid, was applied to the linear combination of the predictors, because it is well known that the sum of any number of linear combinations is itself a linear combination. To my great surprise, I have recently been told that no non-linear transformation was ever part of the XGBoost algorithm. This highly upvoted Q&A confirms that.
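Spelling that argument out: if each base learner is linear, $f_m(x) = x^\top \beta_m$, then after $M$ boosting steps with learning rate $\alpha$ the ensemble is

$$\hat f(x) = \sum_{m=1}^{M} \alpha\, x^\top \beta_m = x^\top \Big(\alpha \sum_{m=1}^{M} \beta_m\Big),$$

which is again a single linear model, so no amount of boosting can take the prediction outside the linear family.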

But, in my understanding, XGBoost with linear weak learners is then just a fancy implementation of a Newton-type gradient descent for generalized linear models (which is essentially what R's glm function does, except for the regularization).

Is it so?

carlo

1 Answer


Yes, your understanding is mostly correct. XGBoost actually uses coordinate descent to fit the linear base learners rather than Newton's method directly, but conceptually you are right: we ultimately fit a GLM. I gave more details on the matter in the thread: Difference in regression coefficients of sklearn's LinearRegression and XGBRegressor.

In general, this equivalence is not unexpected. At the end of the day, a linear combination of linear models is still a linear model; our model is of the form $y \approx X \beta$. Each boosting iteration updates the final estimate by an amount scaled by $\alpha$, our learning rate, so making a large number of updates (i.e. boosting iterations) results in an XGBoost learner equivalent to a GLM (with the associated $L_1$ and $L_2$ regularisation).
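To make this concrete, here is a minimal sketch (my own toy data and parameter choices, not taken from the linked thread), assuming the scikit-learn wrapper of xgboost, which exposes `coef_` for the linear booster: with `booster="gblinear"`, regularisation switched off and many boosting rounds, the fitted coefficients come very close to ordinary least squares.

```python
# Minimal sketch: gblinear with many rounds and no regularisation ~ OLS.
# Data and parameter values are illustrative assumptions, not tuned settings.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)

xgb_lin = xgb.XGBRegressor(
    booster="gblinear",            # linear base learners instead of trees
    n_estimators=1000,             # many boosting iterations
    learning_rate=0.5,
    reg_alpha=0.0,                 # no L1 penalty
    reg_lambda=0.0,                # no L2 penalty
    base_score=float(np.mean(y)),  # boost from the target mean, not the default 0.5
)
xgb_lin.fit(X, y)

print("OLS coefficients:     ", np.round(ols.coef_, 3))
print("gblinear coefficients:", np.round(xgb_lin.coef_, 3))
```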

usεr11852
  • I'm a bit perplexed by your answer in that other post. Using a unit learning rate, wouldn't the learning algorithm be exactly the same as for a linear model, regularization excluded? – carlo Sep 08 '20 at 21:01
  • Not exactly for XGBoost, as it uses the gradient of the loss function (it is gradient boosting, not simple boosting). For a simple boosting implementation that would indeed be the case, yes. (Also note that XGBoost boosts from a base score of 0.5; that would need to be altered too.) – usεr11852 Sep 08 '20 at 21:42
  • I'm not following you. Linear models use Newton-type gradient descent too. And what's 0.5 here? – carlo Sep 08 '20 at 21:48
  • 0.5 is the initial score, what our "zero-th booster" would produce (see the sketch after these comments). Other frameworks (e.g. LightGBM) boost from the average when it comes to adjusting the initial score. – usεr11852 Sep 08 '20 at 21:52
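Here is a small sketch of that last remark (again with made-up data, and parameter values chosen only for illustration): with the default base score of 0.5 and very few boosting rounds, the predictions stay anchored near that constant, while boosting from the target mean removes the offset.

```python
# Sketch of the base_score remark: XGBoost boosts from a constant initial
# score (0.5 by default), so with very few rounds and a small learning rate
# the predictions remain close to that constant rather than to mean(y).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 10.0   # target mean is far from 0.5

default_start = xgb.XGBRegressor(booster="gblinear", n_estimators=1,
                                 learning_rate=0.1).fit(X, y)
mean_start = xgb.XGBRegressor(booster="gblinear", n_estimators=1,
                              learning_rate=0.1,
                              base_score=float(y.mean())).fit(X, y)

print("mean(y):                                ", y.mean())
print("mean prediction, default base_score=0.5:", default_start.predict(X).mean())
print("mean prediction, base_score=mean(y):    ", mean_start.predict(X).mean())
```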