
In the case of regression we can apply a boosting approach as follows:

  1. Train a very simple model using the data set.
  2. Compute the difference between the predictions and the targets and use this difference as the new target.
  3. Train a new model on the new target.
  4. Repeat as long as it makes sense (see the sketch after this list).
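
A rough sketch of these steps in code, assuming shallow scikit-learn regression trees as the "very simple model"; the function name and hyperparameters are illustrative only:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression(X, y, n_rounds=50, learning_rate=0.1):
    """Fit a sequence of shallow trees, each on the residual of the running prediction."""
    models = []
    prediction = np.zeros(len(y))
    for _ in range(n_rounds):
        residual = y - prediction                      # step 2: difference between targets and predictions
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # step 3: new model on the new target
        prediction += learning_rate * tree.predict(X)  # shrinkage keeps each step small
        models.append(tree)
    return models, prediction
```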

Can this idea be generalised to the case of classification?

For example, we train a very simple model to predict the probabilities of two mutually exclusive classes. Now we need to somehow define a difference between the targets and the predictions, but this is no longer straightforward. Is there a way to do it?

Roman
    You are talking about residuals, right? Training new models on the residuals of the previous models, which is what XGBoost does? It does so for both classification and regression. – user2974951 Sep 25 '19 at 13:14

1 Answer


Yes, it works perfectly for the case of classification because we use the deviance residuals. Deviance residuals are directly related to the contribution each point makes to the overall likelihood score. Deviance residuals are commonly used to check the model fit at each observation for generalized linear models; a gradient boosting machine is no different (see Elements of Statistical Learning, Hastie et al. (2009), Ch. 10.2 "Boosting Fits an Additive Model" for the relation between boosting and additive models).

When dealing with a binary classification task, we usually assume the outcome follows a binomial distribution. Because the outcome variable has only two possible values, the binomial distribution extends naturally to classification tasks and to binomial (or binary) logistic regression. Basing our boosting procedure on the binomial likelihood and the corresponding residuals allows us to move directly from standard regression tasks to classification tasks; CV.SE has a short but to-the-point exposition of the relation between deviance residuals and the binomial (log-)likelihood here.
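
A minimal sketch of that idea, assuming shallow scikit-learn regression trees as base learners and plain gradient steps (the per-leaf Newton updates of Friedman's original algorithm are omitted for brevity; names and hyperparameters are illustrative only): we boost on the log-odds scale, and the "residual" each tree fits is the negative gradient of the binomial deviance, which is simply y - p.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boost_binary(X, y, n_rounds=50, learning_rate=0.1):
    """Boost on the log-odds scale; each tree is fit to the pseudo-residual y - p."""
    models = []
    raw_score = np.zeros(len(y))               # F(x), the log-odds
    for _ in range(n_rounds):
        p = sigmoid(raw_score)                  # current probability estimates
        pseudo_residual = y - p                 # negative gradient of the binomial deviance
        tree = DecisionTreeRegressor(max_depth=2).fit(X, pseudo_residual)
        raw_score += learning_rate * tree.predict(X)
        models.append(tree)
    return models, sigmoid(raw_score)           # predicted class probabilities
```

Note that the loop is identical to the regression case; only the loss (and hence the residual being fitted) changes.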

usεr11852