9

The documentation states that R's gbm with distribution = "adaboost" can be used for a 0-1 classification problem. Consider the following code fragment:

gbm_algorithm <- gbm(y ~ ., data = train_dataset, distribution = "adaboost", n.trees = 5000)
gbm_predicted <- predict(gbm_algorithm, test_dataset, n.trees = 5000)

The documentation for predict.gbm states that it

Returns a vector of predictions. By default the predictions are on the scale of f(x).

However, the particular scale is not clear for the case of distribution = "adaboost".

Could anyone help with the interpretation of predict.gbm return values and provide an idea of conversion to the 0-1 output?

Alexey Lakhno
  • This question appears to be *only* about how to interpret R output, & not about the related statistical issues (although that doesn't make it a bad Q). As such it is better asked, & probably answered, on [Stack Overflow](http://stackoverflow.com/), rather than here. *Please don't cross-post* (SE strongly discourages this), if you want your Q migrated faster, please flag it for moderator attention. – gung - Reinstate Monica Sep 18 '12 at 15:57
  • @gung seems like a legitimate statistical question to me. The GBM package supplies the Deviance used for adaboost but it is not clear to me either what f(x) is and how to back transform to a probability scale (perhaps one has to use Platt scaling). http://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf – B_Miner Sep 18 '12 at 16:26

3 Answers

12

You can also obtain the probabilities directly from the predict.gbm function:

predict(gbm_algorithm, test_dataset, n.trees = 5000, type = 'response')
Edwin
11

With distribution = "adaboost", the predictions f(x) are on half the logit (log-odds) scale. You can convert them to the 0-1 probability scale:

gbm_predicted<-plogis(2*gbm_predicted)

Note the 2* inside plogis: for adaboost, f(x) is half the log-odds, so the log-odds are 2*f(x).
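As a sanity check of the conversion above, here is a minimal base-R sketch (no gbm fit needed; the probabilities are made-up example values): if f(x) is half the log-odds of p, then plogis(2*f) recovers p exactly.

```r
# Example probabilities (hypothetical values, for illustration only)
p <- c(0.1, 0.25, 0.5, 0.9)

# Half the log-odds -- the scale adaboost's f(x) is claimed to be on
f <- 0.5 * log(p / (1 - p))

# Convert back: plogis(2*f) = 1 / (1 + exp(-2*f))
p.back <- plogis(2 * f)

stopifnot(all.equal(p.back, p))
```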

razgon
3

The adaboost link function is described here. This example provides a detailed description of the computation:

library(gbm);
set.seed(123);
n          <- 1000;
sim.df     <- data.frame(x.1 = sample(0:1, n, replace = TRUE), 
                         x.2 = sample(0:1, n, replace = TRUE));
prob.array <- c(0.9, 0.7, 0.2, 0.8);
sim.df$y   <- rbinom(n, size = 1, prob = prob.array[1 + sim.df$x.1 + 2*sim.df$x.2]);
n.trees    <- 10;
shrinkage  <- 0.01;

gbmFit <- gbm(
  formula           = y~.,
  distribution      = "bernoulli",
  data              = sim.df,
  n.trees           = n.trees,
  interaction.depth = 2,
  n.minobsinnode    = 2,
  shrinkage         = shrinkage,
  bag.fraction      = 0.5,
  cv.folds          = 0,
  # verbose         = FALSE
  n.cores           = 1
);

sim.df$logods  <- predict(gbmFit, sim.df, n.trees = n.trees);
sim.df$prob    <- predict(gbmFit, sim.df, n.trees = n.trees, type = 'response');
sim.df$prob.2  <- plogis(predict(gbmFit, sim.df, n.trees = n.trees));
sim.df$logloss <- sim.df$y*log(sim.df$prob) + (1-sim.df$y)*log(1-sim.df$prob);


gbmFit <- gbm(
  formula           = y~.,
  distribution      = "adaboost",
  data              = sim.df,
  n.trees           = n.trees,
  interaction.depth = 2,
  n.minobsinnode    = 2,
  shrinkage         = shrinkage,
  bag.fraction      = 0.5,
  cv.folds          = 0,
  # verbose         = FALSE
  n.cores           = 1
);

sim.df$exp.scale  <- predict(gbmFit, sim.df, n.trees = n.trees);
sim.df$ada.resp   <- predict(gbmFit, sim.df, n.trees = n.trees, type = 'response');
sim.df$ada.resp.2 <- plogis(2*predict(gbmFit, sim.df, n.trees = n.trees));
sim.df$ada.error  <- -exp(-(2*sim.df$y - 1) * sim.df$exp.scale);  # adaboost loss uses y recoded to {-1, 1}

sim.df[1:20,]
gung - Reinstate Monica