The following example is borrowed from the forecastxgb author's blog. Tree-based models can't extrapolate by their nature, but there must be some way to combine the benefits of a tree model (capturing interactions) with a linear model's ability to extrapolate a trend. Could anyone suggest some ideas?
In some Kaggle solutions I have seen, people suggest using the linear model's prediction as an extra feature for the tree model, which can improve the predictions, but how does that improve extrapolation? (A rough sketch of this idea is at the end of the post.)
Another idea is to use xgboost to predict the residuals of the linear model, which can also help the predictions a lot (also sketched at the end).
Is there any way to do this?
library(xgboost) # extreme gradient boosting
set.seed(134) # for reproducibility
x <- 1:100 + rnorm(100)
y <- 3 + 0.3 * x + rnorm(100)
extrap <- data.frame(x = 101:120 + rnorm(20))
xg_params <- list(objective = "reg:linear", max.depth = 2)
mod_cv <- xgb.cv(label = y, params = xg_params, data = as.matrix(x), nrounds = 40, nfold = 10)
# choose the nrounds that gives the best cross-validated RMSE
best_nrounds <- which.min(mod_cv$evaluation_log$test_rmse_mean)
mod_xg <- xgboost(label = y, params = xg_params, data = as.matrix(x), nrounds = best_nrounds)
p <- function(title){
  plot(x, y, xlim = c(0, 150), ylim = c(0, 50), pch = 19, cex = 0.6,
       main = title, xlab = "", ylab = "", font.main = 1)
  grid()
}
predshape <- 1
p("Extreme gradient boosting")
points(extrap$x, predict(mod_xg, newdata = as.matrix(extrap)), col = "darkgreen", pch = predshape)
mod_lm <- lm(y ~ x)
p("Linear regression")
points(extrap$x, predict(mod_lm, newdata = extrap), col = "red", pch = predshape)
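To make the two ideas above concrete, here are rough sketches of what I mean, continuing from the objects defined above. The names (train_feats, new_feats, mod_xg2, resid_y, mod_resid) are just made up for illustration, and I haven't verified that either approach actually fixes the extrapolation problem.

# Idea 1: add the linear model's prediction as an extra feature for xgboost
train_feats <- cbind(x = x, lm_pred = predict(mod_lm))
new_feats   <- cbind(x = extrap$x, lm_pred = predict(mod_lm, newdata = extrap))
mod_xg2 <- xgboost(data = train_feats, label = y,
                   params = xg_params, nrounds = best_nrounds, verbose = 0)
p("xgboost with lm prediction as a feature")
points(extrap$x, predict(mod_xg2, newdata = new_feats),
       col = "blue", pch = predshape)

# Idea 2: fit xgboost to the residuals of the linear model,
# then add the linear trend and the residual prediction together
resid_y   <- y - predict(mod_lm)
mod_resid <- xgboost(data = as.matrix(x), label = resid_y,
                     params = xg_params, nrounds = best_nrounds, verbose = 0)
p("Linear trend + xgboost on residuals")
points(extrap$x,
       predict(mod_lm, newdata = extrap) +
         predict(mod_resid, newdata = as.matrix(extrap$x)),
       col = "purple", pch = predshape)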