I'm currently running 10-fold cross-validation, cycling through different combinations of hyperparameters for XGBoost. I know this is a complicated process with a lot of variables in play. I'm looking at max_depth values from 5 to 10 and learning rates (eta) of 0.01, 0.00082, 0.00064, 0.00046, and 0.00028. That's 6 depths × 5 learning rates = 30 hyperparameter combinations, each of which has to be run on all 10 folds, so 300 fits total. XGBoost usually needs hundreds of boosting rounds (n_estimators/nrounds) before early stopping triggers. I have two computers that have been grinding away at this for 7 hours and they're only about halfway finished. It takes forever! And after all this, the model might not even perform well in cross-validation, let alone on the test data, and I'll have to do it all over again.
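For concreteness, the loop is roughly this (Python sketch; X, y, the objective, and the num_boost_round cap are placeholders for my actual setup):

```python
import itertools

import xgboost as xgb

# X, y stand in for my real training features/labels
etas = [0.01, 0.00082, 0.00064, 0.00046, 0.00028]
depths = range(5, 11)  # max_depth 5 through 10

dtrain = xgb.DMatrix(X, label=y)

results = {}
for eta, depth in itertools.product(etas, depths):  # 5 x 6 = 30 combos
    params = {
        "eta": eta,
        "max_depth": depth,
        "objective": "binary:logistic",  # placeholder; mine may differ
    }
    # xgb.cv runs all 10 folds internally; early stopping cuts off
    # boosting once the validation metric stops improving
    cv = xgb.cv(
        params,
        dtrain,
        num_boost_round=5000,  # generous cap, trimmed by early stopping
        nfold=10,
        early_stopping_rounds=50,
        seed=42,
    )
    results[(eta, depth)] = cv.iloc[-1]  # last row = best-iteration scores
```

At the small etas, early stopping barely helps because it takes so many rounds to converge, which is where all the time goes.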
Surely there's a way to speed this up short of renting a cloud computing cluster. If I jack up the learning rate it's a little faster, but still very slow because of the large number of n_estimators. And if I make n_estimators very low and use learning rates like 0.1-0.3, the results don't actually tell me whether the hyperparameters I care about are any good. All of this doesn't even factor in the other hyperparameters XGBoost has to offer. What techniques do you guys use to speed this process up? Or is there no alternative and it just takes forever no matter what?