I am running a Hyperopt search over a LightGBM regressor on a large dataset to tune max_depth, min_child_samples, reg_lambda and learning_rate, with n_estimators fixed at 10000 and num_leaves set dynamically to (2^max_depth) - 1. It works, but it is very slow.
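For concreteness, the search looks roughly like this (the dataset, CV scheme and parameter ranges below are simplified placeholders, not my real ones):

```python
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the real (much larger) dataset.
X, y = make_regression(n_samples=5000, n_features=50, noise=0.1, random_state=0)

def objective(params):
    max_depth = int(params['max_depth'])
    model = LGBMRegressor(
        n_estimators=10000,                      # fixed during the search
        max_depth=max_depth,
        num_leaves=2 ** max_depth - 1,           # derived from max_depth
        min_child_samples=int(params['min_child_samples']),
        reg_lambda=params['reg_lambda'],
        learning_rate=params['learning_rate'],
    )
    score = cross_val_score(model, X, y, cv=3,
                            scoring='neg_mean_squared_error').mean()
    return {'loss': -score, 'status': STATUS_OK}

space = {
    'max_depth': hp.quniform('max_depth', 3, 12, 1),
    'min_child_samples': hp.quniform('min_child_samples', 5, 100, 5),
    'reg_lambda': hp.loguniform('reg_lambda', np.log(1e-3), np.log(10)),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.005), np.log(0.3)),
}

best = fmin(objective, space, algo=tpe.suggest, max_evals=100, trials=Trials())
```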
I wonder whether it would make sense to run the search with a much lower n_estimators, say 100, then set the final model's n_estimators to 10000 and reduce the learning_rate in proportion, leaving all other parameters as tuned. This would be much faster and hopefully almost as accurate as tuning with 10k estimators directly.
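What I have in mind is roughly the following, assuming the search above were re-run with n_estimators=100 (the scaling and names are just illustrative of the idea, not something I have validated):

```python
# Naive proposal: keep the tuned parameters, raise n_estimators back to 10000,
# and shrink learning_rate by the same factor the estimator count grew.
scale = 10000 / 100                                # final budget / tuning budget
max_depth = int(best['max_depth'])

final_model = LGBMRegressor(
    n_estimators=10000,
    learning_rate=best['learning_rate'] / scale,   # inverse scaling in question
    max_depth=max_depth,
    num_leaves=2 ** max_depth - 1,
    min_child_samples=int(best['min_child_samples']),
    reg_lambda=best['reg_lambda'],
)
final_model.fit(X, y)
```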
A few quick tests suggest that if there is a relationship of this kind between the two, it is not exactly inverse (i.e. multiplying n_estimators by x and dividing learning_rate by x does not score well), and I haven't been able to find a formula for it. Is there one, or am I misunderstanding something?
I could run a second search on learning_rate only, but that would have the same initial problem of running slowly with high n_estimators...