So my friend and I decided to learn about tuning the hyperparameters of XGBoost with GridSearchCV for a regression problem. We tuned parameters such as max depth, column subsampling, and so on, and after the grid search we had the best value of each hyperparameter among the candidates we decided to check.
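For context, here is a minimal sketch of the kind of grid search we ran, assuming scikit-learn's GridSearchCV and the xgboost sklearn wrapper; the toy dataset and the candidate values are just illustrative, not our actual setup:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# toy regression data standing in for our real dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

# illustrative candidate values, not the grid we actually used
param_grid = {
    "max_depth": [3, 5, 7, 9],
    "colsample_bytree": [0.5, 0.75, 1.0],
    "reg_alpha": [0.0, 0.1, 1.0],
}

search = GridSearchCV(
    XGBRegressor(n_estimators=200, random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```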
After practicing grid search, my friend decided to dig a little deeper. He reran the grid search with a twist: instead of searching over multiple hyperparameters at once, he searched over only one hyperparameter at a time.
He noticed that for every hyperparameter, as he increased its value, the loss would decrease up to a certain point, and beyond that point it would keep increasing as the value grew.
For example, as he increased the max depth, the loss decreased (possibly because deeper trees reduce underfitting), but past a certain depth the loss increased again (possibly due to overfitting). Another example is L1 regularization (lasso): increasing the L1 penalty a little decreased the loss (possibly because it zeroes out irrelevant features), but increasing it further made the loss go up again (possibly because the regularization becomes so strong that it also zeroes out relevant features).
So my friend observed that the loss, as a function of each individual hyperparameter, looks unimodal, and then said, "If the loss curve is unimodal, why don't we just ternary search it?"
The idea is that this would reduce the complexity of the hyperparameter tuning. If there are N hyperparameters and we want to check M candidate values for each, a grid search has to evaluate $M^N$ combinations. If we instead ternary search each hyperparameter one at a time, the number of evaluations would be on the order of $N \log P$, where P is the size of the search interval (the difference between the upper and lower bound).
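To make the idea concrete, here is a rough sketch of what he means, assuming the CV loss really is unimodal in the hyperparameter being searched; the choice of parameter (reg_alpha), the bounds, and the iteration count are assumptions I picked just for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

def cv_loss(reg_alpha):
    """Mean CV MSE for one value of reg_alpha, with everything else held fixed."""
    model = XGBRegressor(n_estimators=200, max_depth=5, reg_alpha=reg_alpha,
                         random_state=0)
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()

def ternary_search(lo, hi, iters=20):
    """Shrink [lo, hi] toward the minimizer, assuming cv_loss is unimodal on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cv_loss(m1) < cv_loss(m2):
            hi = m2  # the minimum cannot lie in (m2, hi]
        else:
            lo = m1  # the minimum cannot lie in [lo, m1)
    return (lo + hi) / 2

best_reg_alpha = ternary_search(0.0, 10.0)
print(best_reg_alpha)
```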
I speculated that his approach would suffer because it only tunes one hyperparameter at a time (so it ignores interactions between hyperparameters), but he responded that we can use prior knowledge to prioritize which hyperparameter to tune first, so it would not suffer too badly. For example, we know we should tune max depth before the regularization terms because it has a bigger impact on the model.
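The "one hyperparameter at a time, in priority order" version he has in mind would look roughly like the coordinate-descent-style sketch below; again, the parameter names, bounds, and ordering are just illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

def cv_loss(params):
    """Mean CV MSE for a full parameter dict."""
    params = dict(params, max_depth=int(round(params["max_depth"])))  # max_depth must be an int
    model = XGBRegressor(n_estimators=200, random_state=0, **params)
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()

def ternary_search(loss, lo, hi, iters=15):
    """One-dimensional ternary search for the minimizer of `loss` on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if loss(m1) < loss(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Tune one hyperparameter at a time, highest-impact first, holding the others fixed.
current = {"max_depth": 6, "colsample_bytree": 1.0, "reg_alpha": 0.0}
priority = [("max_depth", 2, 12), ("colsample_bytree", 0.3, 1.0), ("reg_alpha", 0.0, 10.0)]

for name, lo, hi in priority:
    current[name] = ternary_search(lambda v: cv_loss({**current, name: v}), lo, hi)

current["max_depth"] = int(round(current["max_depth"]))
print(current)
```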
Is my friend's idea actually workable?