
In my experience, and from reading papers, hyperparameter tuning in the machine learning / applied statistics world follows basically two approaches: (1) simple search algorithms (e.g. grid or random search) or (2) more complex methods (CMA-ES, TPE, Bayesian optimization).

However, there are methods that lie in the middle and belong to the sampling family, such as Latin hypercube (n-rooks), jittered, blue noise, multi-jittered, and quasi-Monte Carlo sequences (Halton, Sobol, or similar).

These methods are heavily used in other fields such as computer graphics, but not so much in ML. This can be seen, for example, in the scikit-learn Python package, which offers no such sampling methods even though grid search (GridSearchCV) is present. A minimal sketch of what using them could look like follows below.
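To make the idea concrete, here is a minimal sketch of quasi-Monte Carlo hyperparameter sampling using SciPy's `scipy.stats.qmc` module (available since SciPy 1.7). The two-parameter search space, its bounds, and the `objective` function are hypothetical placeholders standing in for a real cross-validated model score; this is an illustration, not a recommended implementation.

```python
# Sketch: low-discrepancy sampling of a 2-D hyperparameter space
# (learning rate, regularization strength), both on a log10 scale.
# Bounds and objective are illustrative placeholders.
import numpy as np
from scipy.stats import qmc

# log10 bounds: learning rate in [1e-4, 1e-1], alpha in [1e-6, 1e-2]
l_bounds = [-4.0, -6.0]
u_bounds = [-1.0, -2.0]

def objective(lr, alpha):
    # Stand-in for an expensive cross-validation score.
    return -(np.log10(lr) + 2.5) ** 2 - (np.log10(alpha) + 4.0) ** 2

# Scrambled Sobol sample of 2^5 = 32 points; qmc.LatinHypercube(d=2)
# or qmc.Halton(d=2) are drop-in alternatives for the other samplers.
sampler = qmc.Sobol(d=2, scramble=True, seed=0)
unit_points = sampler.random_base2(m=5)        # points in [0, 1)^2
points = qmc.scale(unit_points, l_bounds, u_bounds)

scores = [objective(10 ** p[0], 10 ** p[1]) for p in points]
best = points[int(np.argmax(scores))]
print("best lr=%.2e alpha=%.2e" % (10 ** best[0], 10 ** best[1]))
```

The design point here is that, unlike a grid, the Sobol set fills the space evenly at any budget, so the same loop works whether you can afford 16 evaluations or 1024.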

My question then is: are there reasons not to use them? Is there any survey or comparison in the literature where this issue is addressed?

  • I don't know about the rest, but Latin hypercubes arise as a special case of pure random search, or are frequently used as a way to initialize Bayesian optimization. But I agree with the overall premise -- the methods you enumerate are not often represented in ML research; if you can demonstrate that these methods fill a niche or add value in a way not addressed by other optimization methods, you'd have a nice paper on your hands. While LIPO doesn't appear in your list, it's straightforward and has seen some usage in tuning: https://stats.stackexchange.com/questions/193306/ – Sycorax Feb 01 '21 at 02:24
  • Thanks @Sycorax! Actually, I had the idea of preparing a paper covering this topic, but I wanted to make sure I'm not reinventing the wheel. I'd never heard of LIPO before, so it's nice to have one extra option! The basic idea is to use progressive samplers because they add the nice advantage of adding points while keeping all the evaluated ones: kind of like a Bayesian update on the posterior, but with samplers. – Ezequiel Castaño Feb 01 '21 at 03:48
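A hypothetical sketch of the "progressive sampler" idea from that last comment, again using `scipy.stats.qmc`: SciPy's QMC engines are stateful, so repeated `.random(n)` calls continue the same low-discrepancy sequence, and every previously evaluated point remains a valid member of the growing sample. The `evaluate` function and batch sizes below are made up for illustration.

```python
# Sketch of progressive QMC sampling: each round extends the same
# Sobol sequence, so earlier evaluations are kept, not discarded.
from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=True, seed=0)
evaluated = []  # (point, score) pairs accumulated across rounds

def evaluate(point):
    # Placeholder for an expensive model-training call.
    return -sum((x - 0.5) ** 2 for x in point)

for round_ in range(3):
    # Power-of-2 batch sizes keep the Sobol balance properties.
    batch = sampler.random(8)  # next 8 points of the ongoing sequence
    evaluated.extend((tuple(p), evaluate(p)) for p in batch)
    best_point, best_score = max(evaluated, key=lambda t: t[1])
    print(f"round {round_}: {len(evaluated)} points, "
          f"best score {best_score:.4f}")
```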

0 Answers