
I want to tune my neural network and find some good lambda and eta values. I could do an exhaustive grid search to find the best combination. However, Bergstra and Bengio, in this paper http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf, propose random search instead.

How am I supposed to do this?

Here is how I think of it:

Generate 100 random etas:

etas = rand(1, 100)

Do the same for the lambdas:

lambdas = rand(1, 100)

Then I think I have to sample random (eta, lambda) pairs. But how many samples should I take from this pool of 100 elements per hyperparameter?
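To make it concrete, here is a rough Python sketch of what I have in mind (the 1–100 ranges are just the placeholders from above, and `train_and_evaluate` is a hypothetical function standing in for training my network and measuring validation error):

```python
import random

# Build a pool of 100 random values for each hyperparameter (placeholder ranges).
etas    = [random.uniform(1, 100) for _ in range(100)]
lambdas = [random.uniform(1, 100) for _ in range(100)]

# Then sample random (eta, lambda) pairs from the two pools.
n_trials = 50  # <- this is the number I don't know how to choose
pairs = [(random.choice(etas), random.choice(lambdas)) for _ in range(n_trials)]

results = []
for eta, lam in pairs:
    score = train_and_evaluate(eta, lam)  # hypothetical: train the net, return validation error
    results.append((score, eta, lam))

best_score, best_eta, best_lambda = min(results)  # pair with the lowest validation error
```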

1 Answer


Depending on the dataset they examine, they need at most 32 or 64 trials; for their experiments they tried 256 (p. 291, line 6).

In general this is a slightly "how long is a piece of string" question, unless you use some more directed search technique like Simulated Annealing (SA) or Bayesian Optimisation (BO) (i.e. Gaussian Process Regression for optimisation) to explore the hyper-parameter space. The differences between SA and BO are briefly discussed here. The authors address this issue explicitly by using the latter (Gaussian Process Regression; see the bottom paragraphs of pp. 291 & 292).
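For illustration only (this is not the authors' exact setup), here is a minimal sketch of the BO idea using a Gaussian Process surrogate over the (eta, lambda) space with scikit-learn. The `validation_error` function is a toy stand-in for training your network and measuring validation loss, and the log-scale bounds are assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def validation_error(eta, lam):
    # Toy stand-in: in practice, train the network with (eta, lambda)
    # and return the validation error.
    return (np.log10(eta) + 2.0) ** 2 + 0.1 * (np.log10(lam) + 4.0) ** 2

# Assumed search space, on a log10 scale (often sensible for eta and lambda).
bounds = np.array([[-4.0, 0.0],    # log10(eta)
                   [-6.0, -1.0]])  # log10(lambda)

rng = np.random.default_rng(0)

# A handful of random initial trials.
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))
y = np.array([validation_error(10 ** x[0], 10 ** x[1]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):  # 20 BO iterations
    gp.fit(X, y)
    # Score a cloud of random candidates by expected improvement (we minimise).
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(1000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, validation_error(10 ** x_next[0], 10 ** x_next[1]))

best_idx = int(np.argmin(y))
print("best (eta, lambda):", tuple(10 ** X[best_idx]))
```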

In the second half of their paper they also examine quasi-random sampling approaches (simply speaking: a random search that ensures some uniformity of the samples across the sampling space) and find that these too are better than grid search. The main reason that "random search is better than grid search" can be seen in the schematic shown in Fig. 1: in short, random search does not lose (a lot of) efficiency by spending trials on irrelevant dimensions. It allows a more thorough investigation of the values of the more relevant hyper-parameters while still covering the less relevant ones.
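As an illustration of the quasi-random idea (not the specific low-discrepancy samplers used in the paper), a Sobol sequence via `scipy.stats.qmc` fills the (eta, lambda) space more evenly than independent uniform draws; the log-scale ranges below are assumptions:

```python
import numpy as np
from scipy.stats import qmc

n_trials = 64  # a power of two keeps the Sobol sequence balanced

# Plain random search: independent uniform draws in the unit square.
rng = np.random.default_rng(0)
random_pts = rng.random((n_trials, 2))

# Quasi-random search: a scrambled Sobol sequence covers the square more evenly.
sobol_pts = qmc.Sobol(d=2, scramble=True, seed=0).random(n_trials)

def to_params(points, log10_eta=(-4.0, 0.0), log10_lam=(-6.0, -1.0)):
    # Map points from [0, 1]^2 to assumed (eta, lambda) ranges on a log scale.
    scaled = qmc.scale(points, [log10_eta[0], log10_lam[0]], [log10_eta[1], log10_lam[1]])
    return 10.0 ** scaled  # columns: eta, lambda

for eta, lam in to_params(sobol_pts):
    pass  # train and evaluate the network with this (eta, lambda) pair
```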
