Dozat 2016 introduces a sequence of hyperparameters $\mu_0, \cdots, \mu_T$, where $T$ is the total number of iterations. Naturally, $T$ depends on when the parameters converge, so the sequence cannot be fixed ahead of time.

Dozat 2016 doesn't give a specific recommendation, but says:

"It often helps to gradually increase or decrease $\mu$ over time, so for the rest of this section we will assume a list of values for $\mu$ indexed by timestep $\mu_1, \cdots, \mu_T$ in order to aid clarity."

What is a suitable rule for assigning $\mu_t$ for any given iteration $t$ of the training?
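
To make the question concrete, here is a minimal sketch of the kind of rule I have in mind: $\mu_t$ is computed from $t$ alone, so $T$ never needs to be known in advance. This is only an illustration under my own assumptions, not something taken from Dozat 2016. The function name `mu_schedule` and the constants `mu_max`, `decay`, and `scale` are hypothetical placeholders.

```python
def mu_schedule(t, mu_max=0.99, decay=0.96, scale=250.0):
    """Hypothetical warm-up rule: mu_t rises from 0.5 * mu_max toward mu_max.

    All constants are illustrative assumptions, not values from the paper.
    The rule depends only on the current step t, never on the total T.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return mu_max * (1.0 - 0.5 * decay ** (t / scale))


# Evaluate the schedule at a few steps; no knowledge of T is required.
mus = [mu_schedule(t) for t in range(0, 1001, 250)]
```

A rule of this shape gradually increases $\mu$ over time, which matches the quoted remark, but whether it is a good choice is exactly what I am asking.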

DifferentialPleiometry

0 Answers