How should I set $\vec{\mu}$ in NAdam optimization?

Asked Nov 12 '21 at 17:42

Active Nov 12 '21 at 18:34


Dozat 2016 introduces a sequence of hyperparameters $\mu_0, \cdots, \mu_T$, where $T$ is the total number of iterations. Naturally, $T$ depends on how quickly the parameters converge, so the sequence cannot be fixed ahead of time.

Dozat 2016 doesn't give a specific recommendation, but says:

It often helps to gradually increase or decrease $\mu$ over time, so for the rest of this section we will assume a list of values for $\mu$ indexed by timestep $\mu_1, \cdots, \mu_T$ in order to aid clarity

What is a suitable rule for assigning $\mu_t$ for any given iteration $t$ of the training?
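
For concreteness, a rule of this kind can be written as a function of $t$ alone, so that $T$ never needs to be known in advance. The following is a minimal sketch, assuming the warm-up form documented for PyTorch's `torch.optim.NAdam`, $\mu_t = \beta_1 \left(1 - \tfrac{1}{2} \cdot 0.96^{t \psi}\right)$ with $\psi = 0.004$. The function name `mu_schedule` and the default values are illustrative assumptions, not a recommendation from Dozat 2016.

```python
def mu_schedule(t, beta1=0.9, psi=0.004):
    """Return the momentum coefficient mu_t for iteration t.

    Assumed form: mu_t = beta1 * (1 - 0.5 * 0.96 ** (t * psi)).
    The value depends only on t, so the total number of
    iterations T does not have to be fixed ahead of time.
    """
    return beta1 * (1.0 - 0.5 * 0.96 ** (t * psi))


if __name__ == "__main__":
    # mu_t rises from roughly beta1 / 2 towards beta1 as t grows.
    for t in (1, 10, 100, 1000, 10000, 100000):
        print(t, mu_schedule(t))
```

With this form, $\mu_t$ increases monotonically and approaches $\beta_1$ from below, which is one instance of the "gradually increase" behaviour the quote describes.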

edited Nov 12 '21 at 18:34


DifferentialPleiometry



0 Answers