Dozat 2016 introduces a sequence of hyperparameters $\mu_1, \ldots, \mu_T$, where $T$ is the total number of iterations. Since $T$ depends on when the parameters converge, the sequence cannot be fixed ahead of time.
Dozat 2016 doesn't give a specific recommendation, but says:
"It often helps to gradually increase or decrease $\mu$ over time, so for the rest of this section we will assume a list of values for $\mu$ indexed by timestep $\mu_1, \ldots, \mu_T$ in order to aid clarity."
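To make the indexing concrete: one way to read "a list of values indexed by timestep" is as a function of $t$ evaluated lazily, so that $T$ never has to be known up front. The sketch below is only an illustration of that idea. The linear ramp and the names `mu_schedule`, `mu_start`, `mu_end`, and `ramp_steps` are my own placeholders, not anything taken from Dozat 2016.

```python
from itertools import count, islice


def mu_schedule(mu_start=0.9, mu_end=0.99, ramp_steps=1000):
    """Yield mu_1, mu_2, ... one value per iteration, without knowing T.

    Hypothetical schedule: a linear ramp from mu_start to mu_end over
    ramp_steps iterations, held constant afterwards. The shape and the
    default values are assumptions chosen purely for illustration.
    """
    for t in count(1):  # t = 1, 2, 3, ... for as long as training runs
        frac = min(t / ramp_steps, 1.0)  # fraction of the ramp completed
        yield mu_start + frac * (mu_end - mu_start)


# The training loop pulls one value per step and simply stops at convergence.
first_three = list(islice(mu_schedule(), 3))
```

Because the generator is unbounded, the same code works whether training stops after 100 steps or 100,000. Only the rule mapping $t$ to $\mu_t$ has to be chosen.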
What is a suitable rule for assigning $\mu_t$ for any given iteration $t$ of the training?