I'm reading this paper on safe model-based reinforcement learning.
Assumption 2 in this paper states:
Let $\mu_n(\cdot)$ and $\Sigma_n(\cdot)$ denote the posterior mean and covariance matrix functions of the statistical model of the dynamics (1) conditioned on $n$ noisy measurements. With $\sigma_n(\cdot) = \mathrm{trace}(\Sigma_n^{1/2}(\cdot))$, there exists a $\beta_n > 0$ such that with probability at least $(1-\delta)$ it holds for all $n \ge 0$, $x \in \mathbb{X}$, and $u \in \mathbb{U}$ that $\lVert f(x,u) - \mu_n(x,u) \rVert_2 \le \beta_n \sigma_n(x,u)$.
Here $f(x,u)$ is the true dynamics for a given state $x$ and action $u$, while $\mu_n(x,u)$ is the posterior mean estimate of those dynamics, with uncertainty $\Sigma_n(x,u)$.
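To make the setup concrete, here is a minimal 1-D sketch of the kind of statistical model I have in mind (a GP regression built directly with NumPy; the kernel, the stand-in dynamics `true_f`, and all numbers are my own illustrative choices, not from the paper), checking how often the true function lies inside the tube $\mu_n \pm \beta_n \sigma_n$ with $\beta_n = 2$:

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # squared-exponential kernel (lengthscale is an arbitrary choice)
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def true_f(x):
    return np.sin(3 * x)  # hypothetical stand-in for the true dynamics f

rng = np.random.default_rng(0)
noise = 0.05
X = rng.uniform(-1, 1, 8)                    # n = 8 noisy measurements
y = true_f(X) + noise * rng.standard_normal(8)

Xs = np.linspace(-1, 1, 200)                 # query grid over the state space
K = rbf(X, X) + noise**2 * np.eye(len(X))
Ks = rbf(Xs, X)

mu_n = Ks @ np.linalg.solve(K, y)            # posterior mean mu_n(x)
var = rbf(Xs, Xs).diagonal() - np.einsum(
    'ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
sigma_n = np.sqrt(np.maximum(var, 0.0))      # posterior std sigma_n(x)

beta_n = 2.0
inside = np.abs(true_f(Xs) - mu_n) <= beta_n * sigma_n
print(f"fraction of grid points inside the tube: {inside.mean():.2f}")
```

The point of the sketch is only that $\mu_n$ and $\sigma_n$ come from the posterior, and $\beta_n$ scales the width of the resulting confidence tube around $\mu_n$.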
They then describe the assumption by saying:
This assumption ensures that we can build a confidence interval on the dynamics that, when scaled by an appropriate constant $\beta_n$, covers the true function with high probability.
Near the end of the paper, when talking about implementing their algorithm, they state:
To enable more data-efficient learning, we fix $\beta_n = 2$. This corresponds to a high-probability decrease condition per-state, rather than jointly over the state space.
The only thing I don't understand about this assumption is the arbitrary constant $\beta_n$ that scales the confidence interval. Isn't a statement of the form $\lVert a - b \rVert \le \beta_n \sigma$ trivially satisfiable for any $a, b \in \mathbb{R}$ and $\sigma > 0$, since we can let $\beta_n \rightarrow \infty$? In that case $\lVert a - b \rVert$ is effectively unbounded; i.e., we could just make $\beta_n$ arbitrarily large until the condition holds.
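To make my worry concrete, here is a toy numeric check (all quantities are fabricated placeholders, not from the paper): for fixed errors $|f - \mu_n|$ and fixed posterior widths $\sigma_n$ over a grid of states, the bound holds exactly when $\beta_n$ is at least the worst-case ratio over the grid, which is a finite, data-dependent number:

```python
import numpy as np

rng = np.random.default_rng(1)
err = np.abs(rng.normal(0.0, 0.1, 100))   # placeholder |f - mu_n| on a state grid
sigma_n = rng.uniform(0.05, 0.3, 100)     # placeholder posterior std on the grid

# smallest constant that makes ||f - mu_n|| <= beta_n * sigma_n hold everywhere
beta_min = np.max(err / sigma_n)
print(f"smallest beta_n that works on this grid: {beta_min:.2f}")

# a fixed choice such as beta_n = 2 then either covers every point or it doesn't
holds = np.all(err <= 2.0 * sigma_n)
print(f"beta_n = 2 covers every grid point: {holds}")
```

So my question is whether the assumption is really about such a finite constant existing (and being known) in advance, rather than being chosen after the fact.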
Is this a standard assumption? What is the purpose of introducing $\beta_n$ instead of just bounding the error by $\sigma_n(x,u)$?