
I'm reading this paper on safe model-based reinforcement learning.

Assumption 2 in this paper states:

Let $\mu_n(\cdot)$ and $\Sigma_n(\cdot)$ denote the posterior mean and covariance matrix functions of the statistical model of the dynamics (1) conditioned on $n$ noisy measurements. With $\sigma_n(\cdot) = \mathrm{trace}(\Sigma_n^{1/2}(\cdot))$, there exists a $\beta_n > 0$ such that with probability at least $(1-\delta)$ it holds for all $n \ge 0$, $x \in \mathbb{X}$, and $u \in \mathbb{U}$ that $\lVert f(x,u) - \mu_n(x,u) \rVert_2 \le \beta_n \sigma_n(x,u)$.

$f(x,u)$ denotes the true dynamics for a given state $x$ and action $u$. $\mu_n(x,u)$ is therefore the posterior-mean estimate of $f(x,u)$, with uncertainty captured by $\Sigma_n(x,u)$.

They then describe the assumption by saying:

This assumption ensures that we can build a confidence interval on the dynamics that, when scaled by an appropriate constant $\beta_n$, covers the true function with high probability.

Near the end of the paper, when talking about implementing their algorithm, they state:

To enable more data-efficient learning, we fix $\beta_n = 2$. This corresponds to a high-probability decrease condition per-state, rather than jointly over the state space.

The only thing I don't understand about this assumption is why there is an arbitrary constant $\beta_n$ scaling the confidence interval. Isn't the statement $\lVert a - b \rVert \le \beta_n \sigma$ true for any $a \in \mathbb{R}$, $b \in \mathbb{R}$, and $\sigma > 0$, since we can just make $\beta_n$ arbitrarily large until the condition holds? If $\beta_n \rightarrow \infty$ is allowed, the bound on $\lVert a - b \rVert$ seems vacuous.

Is this a standard assumption? What is the purpose of adding $\beta_n$ instead of just bounding the error by $\sigma_n(x,u)$?
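For intuition on why a fixed $\beta_n$ is not vacuous, here is a minimal numeric sketch (my own illustration, not from the paper): $\beta_n$ must be chosen *in advance* so that the bound holds with probability at least $1-\delta$; it cannot be inflated after observing the error. If the posterior at a query point is Gaussian with mean $\mu$ and standard deviation $\sigma$, then $\beta_n = 2$ gives roughly 95% per-point coverage, which matches the paper's remark about a per-state high-probability condition.

```python
import numpy as np

# Posterior at a single query point, assumed Gaussian for this sketch.
mu, sigma = 0.0, 1.0

# Draws of the "true" function value under that posterior.
rng = np.random.default_rng(0)
f = rng.normal(mu, sigma, 100_000)

# Empirical probability that |f - mu| <= beta * sigma for several
# candidate values of beta. A larger beta gives higher coverage but a
# looser (less informative) interval; beta is fixed up front to hit a
# target coverage 1 - delta, not tuned after seeing the error.
for beta in (0.5, 1.0, 2.0, 3.0):
    coverage = np.mean(np.abs(f - mu) <= beta * sigma)
    print(f"beta = {beta}: empirical coverage = {coverage:.3f}")
```

The trade-off this makes visible: dropping $\beta_n$ (i.e., fixing $\beta_n = 1$) would pin the coverage to whatever one posterior standard deviation happens to give, whereas the assumption only requires that *some* finite $\beta_n$, chosen before seeing the data, works uniformly; the resulting interval width $\beta_n \sigma_n(x,u)$ then still shrinks as $\sigma_n$ shrinks with more measurements.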

    Unless one reads the paper, this question is cryptic. How does the existence of $\beta_n$ imply $\beta_n\to\infty$ and why would this even be relevant in any particular instance? How is the distance between $x$ and $y$ germane to the quotation? Indeed, what is $y$? – whuber Feb 05 '22 at 16:23
  • I chose $x$ and $y$ just as arbitrary variables. I've updated the question for clarification. The assumptions says "there exists $\beta_n > 0$", so that implies that $\beta_n$ /could/ approach infinity. My question is why would one use this scaling factor when we could just make $\beta_n$ arbitrarily large. i.e., why is it useful. – alexanderd5398 Feb 05 '22 at 16:54

0 Answers