Hoeffding's inequality states that if, $X_1, \ldots, X_n$ are independent random variables bounded by the interval $[0, 1]$, i.e. $0 ≤ X_i ≤ 1$. And the empirical mean of these variables is given by,
$$\overline X = \frac{1}{n}(X_1 + \cdots + X_n).$$
Then, $$ Pr \left({\overline {X}}-\mathrm {E} \left[{\overline {X}}\right]\geq \Delta\right)\leq e^{-2n \Delta ^{2}}, \quad \Delta \geq 0.$$
In the setting I am interested in the $X_i$'s are i.i.d.
The Problem:
If instead of the knowledge that the mean $\overline X$ is composed of $n$ i.i.d random variables, we only had an expected number of samples. That is if $n$ where a random variable with known expectation $\mathbb{E} [n]$ then, can come up with a Hoeffding like bound that uses $\mathbb{E} [n]$ instead of $n$?
Two approaches to a problem where the true value of $n$ is not available come to mind-
- If the true number of samples $n$ are not available, but if a range of values of $n$ are available $ n_1 \leq n \leq n_2$ then we can do a union bound over them,
$$ Pr \left({\overline {X}}-\mathrm {E} \left[{\overline {X}}\right]\geq \Delta\right)\leq \sum_{n = n_1}^{n_2} e^{-2n \Delta ^{2}}, \quad \Delta \geq 0.$$
However, such a range is not available here.
- If the distribution of $ n $ were accessible (it is not here) then we could first invoke a concentration result on $n$ and then get the final result, however such access is not present here.
Any inputs on approaches when $n$ is itself random are appreciated.