Let $X$ be a random vector (high dimensional) and $X_1,\dots, X_{n_1}$ its i.i.d. observations. Let $Y$ be another random vector of same dimension and $Y_1,\dots, Y_{n_2}$ -- its i.i.d. observations. They are observed in simulation and while it is possible to simulate more, it is also more time consuming.
I need to measure the following quantity:
$$b = \| \mathbb{E}[X]-\mathbb{E}[Y]\|^2, $$
for which I have figured out an unbiased estimator can be constructed as
$$\hat b = \| \hat \mu_1 - \hat\mu_2\|^2 - \frac{V_1}{n_1} - \frac{V_2}{n_2},$$
where $\hat \mu_1$ and $V_1$ are the sample mean and total sample variance of $(X_i)_i$ and $\hat \mu_2$ and $V_2$ are likewise for $(Y_i)_i$. I can detail this if needed.
I am then interested in computing confidence intervals for this estimator. I first tried analytic approach using that $\hat \mu_1$, $V_1$, $\hat \mu_2$ and $V_2$ are independent and assuming their normality (CLT for the distribution of $\hat \mu_k$ and variance of sample variance https://math.stackexchange.com/a/73080/446952 for $V_k$). Also I cannot exclude an error, the confidence intervals were far too tight, by large disagreeing with repeated simulations.
Now I am trying the bootstrap method. However, the samples are vectors and I cannot store too many samples. I figured out I could incrementally compute all the needed quantities for the estimate $\hat b$: the means and variances of bootstrap re-samples on the go. This way I am bounded in memory by the number of bootstrap re-samples but not by the number of samples $n_1$, $n_2$.
The problem is that the bootstrap intervals appear to have a bias: they are systematically above the unbiased estimate $\hat b$ (and also the mean of the bootstrap samples is systematically above $\hat b$).
Is it possible to construct confidence intervals that are unbiased for a finite sample and bootstrap size? Or perhaps by other techniques? As a hacky solution I just forcibly align my CIs in order to compensate the difference between the bootstrap distribution mean and the unbiased estimate $\hat b$. But this seems rather trustless. Can you advice how to improve, e.g. by the right correction, a clever efficient computation with more samples, or using other methods? A recommendation of what to read about bootstrap and its guarantees would be helpful too.