5

In general terms my estimator is $\frac{\sum_{i=1}^n \omega_i x_i}{\sum_{i=1}^n \omega_i}$, where the $x_i$ are realizations of the instrumental r.v., the $\omega_i$ are the corresponding importance weights, and $n$ is the sample size.

What is the variance of this estimator?
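For concreteness, here is a minimal sketch of how such an estimate can be computed (the target and instrumental densities below are hypothetical stand-ins, with the target known only up to a constant):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in densities: target p is N(0, 1), known only up to a constant,
# and the instrumental distribution q is N(0, 2^2).
n = 10_000
x = rng.normal(0.0, 2.0, size=n)   # realizations of the instrumental r.v.
p_u = np.exp(-0.5 * x**2)          # unnormalized target density at x
q = np.exp(-0.125 * x**2) / (2.0 * np.sqrt(2.0 * np.pi))  # instrumental density at x
w = p_u / q                        # importance weights

estimate = np.sum(w * x) / np.sum(w)   # the self-normalized estimator
print(estimate)                        # close to E_p[X] = 0 for large n
```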

Motmot
  • There is no closed-form solution for the variance, due to the self-normalising term, but only (asymptotic) approximations by the Delta Method. – Xi'an Dec 11 '16 at 17:36
  • I don't know the Delta Method. I have the vector $x$ of $n$ realizations of the instrumental r.v., I have computed the vector of importance weights $w_i$ (of dimension $n$), and I have estimated the parameter with that summation, but I need an estimate of the variance/error of this estimate. – Motmot Dec 11 '16 at 17:55
  • To get a more accurate estimate than the Delta Method, use bootstrapping. As a bonus, that will give you an estimate of the entire distribution. However, if you want an estimate of the variance without (prior to) having data, then use the Delta Method. – Mark L. Stone Dec 12 '16 at 00:30

3 Answers

5

This is some work showing the Delta Method for approximating the variance of a ratio.

Let $X_1, \ldots, X_n \overset{iid}{\sim} q(\cdot)$ be samples from your normalized instrumental density $q(\cdot)$. Let $p(\cdot) = C^{-1}p_u(\cdot)$ be your target density, and assume you can only evaluate $p_u$. Call $w_i = w(x_i) = p_u(x_i)/q(x_i)$.

The Delta Method is justified with Taylor approximations. Call $A = \frac{1}{n}\sum_i w_i x_i$ and $B=\frac{1}{n}\sum_j w_j$; up to the common factor $1/n$, which cancels in the ratio, these are the numerator and denominator of your expression. Also call $\mu_A$ and $\mu_B$ their expected values. That is,

$$ \frac{\sum_{i=1}^n w_i x_i}{\sum _{i=1}^n w_i} = \frac{A}{B}. $$

The Delta Method takes the first-order Taylor approximation,

$$ f(A,B) \approx f(\mu_A,\mu_B) + f_{A}(\mu_A,\mu_B)(A-\mu_A) + f_B(\mu_A,\mu_B)(B-\mu_B) $$

and takes the variance on both sides:

$$ \text{Var}\left[\frac{A}{B}\right] \approx [f_{A}(\mu_A,\mu_B)]^2\text{Var}[A] + [f_B(\mu_A,\mu_B)]^2\text{Var}[B] + 2f_{A}(\mu_A,\mu_B)f_B(\mu_A,\mu_B)\text{Cov}(A,B). $$ Or in your case:

\begin{align*} &\frac{1}{\mu_B^2}\frac{1}{n}E[(WX - \mu_A)^2] + \frac{\mu_A^2}{\mu_B^4}\frac{1}{n}E[(W - \mu_B)^2] - 2\frac{1}{\mu_B}\frac{\mu_A}{\mu_B^2}\frac{1}{n}E[W^2X] + 2\frac{1}{\mu_B}\frac{\mu_A}{\mu_B^2}\frac{1}{n}E[WX]E[W] \\ &= \frac{1}{\mu_B^2}\frac{1}{n}\left\{E[W^2X^2] + \frac{\mu_A^2}{\mu_B^2}E[W^2] - 2 \frac{\mu_A}{\mu_B}E[W^2X] \right\} \\ &= \frac{1}{\mu_B^2}\frac{1}{n} E\left[\left(XW - \frac{\mu_A}{\mu_B}W\right)^2\right]\\ &= \frac{1}{\mu_B^2}\frac{1}{n} E\left[W^2\left(X - \frac{\mu_A}{\mu_B}\right)^2\right], \end{align*} where we use an uppercase $W$ to denote any one of the random unnormalized weights, and each $1/n$ (including on the covariance terms) comes from the iid sample means: $\text{Var}[A] = \frac{1}{n}\text{Var}[WX]$, $\text{Var}[B] = \frac{1}{n}\text{Var}[W]$, and $\text{Cov}(A,B) = \frac{1}{n}\left(E[W^2X] - E[WX]E[W]\right)$. If you plug in the sample estimates for all the above quantities you get

$$ \frac{1}{n}\frac{\frac{1}{n}\sum_i w_i^2(x_i - A/B)^2 }{B^2 } = \sum_{i=1}^n \left[\frac{w_i}{\sum_j w_j}\right]^2(x_i - A/B)^2. $$

I used this as a reference: http://statweb.stanford.edu/~owen/mc/Ch-var-is.pdf
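In code, the plug-in formula above can be sketched as follows (the unnormalized target and proposal densities here are hypothetical stand-ins; `var_hat` is the Delta Method variance estimate):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: unnormalized target p_u from N(1, 1), proposal q = N(0, 2^2).
n = 5_000
x = rng.normal(0.0, 2.0, size=n)
p_u = np.exp(-0.5 * (x - 1.0)**2)
q = np.exp(-0.125 * x**2) / (2.0 * np.sqrt(2.0 * np.pi))
w = p_u / q

mu_hat = np.sum(w * x) / np.sum(w)   # self-normalized estimate, A/B

# Delta Method plug-in variance: sum_i [w_i / sum_j w_j]^2 (x_i - mu_hat)^2
w_norm = w / np.sum(w)
var_hat = np.sum(w_norm**2 * (x - mu_hat)**2)
se_hat = np.sqrt(var_hat)            # approximate standard error of mu_hat
```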

Taylor
  • Thank you very much, the approximation works fine! Just a pair of questions to understand it theoretically: 1) why is $f_A(\mu_A,\mu_B)$ equal to $1/\mu_B^2$? 2) why does $E[(WX-\mu_A)^2]$ become $E[W^2X^2]$ even though $\mu_A$ isn't 0? – Motmot Dec 12 '16 at 13:22
  • $f_A$ is the partial derivative of $A/B$ with respect to $A$. So when you square that that's what you get. Then I use the fact that $V(X) = E(X^2) -(E(X))^2$. – Taylor Dec 12 '16 at 16:41
  • Ok thanks, now I have understood everything except the $1/n$. Where is it from? – Motmot Dec 12 '16 at 20:15
  • Ok, I have now understood also the 1/n. Thanks again! – Motmot Dec 13 '16 at 09:51
  • no problem @Motmot happy importance sampling – Taylor Dec 13 '16 at 14:33
  • Why can't we just naively take the variance of this quantity directly, like we do for the regular IS variance? $$ \frac{\sum_{i=1}^n w_i x_i}{\sum _{i=1}^n w_i} = \frac{A}{B}. $$ From my understanding, $\text{Var}\left(\frac{A}{B}\right)$ will be a clunky integral when we express it as $E\left[(A/B)^2\right] - E\left[A/B\right]^2$, but then, so is using the delta method. – information_interchange Feb 12 '20 at 00:37
  • @information_interchange yeah, that's valid. But taking expectations of fractions is generally easier if you use approximations. – Taylor Feb 12 '20 at 00:41
  • Thanks! But I think _in general_ working with a good approximation is easier :> – information_interchange Feb 12 '20 at 00:47
  • Actually, now I see that we cannot directly take the variance of a ratio of random variables: https://stats.stackexchange.com/questions/449258/expectation-and-variance-notation-for-ratio-of-random-variables – information_interchange Feb 13 '20 at 03:25
2

A very crude approximation to the variance of the self-normalised importance sampling estimator $$ \hat{\mu}_n = \frac{\sum_{i=1}^n \omega_i x_i}{\sum _{i=1}^n \omega_i} $$ is $$ \textrm{Var}_q(\hat{\mu}_n) \approx \textrm{Var}_p(\hat{\mu}_n) \left(1 + \textrm{Var}_q(W)\right), $$ where $p$ is the target distribution and $q$ the importance distribution.
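The $\left(1 + \textrm{Var}_q(W)\right)$ inflation factor can be estimated from the sample; a minimal sketch, assuming the weights are rescaled to have unit mean so that $\textrm{Var}_q(W)$ is estimated from the normalized weights (the densities are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical weights: target N(0, 1) against proposal N(0, 2^2).
n = 5_000
x = rng.normal(0.0, 2.0, size=n)
w = np.exp(-0.5 * x**2) / (np.exp(-0.125 * x**2) / (2.0 * np.sqrt(2.0 * np.pi)))

w_bar = w / np.mean(w)            # weights rescaled to unit mean
inflation = 1.0 + np.var(w_bar)   # sample estimate of the (1 + Var_q(W)) factor
ess = n / inflation               # the related "effective sample size" heuristic
```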

Xi'an
1

To get a more accurate estimate than the Delta Method (shown in the answer by @Taylor), use bootstrapping: https://en.wikipedia.org/wiki/Bootstrapping_(statistics) and https://www.crcpress.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317. As a bonus, that will give you an estimate of the entire distribution.

However, if you want an estimate of the variance without (prior to) having any data, then use the Delta Method.
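A nonparametric bootstrap of the self-normalised estimator can be sketched by resampling the $(x_i, w_i)$ pairs with replacement and recomputing the ratio each time (the densities below are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: unnormalized target from N(1, 1), proposal N(0, 2^2).
n = 2_000
x = rng.normal(0.0, 2.0, size=n)
w = np.exp(-0.5 * (x - 1.0)**2) / (np.exp(-0.125 * x**2) / (2.0 * np.sqrt(2.0 * np.pi)))

def snis(x, w):
    """Self-normalized importance sampling estimate."""
    return np.sum(w * x) / np.sum(w)

# Resample (x_i, w_i) pairs with replacement and recompute the estimator.
B = 1_000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = snis(x[idx], w[idx])

boot_var = boot.var(ddof=1)   # bootstrap estimate of the estimator's variance
```

The empirical distribution of `boot` also gives percentile confidence intervals, which is the "entire distribution" bonus mentioned above.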

Mark L. Stone