
$\newcommand{\E}{\mathbb{E}}$I'm reading a book on machine learning and sampling methods, and I want to know why the importance sampling estimator of the normalizing constant is unbiased while the estimator of $\E\left[f(x)\right]$ is biased.

My question is how we can prove that, in importance sampling,

  1. the estimator of the normalizing constant is unbiased, i.e. $\E\left[\hat{Z}\right] = Z$;
  2. the estimator of $\E\left[f(x)\right]$ for a function $f(\cdot)$ is biased.

Edit:

Suppose $Z$ is the normalizing constant of the target distribution $p(x) = \varphi(x)/Z$ (so $p(x)$ is known only up to the normalizing constant $Z$). We have

$$I = \E\left[{\bf{f}}({\bf{x}})\right] = \frac{1}{Z}\int {{\bf{f}}({\bf{x}})\varphi ({\bf{x}})} d{\bf{x}} = \frac{1}{Z}\int {{\bf{f}}({\bf{x}})\frac{{\varphi ({\bf{x}})}}{{q({\bf{x}})}}q({\bf{x}})} d{\bf{x}}$$

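and its approximation, where the ${\bf{x}}^i$ are drawn from the proposal $q$, $w = \varphi/q$ are the unnormalized weights, and $W$ denotes the weights normalized to sum to one,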

$$I_N = \frac{{\frac{1}{N}\sum\nolimits_{i = 1}^N {{\bf{f}}({{\bf{x}}^i})w({{\bf{x}}^i})} }}{{\frac{1}{N}\sum\nolimits_{j = 1}^N {w({{\bf{x}}^j})} }} = \sum\nolimits_{i = 1}^N {{\bf{f}}({{\bf{x}}^i})W({{\bf{x}}^i})} $$

as well as

$$\hat Z = \frac{1}{N}\sum\nolimits_{i = 1}^N {w({{\bf{x}}^i})} $$
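
To make the two estimators concrete, here is a minimal sketch in plain Python. The target, proposal and test function are illustrative assumptions, not taken from the book: $\varphi(x) = e^{-x^2/2}$ (so $Z = \sqrt{2\pi}$ and $p$ is standard normal), $q = N(0, 2^2)$, and $f(x) = x^2$ (so $I = 1$). The function name `importance_sampling` is hypothetical.

```python
import math
import random

SIGMA_Q = 2.0  # proposal standard deviation (arbitrary illustrative choice)

def phi(x):
    """Unnormalised target exp(-x^2/2): Z = sqrt(2*pi), p is N(0, 1)."""
    return math.exp(-0.5 * x * x)

def q_pdf(x):
    """Density of the assumed proposal q = N(0, SIGMA_Q^2)."""
    return math.exp(-0.5 * (x / SIGMA_Q) ** 2) / (SIGMA_Q * math.sqrt(2 * math.pi))

def importance_sampling(f, n, rng):
    """Return (Z_hat, I_N) computed from n independent draws x^i ~ q."""
    xs = [rng.gauss(0.0, SIGMA_Q) for _ in range(n)]
    ws = [phi(x) / q_pdf(x) for x in xs]                    # weights w(x^i)
    z_hat = sum(ws) / n                                     # estimator of Z
    i_n = sum(f(x) * w for x, w in zip(xs, ws)) / sum(ws)   # self-normalised estimator of I
    return z_hat, i_n

rng = random.Random(0)
z_hat, i_n = importance_sampling(lambda x: x * x, 100_000, rng)
print(z_hat, math.sqrt(2 * math.pi))  # the two values should be close
print(i_n)                            # should be close to E_p[X^2] = 1
```

Note that $I_N$ only needs the ratio of weights, so $Z$ never has to be known, whereas $\hat Z$ uses the unnormalized weights directly.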

My questions are:

1- Why does the author take $Z = \int {\varphi ({\bf{x}})} d{\bf{x}}$?

2- I'm not able to prove mathematically that $\E\left[\hat{Z}\right] = Z$.

3- How can one prove that $I_N$ is biased for finite values of $N$?

Chill2Macht
sci9
  • Please type your questions instead of posting images. They can't be read by the visually impaired using screen readers. – gung - Reinstate Monica Nov 06 '16 at 13:15
  • Possible duplicate of [Optimal importance sampling with ratio estimator](http://stats.stackexchange.com/questions/19456/optimal-importance-sampling-with-ratio-estimator) – Xi'an Nov 06 '16 at 13:15
  • Check also my [answer there](http://stats.stackexchange.com/a/210196/7224) for additional solutions. – Xi'an Nov 06 '16 at 13:19
  • The answer to your first question is clear: $\int p(x)dx = 1$ since it's a distribution function, so it follows directly since $\varphi(x) = Z p(x)$. I don't think this is an answer (no details), but the second and third questions are both versions of the basic fact of MC theory or sampling theory, that is, if one picks iid points $x_i$ the sums converge to the integral. – meh Nov 06 '16 at 14:14
  • Could you please indicate the source of the reference? – Xi'an Nov 06 '16 at 14:42
  • Please see: http://store.elsevier.com/product.jsp?isbn=9780128017227 – sci9 Nov 06 '16 at 15:03

1 Answer

1- Why does the author take $\mathfrak{Z}=\int\varphi(x)\,\text{d}x$?

Since $p$ is a density, its integral is equal to $1$. If $\mathfrak{Z}$ is the normalising constant of $\varphi$, it has to satisfy $$\int p(x)\text{d}x=\int \frac{\varphi(x)}{\mathfrak{Z}}\text{d}x=1$$ which gives $\mathfrak{Z}=\int\varphi(x)\,\text{d}x$.

2- I'm not able to prove mathematically that $\mathbb{E}[\hat{\mathfrak{Z}}]=\mathfrak{Z}$

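Recall that $w(x)=\varphi(x)/q(x)$ and that the $X^i$ are drawn from $q$. Then, provided $q(x)>0$ wherever $\varphi(x)>0$, $$\mathbb{E}[w(X)]=\int \frac{\varphi(x)}{q(x)}q(x)\text{d}x= \int \varphi(x)\text{d}x=\mathfrak{Z}$$ Since $\hat{\mathfrak{Z}}$ is the average of $N$ such terms, linearity of expectation gives $\mathbb{E}[\hat{\mathfrak{Z}}]=\mathfrak{Z}$.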

3- How can one prove that $I_N$ is biased for finite values of $N$?

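The ratio of two unbiased estimators is in general biased. Here $I_N$ is the ratio of $\frac{1}{N}\sum_i f(x^i)w(x^i)$, which is unbiased for $\mathfrak{Z}\,I$, to $\hat{\mathfrak{Z}}$, which is unbiased for $\mathfrak{Z}$, and for a positive random variable $h(X)$ $$\mathbb{E}[1/h(X)]\ge1/\mathbb{E}[h(X)]$$by Jensen's inequality, with strict inequality unless $h(X)$ is almost surely constant.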
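
Both claims can be checked exactly on a toy example, sketched below under assumptions that are purely illustrative (a two-point space with $\varphi = (1, 3)$, uniform proposal $q$, and $f(x) = x$, so $\mathfrak{Z} = 4$ and $I = 3/4$). The helper `exact_expectations` is a hypothetical name; it enumerates every possible sample of size $N$ from $q$ and uses exact rational arithmetic, so there is no Monte Carlo error.

```python
from fractions import Fraction
from itertools import product

# Toy setup (all values are illustrative assumptions)
states = [0, 1]
phi = {0: Fraction(1), 1: Fraction(3)}        # unnormalised target, Z = 4
q = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # proposal
f = lambda x: Fraction(x)                     # I = E_p[f(X)] = 3/4

Z = sum(phi.values())
I = sum(f(x) * phi[x] for x in states) / Z

def exact_expectations(n):
    """Exact (E[Z_hat], E[I_N]) by enumerating all n-tuples drawn from q."""
    e_z = Fraction(0)
    e_i = Fraction(0)
    for xs in product(states, repeat=n):
        prob = Fraction(1)
        for x in xs:
            prob *= q[x]                      # probability of this sample under q
        ws = [phi[x] / q[x] for x in xs]      # weights w(x^i)
        e_z += prob * sum(ws) / n
        e_i += prob * sum(f(x) * w for x, w in zip(xs, ws)) / sum(ws)
    return e_z, e_i

for n in (1, 2, 4, 8):
    e_z, e_i = exact_expectations(n)
    print(n, e_z, e_i, float(e_i - I))        # N, E[Z_hat], E[I_N], bias of I_N
```

In this example $\mathbb{E}[\hat{\mathfrak{Z}}]$ equals $4$ for every $N$, while $\mathbb{E}[I_N]$ is $1/2$ for $N=1$ and $5/8$ for $N=2$, both different from $I = 3/4$. The case $N=1$ shows the mechanism most plainly: the weight cancels, $I_1 = f(x^1)$, and its expectation is taken under $q$ rather than $p$.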

Note: There exist unbiased estimators of the inverse $\mathfrak{Z}^{-1}$, including the notorious harmonic mean estimator.
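
As a sketch of why such estimators can exist, assume (outside the importance sampling setting above) that draws from $p$ itself are available, and let $\alpha$ be a hypothetical auxiliary probability density whose support is contained in that of $\varphi$. Then $$\mathbb{E}_p\left[\frac{\alpha(X)}{\varphi(X)}\right]=\int \frac{\alpha(x)}{\varphi(x)}\,\frac{\varphi(x)}{\mathfrak{Z}}\,\text{d}x=\frac{1}{\mathfrak{Z}}\int\alpha(x)\,\text{d}x=\frac{1}{\mathfrak{Z}}$$ so the average of $\alpha(X^i)/\varphi(X^i)$ over draws $X^i\sim p$ is unbiased for $\mathfrak{Z}^{-1}$. When $\varphi$ is a prior times a likelihood and $\alpha$ is taken to be the prior, this is the harmonic mean of the likelihoods; its variance can be infinite, which is presumably why it is called notorious.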

Xi'an