
I am considering the integral $$ I(y \mid \mu, \sigma) = \int_y^1 \frac{\exp \left\{ \frac{-1}{2\sigma^2}[\textrm{logit}(x)-\mu]^2 \right\}}{\sigma \sqrt{2\pi} (1-x)}\textrm{d}x$$ which for $y=0$ is the expectation of a random variable $X$ such that $X \sim \textrm{logit-Normal}(\mu, \sigma^2)$. Unfortunately, the integral is intractable, so numerical methods are required. This integral is a step inside an optimisation routine, so I require a deterministic and fast estimate of $I(y)$. For unfortunate choices of $\mu$, $\sigma$ this integral can be difficult to compute numerically, since the density might be infinite at $0$ or $1$.

I know that a quasi-Monte Carlo (QMC) estimate of $I(0)$ is $$I(0) = E(X) \approx \frac{1}{K-1}\sum_{i=1}^{K-1} G \left(\Phi^{-1}_{\mu, \sigma^2}(i/K)\right)$$ where $G(x) = 1/(1 + e^{-x})$ and $\Phi^{-1}_{\mu, \sigma^2}(\cdot)$ denotes the inverse of the normal cdf with mean $\mu$ and variance $\sigma^2$. This estimate is deterministic and fast! However, I'm struggling to generalise it to $I(y)$ with $0<y<1$.
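For concreteness, a minimal sketch of that $I(0)$ estimator (Python rather than R only because `statistics.NormalDist` keeps it dependency-free; `inv_cdf` plays the role of R's `qnorm`, i.e. $\Phi^{-1}_{\mu,\sigma^2}$; function names are illustrative, not from the question):

```python
from statistics import NormalDist
from math import exp

def expit(x):
    """G(x) = 1 / (1 + e^(-x)), the inverse logit."""
    return 1.0 / (1.0 + exp(-x))

def I0_riemann(mu, sigma, K=10_000):
    """Deterministic estimate of E[X] for X ~ logit-Normal(mu, sigma^2):
    average G(Phi^{-1}(i/K)) over the interior grid points i = 1, ..., K-1."""
    inv_cdf = NormalDist(mu, sigma).inv_cdf
    return sum(expit(inv_cdf(i / K)) for i in range(1, K)) / (K - 1)
```

For $\mu = 0$ the grid points pair up symmetrically ($G(z) + G(-z) = 1$), so the estimate is $1/2$, matching $E(X) = 1/2$ exactly.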

I have tried numerical integration in $\verb|R|$ using $\verb|integrate()|$, which works well for the most part but can be unstable for unfortunate choices of $\mu$, $\sigma$ — hence QMC.

I have also tried starting the sum at the first value of $i$ such that $i>Ky$ but this gives a really bad estimate (I adjusted the denominator to account for the 'length' of the sum changing).

For instance, a regular Monte Carlo estimate gives $I(0.8 \mid 1, 1) \approx 0.6397$, whereas the QMC estimate with the sum starting at the first value of $i$ such that $i>Ky$ gives $I(0.8 \mid 1, 1) \approx 0.9106$, which is clearly way off. I verified my MC estimate is valid by comparing it to a vanilla numerical integration. Comparing regular MC estimates of $I(0)$ to QMC estimates of $I(0)$ also gives good agreement; there's something I don't understand about this 'truncation'.

Failing this, are there alternative, fast methods of computing $I(y \mid \mu, \sigma)$?

There's some $\verb|R|$ code below which outlines the two computations described above.

## regular MC
expit <- function(x) 1 / (1 + exp(-x))  # inverse logit
X <- rnorm(10000, 1, 1)
X <- mean(expit(X) * (expit(X) > 0.8))  # truncate at y = 0.8
> X
[1] 0.6397533

## QMC
Z <- (1:(10000 - 1)) / 10000
Z <- Z[Z > 0.8]
mean(expit(qnorm(Z, 1, 1)))
[1] 0.9106322


kjetil b halvorsen
jcken
  • I don't understand the first part of your program. expit(x) = 1/(1+exp(-x)). Therefore, (expit(X)>1) is always equal to 0, and the mean of that expression is always 0. – John L Feb 12 '21 at 18:18
  • Yes, that's a typo; it should be expit(x)>y – jcken Feb 12 '21 at 19:25

1 Answer


The way I understand the integral, it writes as \begin{align*} I(y \mid \mu, \sigma) &= \int_y^1 \frac{\exp \left\{ \frac{-1}{2\sigma^2}[\textrm{logit}(x)-\mu]^2 \right\}}{\sigma \sqrt{2\pi} (1-x)}\textrm{d}x\\ &= \int_y^1 \frac{\varphi_{\mu,\sigma^2}(\textrm{logit}(x))}{1-x}\textrm{d}x\\ &\stackrel{z=\textrm{logit}(x)}{=} \int_y^1 \varphi_{\mu,\sigma^2}(z(x))\, x\,\frac{\textrm{d}z(x)}{\textrm{d}x}\,\textrm{d}x \qquad \text{by (i)}\\ &= \int_{\textrm{logit}(y)}^\infty G(z)\, \varphi_{\mu,\sigma^2}(z)\,\textrm{d}z \qquad \text{by (ii)} \end{align*} since

(i) the logit function $z(x)=\textrm{logit}(x)=\log\{x/(1-x)\}$ has derivative $$\frac{\textrm{d}z(x)}{\textrm{d}x}=\frac{1}{x(1-x)}$$ so that $x\,\textrm{d}z(x)/\textrm{d}x = 1/(1-x)$; (ii) the inverse of the logit function is $x(z)=e^z/(1+e^z)=1/(1+e^{-z})=G(z)$.
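The change of variables can be double-checked numerically. The sketch below (not part of the original answer; function names are mine) evaluates both sides with a plain midpoint rule, using only the Python standard library and assuming $0 < y < 1$:

```python
from math import log, exp, pi, sqrt

def expit(z):
    """G(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + exp(-z))

def phi(z, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def lhs(y, mu, sigma, n=200_000):
    """Midpoint rule for the original integral over x in (y, 1)."""
    h = (1 - y) / n
    total = 0.0
    for i in range(n):
        x = y + (i + 0.5) * h
        total += phi(log(x / (1 - x)), mu, sigma) / (1 - x)
    return total * h

def rhs(y, mu, sigma, zmax=40.0, n=200_000):
    """Midpoint rule for the transformed integral over z in (logit(y), zmax)."""
    a = log(y / (1 - y))
    h = (zmax - a) / n
    total = 0.0
    for i in range(n):
        z = a + (i + 0.5) * h
        total += expit(z) * phi(z, mu, sigma)
    return total * h
```

Both sides agree to high accuracy for, e.g., $y=0.8$, $\mu=1$, $\sigma=1$.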

Hence it is a Gaussian expectation, and a deterministic approximation of the integral is $$\frac{1}{K-1}\sum_{i=1}^{K-1} G \left(\Phi^{-1}_{\mu, \sigma^2}(i/K)\right) \times\mathbb I_{\underbrace{\textrm{logit}(y)\le\Phi^{-1}_{\mu, \sigma^2}(i/K)}_{\Phi_{\mu, \sigma^2}[\textrm{logit}(y)]\le i/K}}$$ I would however not call it a quasi-Monte Carlo approximation, but rather a Riemann sum approximation.
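A direct transcription of this estimator (a sketch in Python for self-containedness; `NormalDist.inv_cdf`/`cdf` stand in for `qnorm`/`pnorm`, and the indicator is applied via the equivalent condition $\Phi_{\mu,\sigma^2}[\textrm{logit}(y)] \le i/K$; names are mine):

```python
from statistics import NormalDist
from math import log, exp

def expit(z):
    """G(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + exp(-z))

def I_riemann(y, mu, sigma, K=10_000):
    """Riemann-sum estimate of I(y | mu, sigma): keep the full 1/(K-1)
    normalisation and drop grid points with i/K < Phi_{mu,sigma}(logit(y))."""
    d = NormalDist(mu, sigma)
    cut = d.cdf(log(y / (1 - y))) if y > 0 else 0.0
    total = sum(expit(d.inv_cdf(i / K)) for i in range(1, K) if i / K >= cut)
    return total / (K - 1)
```

Keeping the $1/(K-1)$ normalisation is the point: dividing by the number of retained terms instead turns the partial expectation into a conditional one, which is one source of the inflated truncated estimate in the question.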

Xi'an
    I think this looks right and thought I'd already tried it! Perhaps a coding bug. Tried numerically and your estimate appears to agree well with MC estimates for a range of $\mu$, $\sigma$ - thanks! – jcken Feb 12 '21 at 18:11