If $X\sim \operatorname{lognormal}$ then $Y:=(X-d\mid X\geq d)$ has approximately a Generalized Pareto distribution

Question

Let $X$ be a random variable with lognormal distribution. Show that when $d$ is sufficiently large, $Y:=(X-d\mid X\geq d)$ is approximately a random variable with generalized Pareto distribution.

Hint: Use the fact that $\operatorname{erf}(x)\approx 1-\frac{1}{\sqrt{x}}e^{-\frac{x^2}{2}}$ for large values of $x$.

My attempt: We recall that the density function of the lognormal distribution is given by $$ f(x)=\frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}}\:\:\text{ for }x>0. $$ The cumulative distribution function of a generalized Pareto random variable is given by $$ G(x)=1-\left(1+\frac{\gamma x}{\theta}\right)^{-\frac{1}{\gamma}}. $$ The objective is to find parameters $\gamma$ and $\theta$ such that $\mathbb{P}(Y\leq y)\approx G(y)$; it is clear that $\gamma$ and $\theta$ will be expressed in terms of $\sigma$, $\mu$ and $d$. Since $\mathbb{P}(Y\leq y)=1-\mathbb{P}(X>d+y)/\mathbb{P}(X>d)$, my attempt is: \begin{align} \mathbb{P}(Y\leq y) & =1- \frac{1-\int_0^{d+y}\frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\log x-\mu)^2}{2\sigma^2}} \, dx }{1-\int_0^d\frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}} \, dx} \end{align}

We consider the change of variable $t=\frac{\log x -\mu}{\sqrt{2}\sigma}$, so that $dt=\frac{1}{\sqrt{2}\sigma x} \, dx$, i.e. $dx=\sqrt{2}\sigma x\, dt$. Therefore, we have

\begin{align}
\mathbb{P}(Y\leq y) &= 1- \frac{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} \, dt }{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} \, dt} \\
&= 1- \frac{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^0 e^{-t^2} \, dt- \frac{1}{\sqrt{\pi}}\int_0^{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}}e^{-t^2} \, dt }{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^0 e^{-t^2} \, dt-\frac{1}{\sqrt{\pi}}\int_0^{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}e^{-t^2} \, dt} \\
&= 1- \frac{1-\frac{1}{2}- \frac{1}{\sqrt{\pi}}\int_0^{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}} e^{-t^{2}} \, dt }{1-\frac{1}{2}-\frac{1}{\sqrt{\pi}} \int_0^{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}e^{-t^2} \, dt} \\
&= 1- \frac{\frac{1}{2}- \frac{1}{2} \operatorname{erf} \left(\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}\right) }{\frac{1}{2}- \frac{1}{2} \operatorname{erf}\left(\frac{\log(d) -\mu}{\sqrt{2}\sigma}\right) } \\
&= 1- \frac{1- \operatorname{erf}\left(\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}\right) }{1- \operatorname{erf} \left(\frac{\log(d) -\mu}{\sqrt{2}\sigma}\right) } \\
&\approx 1- \frac{\frac{1}{\sqrt{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}}}e^{-\frac{(\log(d+y)-\mu)^2}{4\sigma^2}} }{\frac{1}{\sqrt{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}}e^{-\frac{(\log(d)-\mu)^2}{4\sigma^2}} } \leftarrow \text{by hint.}\\
&= 1- \sqrt{\frac{\log(d)-\mu}{\log(d+y)-\mu}}\,e^{-\frac{(\log(d+y)-\mu)^2}{4\sigma^2}+\frac{(\log(d)-\mu)^2}{4\sigma^2}}\\
&= 1- \sqrt{\frac{\log(d)-\mu}{\log(d+y)-\mu}}\,e^{-\frac{1}{4\sigma^2}\left(\log(d+y)-\log(d)\right)\left(\log(d+y)+\log(d)-2\mu\right)}\\
&= 1- \sqrt{\frac{\log(d)-\mu}{\log(d+y)-\mu}}\,e^{-\frac{1}{4\sigma^2}\log\left(\frac{d+y}{d}\right)\left(\log(dy+d^2)-2\mu\right)}
\end{align}
I do not know how to continue; algebraically, I have not been able to calibrate the parameters to get what I need.
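As a sanity check on the exact part of the derivation above (the erf ratio, before the hint is applied), the following minimal sketch compares that closed form with a direct midpoint-rule integration of the standard normal density. The function names, the cutoff at 12 standard deviations, and the sample values $d=50$, $y=25$, $\mu=0$, $\sigma=1$ are illustrative assumptions, not part of the exercise.

```python
import math


def conditional_cdf_erf(y, d, mu, sigma):
    """P(Y <= y) from the erf ratio 1 - (1 - erf(b)) / (1 - erf(a))."""
    a = (math.log(d) - mu) / (math.sqrt(2.0) * sigma)
    b = (math.log(d + y) - mu) / (math.sqrt(2.0) * sigma)
    # 1 - erf(x) is evaluated as erfc(x) to avoid cancellation in the tail.
    return 1.0 - math.erfc(b) / math.erfc(a)


def normal_mass(lo, hi, n=20000):
    """Midpoint-rule integral of the standard normal density over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        total += math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def conditional_cdf_quadrature(y, d, mu, sigma):
    """P(d < X <= d + y) / P(X > d), computed without any erf call."""
    z_d = (math.log(d) - mu) / sigma
    z_dy = (math.log(d + y) - mu) / sigma
    window = normal_mass(z_d, z_dy)
    # Assumption: the tail beyond z_d + 12 is negligible (valid for z_d >= 0).
    tail = normal_mass(z_d, z_d + 12.0)
    return window / tail


if __name__ == "__main__":
    args = (25.0, 50.0, 0.0, 1.0)  # y, d, mu, sigma (illustrative values)
    print("erf ratio :", conditional_cdf_erf(*args))
    print("quadrature:", conditional_cdf_quadrature(*args))
```

If the two printed numbers agree to several digits, the change of variable and the reduction to erf are consistent, so any remaining difficulty lies in the approximation step.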

I ask for your help with this problem, any solution or suggestion will be well received.

The statement is not true: it is never the case, for any $d,$ that a lognormal distribution truncated at $d$ has a generalized Pareto distribution. Does the exercise perhaps ask you to show that the distribution is *approximately* generalized Pareto? — whuber, May 18 '19 at 03:02
If the "Generalized Pareto" distribution is the one [described by Wikipedia](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution), then no approximation of this sort will hold. This is a consequence of considerations of tail behavior like those discussed at https://stats.stackexchange.com/questions/86429: all Generalized Pareto distributions have heavier tails than all lognormal distributions. Could you explain the sense in which one of these distributions is intended to approximate the other? — whuber, May 18 '19 at 16:56
@whuber I refer to it in the sense of Pickands-Balkema-de Haan Theorem, see https://en.wikipedia.org/wiki/Pickands%E2%80%93Balkema%E2%80%93de_Haan_theorem This theorem gives the existence of the constants that determine the Generalized Pareto Distribution, but does not give an explicit form. — Diego Fonseca, May 20 '19 at 19:13
That would be convergence in distribution, allowing for the scale parameter of the Generalized Pareto to vary with the cutoff $d.$ Because the result is not true, may I inquire about the origin of this statement? — whuber, May 20 '19 at 19:28
Also asked at https://math.stackexchange.com/questions/3230195/if-x-sim-mathrmlognormal-then-y-x-dx-geq-d-has-approximately-a-genera. — StubbornAtom, May 20 '19 at 20:24
@whuber, while your reasoning is correct that the tail of a (truncated) lognormal distribution has different asymptotic behavior than the tail of a generalized Pareto variable and thus no such approximation would hold, that is not the question posed here: it is truncated *and* conditioned. The conditioning has the effect of cancelling out the exponential tail, pulling out the polynomial correction and allowing the approximation to match the power-law tail of the Pareto. See my solution posted at the math SE link above. — pre-kidney, May 26 '19 at 07:00
Elaborating a little more on the last comment, the idea is that after changing variables, the problem amounts to understanding the scaling behavior of the tail of a normal random variable when viewed in a window $[n,n+xe^{-n}]$ for $x$ fixed and $n\to\infty$. When $n$ is large the density is roughly constant on such a small interval, but we can tease out the Pareto power-law behavior by being careful in our estimations. — pre-kidney, May 26 '19 at 11:33
@pre Thank you for sharing your insight. You seem, however, to have demonstrated the distribution is *not* Pareto, because the exponent $\gamma$ itself varies with $d.$ In the limit as $d\to\infty,$ $\gamma$ vanishes and $\theta$ becomes infinite! — whuber, May 27 '19 at 11:02
@whuber Yes, the question (at least the form I saw on math.SE initially) was asking to show that for large $d$, it is well-approximated by a Pareto (with some parameters depending on $d$). As you pointed out, this does not mean that the limit is a Pareto distribution, but rather that the limit is a *limit* of Pareto distributions. (This is the idea behind a "scaling limit", for example how random walk approaches Brownian motion - you have to scale a bit to get something non-trivial, same is the case here.) — pre-kidney, May 27 '19 at 21:15
@Pre Thank you--I have suspected from the outset that this might be what is really intended by the question anyway. — whuber, May 27 '19 at 21:45

pre-kidney · Accepted Answer · 2019-05-26T11:30:38.890

I have posted a solution over at math.SE: https://math.stackexchange.com/questions/3230195/if-x-sim-mathrmlognormal-then-y-x-dx-geq-d-has-approximately-a-genera/3240024#3240024

When $X=e^{\mu+\sigma Z}$ with $Z$ standard normal, I obtain that $Y=(X-d\mid X\geq d)$ is approximately generalized Pareto in the limit as $d\to\infty$, with $$ \theta=\frac{d\sigma^2}{\log d-\mu},\qquad \gamma=\frac{\sigma^2}{\log d-\mu}. $$
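With these parameters $\gamma/\theta=1/d$, so the generalized Pareto survival function becomes $(1+y/d)^{-(\log d-\mu)/\sigma^2}$. The sketch below is one way to check the approximation numerically: it compares the exact conditional survival $\mathbb{P}(X>d+y)/\mathbb{P}(X>d)$ with that expression at a few excess levels $y$ measured in units of $\theta$. The function names, the choice $\mu=0$, $\sigma=1$, and the multiples $0.5,1,2$ are assumptions made for illustration only.

```python
import math


def lognormal_survival(x, mu, sigma):
    """P(X > x) for X = exp(mu + sigma * Z), Z standard normal."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))


def excess_survival(y, d, mu, sigma):
    """Exact P(X - d > y | X >= d)."""
    return lognormal_survival(d + y, mu, sigma) / lognormal_survival(d, mu, sigma)


def gpd_params(d, mu, sigma):
    """(theta, gamma) as stated in the answer; requires log(d) > mu."""
    denom = math.log(d) - mu
    return d * sigma ** 2 / denom, sigma ** 2 / denom


def gpd_survival(y, theta, gamma):
    """1 - G(y) for the generalized Pareto cdf G from the question."""
    return (1.0 + gamma * y / theta) ** (-1.0 / gamma)


def max_abs_error(d, mu=0.0, sigma=1.0, multiples=(0.5, 1.0, 2.0)):
    """Largest gap between exact and GPD survival at y = k * theta."""
    theta, gamma = gpd_params(d, mu, sigma)
    return max(
        abs(excess_survival(k * theta, d, mu, sigma)
            - gpd_survival(k * theta, theta, gamma))
        for k in multiples
    )


if __name__ == "__main__":
    for d in (1e3, 1e6, 1e9):
        theta, gamma = gpd_params(d, 0.0, 1.0)
        print(f"d={d:.0e} theta={theta:.4g} gamma={gamma:.4g} "
              f"max error={max_abs_error(d):.4g}")
```

The gap should shrink as $d$ grows, but slowly, which is in line with the comments above: $\gamma\to 0$ and $\theta\to\infty$ as $d\to\infty$, so the approximation holds only with parameters that depend on $d$.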
