
Suppose I have a random variable $X$ that I know is doubly bounded on the support $[0,\theta]$, but I don't know $\theta$ (we don't know anything about the distribution of $X$, but assume it is well-behaved, so the usual regularity conditions apply when needed). Consider $Y:=X+Z$, where $Z$ is standard normal ($X$ and $Z$ are independent). And suppose I draw $Y_1,Y_2,\ldots$, which are i.i.d. readings of $Y$.

Using the answer posted for this question (actually the addendum to that answer): extreme value theory: show normal to Gumbel, and assuming my computations are right, I showed that $Y_{(n)}$, when suitably normalized with the $(a_n,b_n)$ given in the addendum of that solution, converges to the Gumbel distribution as well.

My question is: is there a way to use this process to provide some estimate of $\theta$? I feel that before normalizing $Y_{(n)}$, we may have a shifted, scaled version of the Gumbel, with the shift reflecting the value of $\theta$ somehow. (Intuitively, as $n$ gets larger, the index $j$ for which $Y_j = Y_{(n)}$ is more and more likely to also satisfy $X_j = X_{(n)}$, which gets closer and closer to $\theta$; so for gigantic $n$, perhaps $Y_{(n)}$ behaves somewhat like $Z_{(n)} + \theta$?) I can't seem to justify this mathematically or find a procedure to extract an estimate of $\theta$. Any suggestions?

renrenthehamster
  • Each realization of $Y$ is determined by the joint realization of $\{Z, X\}$. For gigantic $n$, we expect that $X_{(n)}$ will be close to $\theta$, and also that $Z_{(n)}$ will be "very large" (since $Z_{(n)}$ tends to infinity)... _but not necessarily in the same joint realization_ of the two. So, since $Z_{(n)}$ will tend to dominate the determination of $Y_{(n)}$, it seems more realistic to state that $Y_{(n)} \approx Z_{(n)}+X$... If one defined $Y = \max\{X\}+\max \{Z\}$, then the relation $Y_{(n)} \approx Z_{(n)}+\theta$ could be expected... I presume $X$ and $Z$ are independent? – Alecos Papadopoulos Jul 14 '14 at 15:14
  • Yes, $X$ and $Z$ are independent – renrenthehamster Jul 15 '14 at 11:54

1 Answer


Reiterating my comment: since $Z_{(n)}$ and $X_{(n)}$ won't necessarily appear in the same pair of realizations, we cannot say that $Y_{(n)}$ will approximate their sum. But a moment's reflection shows that, for any finite $n$,

$$Y_{(n)} \in [Z_{(n)},\;Z_{(n)}+\theta]$$

The upper bound is obvious, since $Y_{(n)} \le \max_i Z_i + \max_i X_i = Z_{(n)} + X_{(n)} \le Z_{(n)}+\theta$. For the lower bound, consider the pair of realizations $\{Z_{(n)}, X_{Z_{(n)}}\}$, where $X_{Z_{(n)}}$ denotes the $X$-realization paired with $Z_{(n)}$. If there exists another pair $\{Z_i,X_i\}$ such that $Z_i + X_i > Z_{(n)}+X_{Z_{(n)}}$, then $Y_{(n)}$ exceeds $Z_{(n)}$ and falls inside the above interval; if no such pair exists, then the smallest possible value for $Y_{(n)}$ comes from the pair $\{Z_{(n)}, 0\}$, since $X$ is bounded from below at $0$.

Adopting now a naive approach, we could approximate the expected value of $Y_{(n)}$ by the mid-range of this interval; in expectation, since $Z_{(n)}$ is random,

$$E[Y_{(n)}] \approx \frac {E[Z_{(n)}]+ \left(E[Z_{(n)}]+\theta\right)}{2} = E[Z_{(n)}]+\theta/2$$
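As a quick sanity check, both the sandwich bound and this mid-range approximation can be verified by Monte Carlo. A minimal sketch in Python (the choice of uniform $X$ and the sample sizes are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0          # X ~ U(0, theta): an illustrative assumption
n, reps = 50, 20_000

# Joint draws of (X, Z); Y = X + Z, maxima taken within each replication.
X = rng.uniform(0.0, theta, size=(reps, n))
Z = rng.standard_normal(size=(reps, n))
Y_max = (X + Z).max(axis=1)
Z_max = Z.max(axis=1)

# The bound Z_(n) <= Y_(n) <= Z_(n) + theta must hold in every replication.
assert np.all(Y_max >= Z_max) and np.all(Y_max <= Z_max + theta)

# Naive mid-range approximation: E[Y_(n)] should be near E[Z_(n)] + theta/2.
print(Y_max.mean(), Z_max.mean() + theta / 2)
```

The two printed numbers land close to each other, which is all the naive approximation claims.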

If we have available a sample of realizations of $Y$, an estimation approach for $\theta$ is to randomly partition this sample into sub-samples of equal size, form the collection of maxima of these sub-samples, and implement a method-of-moments approach,

$$\hat \theta = 2\Big(\hat E[Y_{(n)}]-E[Z_{(n)}]\Big)$$

Specifically, if we have available $j=1,\ldots,m$ samples of realizations of $Y$, each of size $k$, we form the sample

$$S_m =\{Y_{(k)1},...,Y_{(k)j},...,Y_{(k)m} \}$$

and we take as an estimate of $\theta$,

$$\hat \theta = 2\Big(\frac 1m \sum_{j=1}^m Y_{(k)j}-E[Z_{(k)}]\Big)$$

SIMULATION

I generated $m=100$ samples, each of size $k=50$. I set $X$ to be uniform $U(0,1)$, so $\theta = 1$. An approximation to the expected value of $Z_{(k)}$ is $E[Z_{(k)}]\approx \Phi^{-1}(0.5264^{1/k}) = 2.234$ for $k=50$. I obtained $$\hat \theta = 2\cdot (2.803 -2.234) = 1.138$$
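For what it's worth, this simulation can be reproduced along the following lines. This is only a sketch: the seed is arbitrary, and I evaluate $E[Z_{(k)}]$ by Monte Carlo rather than by the $\Phi^{-1}(0.5264^{1/k})$ approximation used above.

```python
import numpy as np

rng = np.random.default_rng(123)
theta = 1.0          # true value; X ~ U(0, 1) as in the simulation
m, k = 100, 50       # m sub-samples, each of size k

# Block maxima of Y = X + Z over each sub-sample of size k.
Y = rng.uniform(0.0, theta, size=(m, k)) + rng.standard_normal(size=(m, k))
block_maxima = Y.max(axis=1)

# E[Z_(k)] estimated by Monte Carlo (instead of the quantile approximation).
EZ_k = rng.standard_normal(size=(100_000, k)).max(axis=1).mean()

theta_hat = 2 * (block_maxima.mean() - EZ_k)
print(round(theta_hat, 3))
```

The estimate fluctuates around the true $\theta = 1$ from seed to seed, with a spread comparable to the $1.138$ reported above.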

Note also that $E[Z_{(k)}]+\theta/2 = 2.734$, not far from the empirical mean of the $Y_{(k)j}$, $2.803$.

Naturally, this is just an indication (and the estimator does not exclude negative values). Moreover, since the range of the interval containing $Y_{(n)}$ is exactly $\theta$, one could think along the lines of estimating this range by devising a suitable scaling for each $Y_{(k)j}$ (since each depends on a different $Z_{(k)j}$).

Alecos Papadopoulos