
$\newcommand{\P}{\mathbb{P}}$Suppose we have $N$ independent random variables $X_1, \ldots, X_N$ with finite means $\mu_1 \leq \ldots \leq \mu_N$ and variances $\sigma_1^2, \ldots, \sigma_N^2$. I am looking for distribution-free bounds on the probability that any given $X_i$ with $i \neq N$ is larger than all other $X_j$, $j \neq i$.

In other words, if for simplicity we assume the distributions of the $X_i$ are continuous (so that $\P(X_i = X_j) = 0$ for $i \neq j$), I am looking for bounds on: $$ \P( X_i = \max_j X_j ) \enspace. $$ If $N=2$, we can use the one-sided Chebyshev (Cantelli) inequality to get: $$ \P(X_1 = \max_j X_j) = \P(X_1 > X_2) \leq \frac{\sigma_1^2 + \sigma_2^2}{\sigma_1^2 + \sigma_2^2 + (\mu_1 - \mu_2)^2} \enspace. $$ I would like to find some simple (not necessarily tight) bounds for general $N$, but I have not been able to find aesthetically pleasing results.
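For a quick numerical sanity check of this two-variable bound, here is a minimal Monte Carlo sketch in Python; the normal and exponential examples are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

def check(x1, x2, mu, var):
    """Compare a Monte Carlo estimate of P(X_1 > X_2) with the bound above."""
    p_hat = np.mean(x1 > x2)
    bound = (var[0] + var[1]) / (var[0] + var[1] + (mu[0] - mu[1])**2)
    return p_hat, bound

# Normal example: means (0, 1), variances (1, 4)
print(check(rng.normal(0, 1, n_sim), rng.normal(1, 2, n_sim), (0, 1), (1, 4)))

# Exponential example: scales (1, 2), i.e. means (1, 2) and variances (1, 4)
print(check(rng.exponential(1, n_sim), rng.exponential(2, n_sim), (1, 2), (1, 4)))
```

In both examples the estimate stays below the common bound of $5/6$, as expected.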

Please note that the variables are not assumed to be i.i.d. Any suggestions or references to related work are welcome.


Update: recall that by assumption, $\mu_j \geq \mu_i$ for $j \geq i$. We can then use the above bound to arrive at: $$ \P(X_i = \max_j X_j) \leq \min_{j > i} \frac{\sigma_i^2 + \sigma_j^2}{\sigma_i^2 + \sigma_j^2 + (\mu_j - \mu_i)^2} \leq \frac{\sigma_i^2 + \sigma_N^2}{\sigma_i^2 + \sigma_N^2 + (\mu_N - \mu_i)^2} \enspace. $$ This implies: $$ ( \mu_N - \mu_i ) \P( X_i = \max_j X_j ) \leq (\mu_N - \mu_i) \frac{\sigma_i^2 + \sigma_N^2}{\sigma_i^2 + \sigma_N^2 + (\mu_N - \mu_i)^2} \leq \frac{1}{2} \sqrt{ \sigma_i^2 + \sigma_N^2 } \enspace. $$ Summing over $i$ and using $\sum_i \P( X_i = \max_j X_j ) = 1$, this in turn implies: $$ \sum_{i=1}^N \mu_i \P( X_i = \max_j X_j ) \geq \mu_N - \frac{N}{2} \sqrt{ \sum_{i=1}^{N-1} (\sigma_i^2 + \sigma_N^2) } \enspace. $$ I am now wondering whether this bound can be improved to something that does not depend linearly on $N$. For instance, does the following hold: $$ \sum_{i=1}^N \mu_i \P( X_i = \max_j X_j ) \geq \mu_N - \sqrt{ \sum_{i=1}^N \sigma_i^2 } \enspace? $$ And if not, what could be a counterexample?
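Below is a small Monte Carlo harness (Python; the normal distributions and their parameters are arbitrary illustrative choices) that estimates $p_i = \P(X_i = \max_j X_j)$ and compares $\sum_i \mu_i p_i$ with the conjectured lower bound. It is only a way to search for counterexamples numerically, not an argument either way:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary example: N independent normals with sorted means
mu = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
sigma = np.array([0.2, 0.4, 0.3, 0.5, 0.2])
n_sim = 200_000

samples = rng.normal(mu, sigma, size=(n_sim, mu.size))
winners = samples.argmax(axis=1)                       # index of the maximum in each draw
p = np.bincount(winners, minlength=mu.size) / n_sim    # estimates of p_i = P(X_i = max_j X_j)

lhs = np.dot(mu, p)                                    # sum_i mu_i * p_i
rhs = mu[-1] - np.sqrt(np.sum(sigma**2))               # conjectured bound mu_N - sqrt(sum_i sigma_i^2)

print(f"sum_i mu_i p_i ~ {lhs:.3f}, conjectured bound {rhs:.3f}")
```

In this particular configuration the left-hand side comes out well above the bound; varying the means, variances, or the distribution family is an easy way to hunt for counterexamples.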

Chill2Macht
MLS
  • The event $\{X_1=\max_j X_j\}=\{X_1\geq X_2\}\cap\dots\cap\{X_1\geq X_N\}$, so in particular $P(X_1=\max_j X_j)=P(X_1\geq X_2,\dots,X_1\geq X_N)\leq P(X_1\geq X_2)$ and you can use the bound you already got. This might be a very rough bound, but you can find the combination that gives the minimum upper bound among this class. –  Jun 11 '12 at 14:29
  • @Procrastinator : Thanks. Since I don't only consider $X_1$, I could indeed use $$P(X_i = \max_j X_j) \leq P( X_i > X_N) \leq \frac{\sigma_i^2 + \sigma_N^2}{\sigma_i^2 + \sigma_N^2 + (\mu_N - \mu_i)^2} .$$ However, if you can think of any tighter bounds, it would be much appreciated. – MLS Jun 11 '12 at 14:35
  • This bound can be tighter if you use the index $j$ that gives you the smaller upper bound instead of $N$. Note that this value depends on both the mean and the variance. –  Jun 11 '12 at 14:39
  • @Procrastinator : That's true, but then I guess only under the condition that $j > i$, since otherwise $\mu_j < \mu_i$ and Chebyshev's inequality no longer applies. So, we have arrived at something like: $$P(X_i = \max_j X_j ) \leq \min_{j > i} \frac{\sigma_i^2 + \sigma_j^2}{\sigma_i^2 + \sigma_j^2 + (\mu_j - \mu_i)^2} .$$ Thanks, this already looks quite helpful. (However, even better bounds are still always appreciated :) ) – MLS Jun 11 '12 at 14:46
  • Note that by independence you have $P\{X_i < \max_j X_j\} = \prod_{j \neq i} P\{X_i < X_j\}$. – Michael R. Chernick Jun 11 '12 at 15:13
  • @MichaelChernick : I don't believe that is correct. Suppose for instance we have three uniform distributions on $[0,1]$. Then, if I'm not mistaken, $P( X_1 < \max_j X_j ) = 2/3$, whereas $P(X_1 < X_2) = P(X_1 < X_3) = 1/2$. I do not know if you meant to write $P( X_i > \max_j X_j )$, but then the same example shows that it still isn't valid. – MLS Jun 11 '12 at 15:23
  • Incidentally, is it a good idea to post the same question on some other sites (e.g., math.stackexchange.com), or is this perhaps even discouraged? – MLS Jun 11 '12 at 19:29
  • @MLS: Please do not simultaneously crosspost this to another site; this is, indeed, discouraged. See a very recent question on [meta.stats.SE](http://meta.stats.stackexchange.com) for more info on this. – cardinal Jun 11 '12 at 20:42
  • @MLS The maximum of $n$ independent random variables is less than $A$ if and only if all $n$ variables are less than $A$. That is what I intended to say. What I said is clearly wrong and I don't know why I wrote it. I guess what I should have said is $P\{X_1=\max_j X_j\}=P\{X_1\geq X_2\}\,P\{X_1\geq X_3\}\cdots P\{X_1\geq X_N\}$, and similarly for the other $X_i$. – Michael R. Chernick Jun 12 '12 at 00:52
  • @Michael: That is still not true, unfortunately. The events $A_j = \{X_i > X_j\}$ for *fixed* $i$ are not independent. – cardinal Jun 12 '12 at 01:08
  • @MLS: The update left me wondering if you are interested in upper bounds, lower bound or both. – cardinal Jun 12 '12 at 01:09
  • @cardinal My intention was to state that $P\{X_1=\max_j X_j\} = P\{X_1\geq X_2 \text{ and } X_1\geq X_3 \ldots \text{ and } X_1\geq X_N\}$. I made the presumption that, because the $X_i$ are mutually independent, the probability of the intersection of the events would be the product of the probabilities. But as you point out these do not factor. But I think there may be a positive dependence that can be exploited. For example $P\{X_1>X_2 \text{ and } X_1>X_3\}=P\{X_1>X_2\mid X_1>X_3\}\,P\{X_1>X_3\}$. Now $P\{X_1>X_2\mid X_1>X_3\} > P\{X_1>X_2\}$, because if it is known that $X_1>X_3$, where $X_3$ is independent of $X_2$, then it is more likely that $X_1>X_2$. This is admittedly a heuristic argument. – Michael R. Chernick Jun 12 '12 at 02:10
  • @cardinal : Let's denote $p_i = P( X_i = \max_j X_j )$. I'm mainly interested in **lower** bounds for $\sum p_i \mu_i$. However, I would like to express this lower bound as $\sum p_i \mu_i \geq \mu_N - A$ for some $A$ (preferably, the smallest possible $A$). This translates into a question on **upper** bounds on $p_i$, since we can rewrite this as $\sum p_i (\mu_N - \mu_i) \leq A$. Note that $\mu_N - \mu_i$ is non-negative by definition. For now, I'm mainly focusing on finding the lowest expression for $A$ that depends only on means and variances, but other ideas are welcome. – MLS Jun 12 '12 at 09:44
  • @MLS: Thanks for the additional info. That's helpful. I'm a little curious to know about the reason for the interest in the particular quantity $\sum_i p_i \mu_i$. On the surface, it doesn't seem directly related to a "natural" quantity of interest; but, I'm probably missing something obvious. – cardinal Jun 12 '12 at 15:04
  • @cardinal : Amongst other things, it's related to multi-armed bandits. If you pick an arm based on previous rewards, how big is the probability that you picked the best arm (that would be $P(X_N = \max_j X_j)$ in the notation above), and can we bound the expected loss for picking a sub-optimal arm? – MLS Jun 12 '12 at 15:41
  • Crossposted to MathOverflow: http://mathoverflow.net/questions/99313 – cardinal Jun 13 '12 at 00:09
  • @cardinal I see now that you had responded to my question about crossposting. I had overlooked it, because it was followed by two other comments by you and Michael Chernick, all of which were posted while I was asleep (I'm in Europe). By then, I had already crossposted the question, since MathOverflow looked to have little overlap in types of questions and community, and I reasoned that the question could be interesting to both. I'm sorry, I should have been more patient. What would be best? I'd be happy to delete one of them, but I'm not sure that that is the suggested course of action. – MLS Jun 13 '12 at 05:27
  • @MLS: I wouldn't worry about it too much at this point. The important thing now is to have them linked together; that's why I added the matching comments here and there. There *are* a handful of us that frequent stats.SE, math.SE and MO. :) – cardinal Jun 13 '12 at 16:24
  • Since it is now over a month later and I still have not found an answer I've reposted this question to [math.SE](http://math.stackexchange.com/questions/172793/distribution-free-bounds-on-the-probability-of-maximality). – MLS Jul 19 '12 at 10:33
  • @cardinal thanks for the help before. Unfortunately, the question did not gain any traction on math.SE and I'm still stuck. I've been trying to find some smart applications of Markov's inequality, but so far to no avail. Perhaps this is not the right way to proceed. If you (or anyone else reading this comment) think of any potentially helpful tips, please let me know -- even though the question is now a little older, any answers or related tips are still relevant to me. Thanks. – MLS Jul 25 '12 at 10:22
  • @MLS: Thanks for the follow up. I'll take another look at this problem. There also may be ways to reignite the interest of others. I'll think about that and ping you. – cardinal Jul 25 '12 at 13:48
  • This is related to extreme value theory: http://en.wikipedia.org/wiki/Extreme_value_theory – Tim Feb 12 '15 at 12:01
  • One needs some more input about the dependence, i.e. the copula between them. Ultimately, the problem can be solved only in terms of copulae, since it is nonparametric (invariant under monotone transformations of all the r.v.s). There are special copulae with larger dependence in the tails. Clearly, such conditions would have an impact on the solution. – Horst Grünbusch Sep 29 '16 at 16:57
  • I may be a bit dumb here, but can you show how you use Chebyshev to derive the bound in your question? I seem to get $\cdots \leq (\sigma_1^2 + \sigma_2^2) / (\mu_1 - \mu_2)^2$ instead. – jochen Nov 04 '16 at 22:07
  • The second inequality in your update may not be correct: it is not generally true that $\frac{\sigma_i^2 + \sigma_j^2}{\sigma_i^2 + \sigma_j^2 + (\mu_j - \mu_i)^2} \leq \frac{\sigma_i^2 + \sigma_N^2}{\sigma_i^2 + \sigma_N^2 + (\mu_N - \mu_i)^2}$; indeed $(\mu_N - \mu_i) \geq (\mu_j - \mu_i)$, but the different $\sigma$ play a role as well. – Sextus Empiricus Jun 21 '17 at 13:14

2 Answers


You can use a multivariate Chebyshev inequality.

Two-variable case

For a single comparison, $X_1$ vs. $X_2$, I arrive at the same bound as jochen's comment of Nov 4, 2016:

1) If $\mu_1 < \mu_2$ then $ P(X_1>X_2) \leq (\sigma_1^2 + \sigma_2^2)/(\mu_1-\mu_2)^2 $

(and I wonder as well about your derivation)

Derivation of equation 1

  • using the new variable $X_1-X_2$
  • transforming it such that it has the mean at zero
  • taking the absolute value
  • applying Chebyshev's inequality

$$\begin{aligned} P \left( X_1 > X_2 \right) &= P \left( X_1 - X_2 > 0 \right)\\ &= P\left( X_1 - X_2 - (\mu_1 - \mu_2) > - (\mu_1 - \mu_2)\right) \\ &\leq P\left( \vert X_1 - X_2 - (\mu_1 - \mu_2) \vert > \mu_2 - \mu_1\right) \\ &\leq \frac{\sigma_{X_1-X_2}^2}{(\mu_2 - \mu_1)^2} = \frac{\sigma_{X_1}^2+\sigma_{X_2}^2}{(\mu_2 - \mu_1)^2} \end{aligned}$$
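For illustration, here is a minimal Python sketch (with an arbitrary normal example) comparing a Monte Carlo estimate of $P(X_1 > X_2)$ with the bound in equation (1) and with the Cantelli-type bound from the question:

```python
import numpy as np

rng = np.random.default_rng(2)

mu1, s1 = 0.0, 1.0   # arbitrary illustrative parameters with mu1 < mu2
mu2, s2 = 2.0, 1.0
n_sim = 200_000

x1 = rng.normal(mu1, s1, n_sim)
x2 = rng.normal(mu2, s2, n_sim)

p_hat = np.mean(x1 > x2)                    # Monte Carlo estimate of P(X_1 > X_2)
v = s1**2 + s2**2
chebyshev = v / (mu1 - mu2)**2              # equation (1), two-sided Chebyshev
cantelli = v / (v + (mu1 - mu2)**2)         # one-sided (Cantelli) bound from the question

print(f"P(X1 > X2) ~ {p_hat:.3f}, bound (1): {chebyshev:.3f}, Cantelli: {cantelli:.3f}")
```

Both bounds hold here, with the Cantelli form being the tighter of the two.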

Multivariate Case

The inequality in equation (1) can be extended to the multivariate case by applying it to the transformed variables $X_N-X_i$ for each $i<N$ (note that these are correlated).

A solution to this problem (multivariate and correlated) has been described by I. Olkin and J. W. Pratt, 'A Multivariate Tchebycheff Inequality', Annals of Mathematical Statistics 29 (1958), pp. 226–234, http://projecteuclid.org/euclid.aoms/1177706720

Note theorem 2.3

$P(\vert y_i \vert \geq k_i \sigma_i \text{ for some } i) = P(\vert x_i \vert \geq 1 \text{ for some } i) \leq \frac{(\sqrt{u} + \sqrt{(pt-u)(p-1)})^2}{p^2}$

in which $p$ is the number of variables, $t=\sum k_i^{-2}$, and $u=\sum \rho_{ij}/(k_ik_j)$.

Theorem 3.6 provides a tighter bound, but is less easy to calculate.
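For convenience, here is a small Python sketch that evaluates the Theorem 2.3 bound exactly as quoted above; the helper name `olkin_pratt_bound` is just illustrative, and I read $u$ as the full double sum of $\rho_{ij}/(k_i k_j)$ over all $i, j$ (with $\rho_{ii} = 1$), which should be checked against the paper:

```python
import numpy as np

def olkin_pratt_bound(k, rho):
    """Evaluate the bound of Theorem 2.3 as quoted above.

    k   : thresholds k_i (for the event |y_i| >= k_i * sigma_i for some i)
    rho : correlation matrix of the y_i
    Assumes u = sum over all i, j of rho_ij / (k_i k_j), including i = j.
    """
    k = np.asarray(k, dtype=float)
    rho = np.asarray(rho, dtype=float)
    p = k.size
    t = np.sum(1.0 / k**2)
    u = np.sum(rho / np.outer(k, k))
    return (np.sqrt(u) + np.sqrt((p * t - u) * (p - 1)))**2 / p**2

# Example: three variables, common threshold k = 2, pairwise correlation 0.5
rho = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
print(olkin_pratt_bound([2.0, 2.0, 2.0], rho))   # about 0.667
```

With this reading, $\rho = I$ reduces the bound to $\sum_i k_i^{-2}$ (the sum of the individual Chebyshev bounds), and a common $k$ with all correlations equal to one reduces it to $1/k^2$, which at least looks consistent.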

Edit

A sharper bound can be found using the multivariate Cantelli inequality. That inequality is of the type you used before, and it gave you the bound $(\sigma_1^2 + \sigma_2^2)/(\sigma_1^2 + \sigma_2^2 + (\mu_1-\mu_2)^2)$, which is sharper than $(\sigma_1^2 + \sigma_2^2)/(\mu_1-\mu_2)^2$.

I haven't taken the time to study the entire article, but anyway, you can find a solution here:

A. W. Marshall and I. Olkin, 'A One-Sided Inequality of the Chebyshev Type', Annals of Mathematical Statistics 31 (1960), pp. 488–491, https://projecteuclid.org/euclid.aoms/1177705913

(Later note: this inequality is for equal correlations, so it is not sufficient help here. But in any case, your problem of finding the sharpest bound is equivalent to the more general multivariate Cantelli inequality. I would be surprised if a solution does not exist.)

Sextus Empiricus

I have found a theorem that might help you and will try to adapt it to your needs. Consider the quantity:

$$\exp\left(t \cdot \mathbf{E}\Big(\max_{1 \leq i \leq n}X_{i}\Big)\right)$$

Then, for $t > 0$, by Jensen's inequality (since $\exp(\cdot)$ is a convex function), we get:

$$\exp\left(t \cdot \mathbf{E}\Big(\max_{1 \leq i \leq n}X_{i}\Big)\right) \leq \mathbf{E}\left(\exp\Big( t \cdot \max_{1 \leq i \leq n}X_{i}\Big)\right) = \mathbf{E}\left( \max_{1 \leq i \leq n} \exp( t \cdot X_{i})\right) \leq \sum_{i=1}^{n} \mathbf{E} \left(\exp(t \cdot X_{i})\right)$$

Now for $\mathbf{E}(\exp(t \cdot X_{i}))$ you plug in whatever the moment generating function of your random variable $X_{i}$ is (since this is just the definition of the mgf). Then, after doing so (and potentially simplifying the term), you take the log and divide by $t$, so that you get a statement about the term $\mathbf{E}(\max_{1 \leq i \leq n}X_{i})$. Finally, you choose $t$ so that the bound is as small, and hence as tight, as possible.

You then have a statement about the expected value of the maximum over the $n$ random variables. To get a statement about the probability that the maximum deviates from this expected value, you can use Markov's inequality (assuming your random variables are non-negative) or another, more specific inequality that applies to your particular random variable.
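As a concrete illustration of this recipe (a hypothetical Gaussian example): if the $X_i$ are independent Gaussians with means $\mu_i$ and common variance $\sigma^2$, then $\mathbf{E}\exp(t X_i) = \exp(t\mu_i + t^2\sigma^2/2)$, and the steps above give $$\mathbf{E}\Big(\max_{1 \leq i \leq n} X_i\Big) \leq \frac{1}{t}\log\sum_{i=1}^n \exp\big(t\mu_i + t^2\sigma^2/2\big) \leq \max_i \mu_i + \frac{\log n}{t} + \frac{t\sigma^2}{2},$$ and choosing $t = \sqrt{2\log n}/\sigma$ yields $\mathbf{E}(\max_i X_i) \leq \max_i \mu_i + \sigma\sqrt{2\log n}$. A minimal Python sketch comparing these bounds with simulation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical example: independent Gaussians with a common variance
mu = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
sigma = 1.0
n = mu.size
n_sim = 200_000

samples = rng.normal(mu, sigma, size=(n_sim, n))
emax_hat = samples.max(axis=1).mean()        # Monte Carlo estimate of E[max_i X_i]

t = np.sqrt(2 * np.log(n)) / sigma           # minimizes the relaxed bound max_i mu_i + log(n)/t + t*sigma^2/2
mgf_bound = np.log(np.sum(np.exp(t * mu + t**2 * sigma**2 / 2))) / t
relaxed = mu.max() + sigma * np.sqrt(2 * np.log(n))

print(f"E[max] ~ {emax_hat:.3f} <= mgf bound {mgf_bound:.3f} <= relaxed bound {relaxed:.3f}")
```

Note that this bounds the expected maximum; as described above, turning it into a bound on a tail probability still requires a Markov-type step.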

Peter Series