7

Assume I have a random vector $X = \{x_1, x_2, ..., x_N\}$, composed of i.i.d. binomially distributed values. If it would simplify the problem substantially, we can approximate them as normally distributed. Given that all other parameters are fixed, I want to know how $E[min(X)]$ (the expected value of the smallest number in the vector $X$) scales with $N$.

I don't care about a precise answer. I just want to know how it scales, e.g. linearly (obviously not), exponentially, as a power law, etc.

dsimcha

3 Answers

10

Assume that the random variables $x_k$ are i.i.d., nonnegative, integer valued, bounded by $n$, and such that $P(x_k=0)$ and $P(x_k=1)$ are both positive. For every $N\ge1$, let $$ X_N= \min\{x_1,\ldots,x_N\}. $$ Then, when $N\to+\infty$, $$ E(X_N)=c^N(1+o(1)), $$ where $c<1$ is independent of $N$ and given by $$ c=P(x_k\ge1). $$ Hence $E(X_N)$ is exponentially small. When each $x_k$ is Binomial $(n,p)$ with $n\ge1$ and $p$ in $(0,1)$ fixed, the result holds with $c=1-(1-p)^n$.
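A quick Monte Carlo sketch (added here for illustration, not part of the original argument) that checks the exponential decay numerically; the values $n=5$, $p=0.3$ and the sample sizes are arbitrary choices.

```python
# Monte Carlo check that E(X_N) decays like c^N, with c = 1 - (1 - p)^n,
# for X_N = min of N i.i.d. Binomial(n, p) draws.  Parameter values are
# arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 0.3
c = 1 - (1 - p) ** n                      # c = P(x_k >= 1)

for N in (5, 10, 20, 40):
    draws = rng.binomial(n, p, size=(200_000, N))
    est = draws.min(axis=1).mean()        # Monte Carlo estimate of E(X_N)
    print(f"N={N:3d}  E[min] approx {est:.3e}   c^N = {c ** N:.3e}")
```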


To see this, note that $[X_N\ge i]=[x_1\ge i]\cap\cdots\cap[x_N\ge i]$ for every $i$ and that, since $X_N$ is nonnegative and integer valued, $E(X_N)$ is the sum over $i\ge1$ of $P(X_N\ge i)$, hence $$ E(X_N)=\sum_{i\ge 1}P(x_1\ge i)^N. $$ For every $i\ge n+1$, $P(x_1\ge i)=0$. For every $2\le i\le n$, $0\le P(x_1\ge i)\le P(x_1\ge 2)$. Hence $$ c^N\le E(X_N)\le c^N+(n-1)d^N, $$ with $$ c=P(x_1\ge1),\quad d=P(x_1\ge 2). $$ Because $P(x_k=1)$ is positive, one knows that $d<c$, hence $E(X_N)\sim c^N$ when $N\to+\infty$.
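The identity $E(X_N)=\sum_{i\ge1}P(x_1\ge i)^N$ and the bounds above can also be evaluated directly. Here is a small sketch (added for illustration; the parameter values are arbitrary) doing so with scipy:

```python
# Evaluate E(X_N) = sum_{i>=1} P(x_1 >= i)^N exactly for x_1 ~ Binomial(n, p),
# together with the bounds c^N <= E(X_N) <= c^N + (n - 1) d^N derived above.
# The values n=10, p=0.2, N=30 are arbitrary illustrative choices.
from scipy.stats import binom

n, p, N = 10, 0.2, 30
surv = [binom.sf(i - 1, n, p) for i in range(1, n + 1)]   # P(x_1 >= i), i = 1..n
exact = sum(s ** N for s in surv)                         # E(X_N)
c, d = surv[0], surv[1]                                   # c = P(x_1 >= 1), d = P(x_1 >= 2)
print(f"exact  {exact:.4e}")
print(f"bounds {c ** N:.4e}  <=  E(X_N)  <=  {c ** N + (n - 1) * d ** N:.4e}")
```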

Did
  • Nice answer, better than all the others, because you have *bounded* the expectation. +1 from me! – probabilityislogic Jan 25 '11 at 22:44
  • +1 for nice proof. It should be noted however, that for $c^N$ to dominate term $(n-1)d^N$ by a factor of $m$, $N$ should be larger than $\frac{\log m(n-1)}{-\log d}$. For binomial distribution with $n=100$ and $p=0.1$, and the factor $m=10$ this works out at $N\approx 3500$. So this estimate should be used carefully for not so large $N$. On the other hand you can use estimate $c^N+(n-1)d^N$ directly in that case. – mpiktas Jan 26 '11 at 07:32
  • @mpiktas Thanks for your comment. It seems you forgot $c$ in your estimation of the minimal $N$. Note that the OP asked for the *scale* of $E(X_N)$, meaning (presumably) its asymptotics when $N\to+\infty$. Re an actual estimation in the binomial case and for moderately large values of $N$, I would recommend going one step further, that is, using $c^N+d^N\le E(X_N)\le c^N+d^N+(n-2)f^N$ with $f=P(x_1\ge 3)$. – Did Jan 26 '11 at 08:16
  • $m$ accounts for $c$, but it is not that important. – mpiktas Jan 26 '11 at 09:03
  • @mpiktas I cannot reconcile your "$m$ accounts for $c$" with your "for $c^N$ to dominate term $(n-1)d^N$ by a factor of $m$", which, for me, asks that $c^N\ge m(n-1)d^N$, that is, that $N\ge N(m)$ with $N(m)=\log(m(n-1))/\log(c/d)$. But, as you say, *it is not that important*. – Did Jan 26 '11 at 17:24
2

The table on this page of this book might help you. Explicit formulas for the expectation of the minimum of a sample from the binomial distribution are given on the preceding page.

mpiktas
1

The distribution of the minimum of any set of $N$ i.i.d. random variables is:

$$f_{min}(x)=Nf(x)[1-F(x)]^{N-1}$$

Where $f(x)$ is the pdf and $F(x)$ is the cdf (this is sometimes called a Beta-$F$ distribution, because it is a compound of a Beta distribution and an arbitrary distribution). Hence the expectation (in this particular case) is given by:

$$E[min(X)] = N\sum_{x=0}^{n} xf(x)[1-F(x)]^{N-1}$$

Which means that $E[min(X)]=NE(x_1[1-F(x_1)]^{N-1})$. Applying the "delta method" approximation $E[g(X)]\approx g(E[X])$ to this expectation gives

$$E[min(X)]=NE(x_1[1-F(x_1)]^{N-1})\approx N(E(x_1)[1-F(E(x_1))]^{N-1})$$

Substituting $np=E[x_1]$ then gives the approximation:

$$E[min(X)]\approx Nnp[1-F(np)]^{N-1}$$

Noting that $F(np)\approx \frac{1}{2}$ (via the normal approximation) gives

$$E[min(X)]\approx \frac{Nnp}{2^{N-1}}$$
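To see how this behaves numerically, here is a short sketch (added for illustration, not from the original answer) putting the closed-form approximation next to a Monte Carlo estimate of $E[min(X)]$; the values $n=100$, $p=0.1$ are arbitrary.

```python
# Compare the approximation N*n*p / 2^(N-1) with a simulated E[min(X)] for
# i.i.d. Binomial(n, p) samples.  Parameter values are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 0.1

for N in (2, 5, 10):
    draws = rng.binomial(n, p, size=(100_000, N))
    mc = draws.min(axis=1).mean()                 # Monte Carlo estimate of E[min(X)]
    approx = N * n * p / 2 ** (N - 1)             # the closed-form approximation above
    print(f"N={N:2d}  simulated {mc:7.3f}   approximation {approx:7.3f}")
```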

probabilityislogic
  • @probabilityislogic, Delta method does not give the approximation $E[g(X)]\approx g(EX)$. If $g$ is convex(concave) then you even have the inequality. See http://stats.stackexchange.com/questions/5782/variance-of-a-function-of-one-random-variable/5790#5790 and http://en.wikipedia.org/wiki/Taylor_expansions_for_the_moments_of_functions_of_random_variables – mpiktas Jan 25 '11 at 07:17
  • Yes it does. Expand $g(X)$ about $\mu=E(X)$. Then we have $g(X)\approx g(\mu) + (X-\mu)g'(\mu)$ where $g'(\mu)$ is the derivative of $g(.)$ evaluated at $\mu$. Taking expectations of both sides gives $E[g(X)]\approx E[g(\mu)] + E[(X-\mu)g'(\mu)]=g(\mu)$. So my approximation is valid. However, it is not the only way to get an approximation. Perhaps this is not called the "delta method", but it is a valid approximation. It can be improved by adding an extra term in the Taylor series, $E[g(X)]\approx g(\mu) + \frac{1}{2}\sigma^2 g''(\mu)$ (which is on the wiki page). – probabilityislogic Jan 25 '11 at 12:52
  • I thought the "delta method" only took the first term in a Taylor series, with no extra terms. And in the "large N" case $\sigma$ will be small, and basically negligible, so my approximation should give an adequate answer. The OP only asked for the "tail behaviour" w.r.t. $N$. – probabilityislogic Jan 25 '11 at 12:54
  • @probabilityislogic Although you claim your result holds for "any" set of iid RVs, it is obviously incorrect for any distribution with an appreciable chance of being negative (such as the standard normal), for then the expectation of the minimum must be negative. Even worse, if the left tail is heavy enough, the expectation of the minimum must diverge with $N$. – whuber Jan 25 '11 at 19:58
  • I did not claim my *final results* hold for any iid RVs. I said the *distribution of the minimum* is of the form I specified. And the "danger" you speak of (the minimum diverging) clearly does not happen with my answer; it converges to 0 as $N\rightarrow\infty$. And I am not using the "standard normal", I am using a normal centered at $np$, which is where $F(np)\approx\frac{1}{2}$ comes from. The normal approx. to the binomial is good, with poor approximations for extreme $p$ (same as the delta method I have used). – probabilityislogic Jan 25 '11 at 22:19
  • The *only* place where the normal approximation comes in is the approximation $F(np)\approx\frac{1}{2}$, so anyone dissatisfied with that can just take $Nnp[1-F(np)]^{N-1}$ instead (which uses only the linear Taylor series approximation). The approximation "goes" in all the "right" directions for the problem: $0$ for $N\rightarrow\infty$ or $np\rightarrow 0$, and $\infty$ for $np\rightarrow \infty$ (for fixed $N$). – probabilityislogic Jan 25 '11 at 22:34
  • I would conjecture that the approximation $E[min(X)]\approx \frac{N}{2^{N-1}} E[x_1]$ works pretty well for *any non-negative* RV which *is not too skewed* and has *finite variance*. This is because the delta method and the normal approx. are pretty good in these situations. – probabilityislogic Jan 25 '11 at 23:16
  • @probabilityislogic, approximating $Eg(X)$ by $g(EX)$ in general will not work. Pick a zero-mean variable and $g(x)=x^2$; then $Eg(X)=var(X)$, but $g(EX)=0$. Pick a centered random variable without a second moment and you will get that $Eg(X)$ does not exist, but $g(EX)$ will still be zero. So if you want to approximate $Eg(X)$ by $g(EX)$ you should be very careful to check the properties of $g$ and $X$. – mpiktas Jan 26 '11 at 06:58
  • @mpiktas - For the first example, how can a *non-negative* random variable have a mean of 0? Only if it is a constant (=0), in which case the approx. is exact! And I said the *finite variance* case, which rules out the other example. I did say when the approximation was good. Find an example which is in the class I specified in the comment above (non-negative, finite variance, low skewness). And besides, the approximation works well *in the particular case being described*. – probabilityislogic Jan 26 '11 at 14:07