
I keep reading in economics journals about a particular result used in random utility models. One version of the result is: if $\epsilon_i \overset{\text{iid}}{\sim} \text{Gumbel}(\mu, 1)$ for all $i$, then:

$$E[\max_i(\delta_i + \epsilon_i)] = \mu + \gamma + \ln\left(\sum_i \exp\left\{\delta_i \right\} \right), $$

where $\gamma \approx 0.57722$ is the Euler–Mascheroni constant. I've checked that this makes sense using R, and it does. The CDF for the Gumbel$(\mu, 1)$ distribution is:

$$G(\epsilon_i) = \exp(-\exp(-(\epsilon_i - \mu)))$$
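
For reference, here is a minimal version of the Monte Carlo check I described (the specific values of $\mu$ and the $\delta_i$ are arbitrary illustrations):

```r
# Monte Carlo check of E[max_i(delta_i + eps_i)] = mu + gamma + log(sum(exp(delta)))
set.seed(1)
mu    <- 0.5
delta <- c(1, 2, 3)
gam   <- -digamma(1)        # Euler-Mascheroni constant, approx. 0.57722
n     <- 1e6                # number of simulation draws
# Gumbel(mu, 1) draws via the inverse CDF: eps = mu - log(-log(U))
eps   <- mu - log(-log(matrix(runif(n * length(delta)), nrow = n)))
sims  <- apply(sweep(eps, 2, delta, "+"), 1, max)  # max_i(delta_i + eps_i), per draw
mean(sims)                         # simulated expectation
mu + gam + log(sum(exp(delta)))    # closed-form value
```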

I'm trying to find a proof of this and I've had no success. I've tried to prove it myself but I can't get past a particular step.

Can anyone point me to a proof of this? If not, maybe I can post my attempted proof up to where I get stuck.

Jason

2 Answers


It turns out that a 1981 Econometrica article by Kenneth Small and Harvey Rosen showed this, but in such a specialized context that the result takes a fair amount of digging to extract, not to mention some training in economics. I decided to prove it in a way I find more accessible.

Proof: Let $J$ be the number of alternatives. Depending on the values of the vector $\boldsymbol{\epsilon} = \{\epsilon_1, ..., \epsilon_J\}$, the function $\max_i(\delta_i + \epsilon_i)$ takes on different values. First, focus on the values of $\boldsymbol{\epsilon}$ such that $\max_i (\delta_i + \epsilon_i) = \delta_1 + \epsilon_1$. That is, we will integrate $\delta_1 + \epsilon_1$ over the set $M_1 \equiv \{\boldsymbol\epsilon : \delta_1 + \epsilon_1 > \delta_j + \epsilon_j, j \neq 1\}$:

\begin{equation} \begin{split} E_{\boldsymbol \epsilon \in M_1} [\max_i(\delta_i + \epsilon_i)] = \hspace{3.25in}\\ \int^{\infty}_{-\infty} (\delta_1 + \epsilon_1)f(\epsilon_1) \left[\int_{-\infty}^{\delta_1 + \epsilon_1 - \delta_2} ... \int_{-\infty}^{\delta_1 + \epsilon_1 - \delta_J}f(\epsilon_2) ...f(\epsilon_J) d\epsilon_2 ...d\epsilon_J \right] d\epsilon_1 = \\ \int^{\infty}_{-\infty} (\delta_1 + \epsilon_1)f(\epsilon_1) \left(\int_{-\infty}^{\delta_1 + \epsilon_1 - \delta_2} f(\epsilon_2)d\epsilon_2 \right) ... \left( \int_{-\infty}^{\delta_1 + \epsilon_1 - \delta_J}f(\epsilon_J) d\epsilon_J \right) d\epsilon_1 = \\ \int^{\infty}_{-\infty} \left(\delta_1 + \epsilon_1\right) f(\epsilon_1) F(\delta_1 + \epsilon_1 - \delta_2) ...F(\delta_1 + \epsilon_1 - \delta_J) d\epsilon_1. \end{split} \end{equation}

The term above is the first of $J$ such terms in $E[\max_i \left(\delta_i + \epsilon_i \right)]$. Specifically,

\begin{equation} E\left[\max_i \left(\delta_i + \epsilon_i \right)\right] = \sum_i E_{\boldsymbol \epsilon \in M_i}\left[\max_i\left( \delta_i + \epsilon_i \right) \right]. \end{equation}

Now we apply the functional form of the Gumbel distribution. This gives

\begin{equation} \begin{split} &E_{\boldsymbol \epsilon \in M_i}\left[\max_i\left( \delta_i + \epsilon_i \right) \right] = \hspace{2in} \\ &\int^{\infty}_{-\infty} \left(\delta_i + \epsilon_i\right)e^{\mu - \epsilon_i} e^{- e^{\mu - \epsilon_i}} \prod_{j \neq i} e^{-e^{\mu - \epsilon_i + \delta_j - \delta_i}}d\epsilon_i \\ =&\int^{\infty}_{-\infty} \left(\delta_i + \epsilon_i\right)e^{\mu - \epsilon_i } \prod_{j } e^{-e^{\mu - \epsilon_i + \delta_j - \delta_i}}d\epsilon_i \\ =&\int^{\infty}_{-\infty} \left(\delta_i + \epsilon_i \right) e^{\mu - \epsilon_i} \exp \Bigl\{ \sum_{j} -e^{\mu - \epsilon_i + \delta_j - \delta_i} \Bigr\}d\epsilon_i \\ =&\int^{\infty}_{-\infty} \left(\delta_i + \epsilon_i \right) e^{\mu - \epsilon_i} \exp \Bigl\{ -e^{\mu - \epsilon_i } \sum_{j} e^{ \delta_j - \delta_i} \Bigr\}d\epsilon_i, \end{split} \end{equation}

where the second step comes from absorbing the factor $e^{-e^{\mu - \epsilon_i}}$ into the product as its $j = i$ term, using the fact that $\delta_j - \delta_i = 0$ when $j = i$.

Now we define $D_i \equiv \sum_j e^{\delta_j - \delta_i}$, and make the substitution $x = D_i\hspace{0.5mm} e^{\mu - \epsilon_i}$, so that $ dx = -D_i e^{\mu - \epsilon_i}d\epsilon_i \Rightarrow -\frac{dx} {D_i} = e^{\mu - \epsilon_i}d\epsilon_i$ and $\epsilon_i = \mu - \log\left(\frac{x}{D_i}\right)$. Note that as $\epsilon_i$ approaches infinity, $x$ approaches 0, and as $\epsilon_i$ approaches negative infinity, $x$ approaches infinity:

\begin{equation} \begin{split} &\hspace{3mm} E_{\boldsymbol \epsilon \in M_i}\left[\max_i\left( \delta_i + \epsilon_i \right) \right] = \\ &\hspace{3mm}\int^{0}_{\infty} \left(\delta_i + \mu - \log\left[\frac{x}{D_i} \right]\right)\left(-\frac{1}{D_i}\right)\exp\left\{ -x\right\}dx \\ =&\hspace{3mm}\frac{1}{D_i}\int^{\infty}_{0} \left(\delta_i + \mu - \log\left[\frac{x}{D_i} \right]\right)e^{ -x}dx \\ =&\hspace{3mm} \frac{\delta_i + \mu}{D_i}\int^{\infty}_{0} e^{-x}dx -\frac{1}{D_i}\int^{\infty}_{0} \log[x]e^{-x}dx + \frac{\log[D_i]} {D_i} \int^{\infty}_{0}e^{-x}dx.\\ \end{split} \end{equation}
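
As a side check (not part of the proof), the change of variables can be verified numerically in R with arbitrary illustrative values; both integrands should give the same value:

```r
# Numeric check that the substitution x = D_i * exp(mu - eps_i) preserves the integral
mu <- 0.5; delta <- c(1, 2, 3); i <- 1
D_i <- sum(exp(delta - delta[i]))
# integrand before the substitution, written as a single exponential for stability
pre  <- function(e) (delta[i] + e) * exp(mu - e - D_i * exp(mu - e))
# integrand after the substitution
post <- function(x) (delta[i] + mu - log(x / D_i)) * exp(-x) / D_i
integrate(pre,  lower = -Inf, upper = Inf)$value
integrate(post, lower = 0,    upper = Inf)$value   # the two values should agree
```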

The Gamma function is defined as $\Gamma(t) = \int^{\infty}_{0} x^{t - 1}e^{-x}dx$. For positive integer $t$, it satisfies $\Gamma(t) = (t - 1)!$, so $\Gamma(1) = 0! = 1$. In addition, the Euler–Mascheroni constant $\gamma \approx 0.57722$ is known to satisfy

$$\gamma = -\int^{\infty}_{0} \log[x] e^{-x}dx.$$
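
Both facts are easy to confirm numerically in R (a quick sketch):

```r
# Numeric confirmation of Gamma(1) = 1 and gamma = -integral_0^Inf log(x) e^{-x} dx
integrate(function(x) exp(-x), lower = 0, upper = Inf)$value            # Gamma(1) = 1
-integrate(function(x) log(x) * exp(-x), lower = 0, upper = Inf)$value  # approx. 0.57722
-digamma(1)   # R's built-in route to the Euler-Mascheroni constant
```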

Applying these facts gives

\begin{equation} \begin{split} &\hspace{3mm} E_{\boldsymbol \epsilon \in M_i}\left[\max_i\left( \delta_i + \epsilon_i \right) \right] = \frac{\delta_i + \mu + \gamma + \log[D_i]}{D_i}. \end{split} \end{equation}
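
This per-alternative term can itself be checked by simulation: it should equal the average of $\delta_i + \epsilon_i$ over the draws where alternative $i$ attains the maximum (a sketch with arbitrary illustrative values):

```r
# Simulation check of E_{eps in M_i}[max] = (delta_i + mu + gamma + log(D_i)) / D_i
set.seed(2)
mu    <- 0.5
delta <- c(1, 2, 3)
gam   <- -digamma(1)
n     <- 1e6
eps   <- mu - log(-log(matrix(runif(n * length(delta)), nrow = n)))
util  <- sweep(eps, 2, delta, "+")           # delta_i + eps_i for each draw
wins  <- max.col(util)                       # index of the maximizing alternative
i     <- 1
D_i   <- sum(exp(delta - delta[i]))
mean(util[, i] * (wins == i))                # simulated value of the term
(delta[i] + mu + gam + log(D_i)) / D_i       # closed-form value derived above
```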

Then we sum over $i$ to get

\begin{equation} \begin{split} &\hspace{3mm} E\left[\max_i\left( \delta_i + \epsilon_i \right) \right] = \sum_i \frac{\delta_i + \mu + \gamma + \log[D_i]}{D_i}. \end{split} \end{equation}

Recall that $D_i = \sum_j e^{\delta_j - \delta_i} = \frac{\sum_j e^{\delta_j}} {e^{\delta_i}}$. Notice that the familiar logit choice probabilities $P_i = \frac{e^{\delta_i}}{\sum_j e^{\delta_j}}$ are the reciprocals of the $D_i$'s, that is, $P_i = 1/D_i$. Also note that $\sum_i P_i = 1$. Then we have

\begin{equation} \begin{split} \hspace{3mm} E\left[\max_i\left( \delta_i + \epsilon_i \right) \right] =& \sum_i P_i\left(\delta_i + \mu + \gamma + \log[D_i]\right)\\ =&\hspace{2mm} (\mu + \gamma) \sum_i P_i + \sum_i P_i\delta_i + \sum_iP_i \log[D_i] \\ =& \hspace{2mm} \mu + \gamma + \sum_i P_i \delta_i + \sum_i P_i \log\left[\frac{\sum_j e^{\delta_j}} {e^{\delta_i}} \right]\\ =& \mu + \gamma + \sum_i P_i \delta_i + \sum_i P_i \log\left[\sum_j e^{\delta_j}\right] - \sum_i P_i \log[e^{\delta_i}]\\ =& \mu + \gamma + \sum_i P_i \delta_i + \log\left[ \sum_j e^{\delta_j}\right] \sum_i P_i - \sum_i P_i \delta_i \\ =& \mu + \gamma + \log\left[ \sum_j \exp\left\{ \delta_j \right\}\right] .\end{split} \end{equation} Q.E.D.
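
As a final sanity check, the identification of $P_i = 1/D_i$ with the logit choice probabilities, i.e. the frequencies with which each alternative attains the maximum, can also be simulated (again with arbitrary values; the location $\mu$ cancels out of the argmax, so Gumbel$(0,1)$ draws suffice):

```r
# Check that the winning frequencies recover the logit probabilities P_i = 1/D_i
set.seed(3)
delta <- c(1, 2, 3)
n     <- 1e5
eps   <- -log(-log(matrix(runif(n * length(delta)), nrow = n)))  # Gumbel(0, 1)
wins  <- max.col(sweep(eps, 2, delta, "+"))
tabulate(wins, nbins = length(delta)) / n    # simulated choice shares
exp(delta) / sum(exp(delta))                 # logit probabilities P_i = 1/D_i
```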

Jason
    I linked what I believe is the article you're referring to, without actually looking through it to be sure; please correct if wrong. – Danica Jan 26 '16 at 05:14
  • @Jason Do you know how to prove what this is when the max is conditional on one being the max? See question here that is unsolved: http://stats.stackexchange.com/questions/260847/conditional-expectation-of-a-truncated-rv-derivation-gumbel-distribution-logis – wolfsatthedoor Feb 09 '17 at 04:47

I appreciate the work exhibited in your answer: thank you for that contribution. The purpose of this post is to provide a simpler demonstration. The value of simplicity is revelation: we can easily obtain the entire distribution of the maximum, not just its expectation.


Ignore $\mu$ by absorbing it into the $\delta_i$ and assuming the $\epsilon_i$ all have a Gumbel$(0,1)$ distribution. (That is, replace each $\epsilon_i$ by $\epsilon_i-\mu$ and change $\delta_i$ to $\delta_i+\mu$.) This does not change the random variable

$$X = \max_{i}(\delta_i + \epsilon_i) = \max_i((\delta_i+\mu) + (\epsilon_i-\mu)).$$

The independence of the $\epsilon_i$ implies for all real $x$ that $\Pr(X\le x)$ is the product of the individual chances $\Pr(\delta_i+\epsilon_i\le x)$. Taking logs and applying basic properties of exponentials yields

$$\eqalign{ \log \Pr(X\le x) &= \log\prod_{i}\Pr(\delta_i + \epsilon_i \le x) = \sum_i \log\Pr(\epsilon_i \le x - \delta_i)\\ &= -\sum_ie^{\delta_i}\, e^{-x} = -\exp\left(-x + \log\sum_i e^{\delta_i}\right). }$$

This is the logarithm of the CDF of a Gumbel distribution with location parameter $\lambda=\log\sum_i e^{\delta_i}.$ That is,

$X$ has a Gumbel$\left(\log\sum_i e^{\delta_i}, 1\right)$ distribution.

This is much more information than requested. The mean of such a distribution is $\gamma+\lambda,$ entailing

$$\mathbb{E}[X] = \gamma + \log\sum_i e^{\delta_i},$$

QED.
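
The distributional claim itself is easy to check by simulation (a sketch with arbitrary $\delta_i$, after absorbing $\mu$ as above):

```r
# Simulation check that max_i(delta_i + eps_i) ~ Gumbel(lambda, 1),
# where lambda = log(sum(exp(delta)))
set.seed(4)
delta  <- c(1, 2, 3)
lambda <- log(sum(exp(delta)))
n      <- 1e5
eps    <- -log(-log(matrix(runif(n * length(delta)), nrow = n)))  # Gumbel(0, 1)
X      <- apply(sweep(eps, 2, delta, "+"), 1, max)
x0     <- lambda + 1                     # an arbitrary evaluation point
mean(X <= x0)                            # empirical CDF of the max at x0
exp(-exp(-(x0 - lambda)))                # Gumbel(lambda, 1) CDF at x0
```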

whuber