
Let $S = \frac{X_1 + \cdots + X_n}{n}$, where the $X_i$ are IID Bernoulli with mean $p$; then $E[S] = p$ and $\operatorname{Var}(S) = \frac{p(1-p)}{n}$.

Now consider the slightly more complex setup $S' = \frac{1}{nk} \sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij}$, where, for each $1 \leq i \leq k$, the samples $\{X_{ij} \mid 1 \leq j \leq n\}$ are IID Bernoulli with mean $p_i$.

The $p_i$ are IID sampled from the uniform distribution on $[0,1]$.

How do I go about calculating the mean and variance of $S'$? How do I formalize this?

gauss
  • I am confused. [Wikipedia](https://en.wikipedia.org/wiki/Compound_probability_distribution) (halfway through the "Examples") tells me that a Bernoulli-uniform compound is Bernoulli with success probability $\frac{1}{2}$, so $nkS'$ should simply be binomial with parameters $nk$ and $\frac{1}{2}$ ([see here](https://stats.stackexchange.com/q/93852/1352)). However, a simulation gives me the correct mean of $\frac{nk}{2}$, but a variance that is larger than the $\frac{nk}{4}$ I would have expected (see the simulation sketch after these comments). – Stephan Kolassa Feb 23 '22 at 07:19
  • @StephanKolassa The variance _must_ be larger due to the variability of the $p_i$'s. See the [Law of Total Variance](https://en.wikipedia.org/wiki/Law_of_total_variance). – Xi'an Feb 23 '22 at 08:20
  • Ben's answer is a great demonstration of how to derive this. Another path is to recognize the relationship to a [beta-binomial distribution](https://en.wikipedia.org/wiki/Beta-binomial_distribution), since a uniform distribution on $[0,1]$ is a beta(1,1) distribution. – Sycorax Feb 23 '22 at 18:11
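The behavior described in the comments is easy to reproduce. A minimal simulation sketch, assuming NumPy; the sizes `n`, `k`, and `n_sims` are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, n_sims = 20, 5, 100_000  # arbitrary illustrative sizes

# Draw p_i ~ Uniform(0,1) per group, then n Bernoulli(p_i) trials per group;
# a Binomial(n, p_i) draw is the sum of those n Bernoulli trials.
p = rng.uniform(size=(n_sims, k))
group_totals = rng.binomial(n, p)
s_prime = group_totals.sum(axis=1) / (n * k)

print("simulated mean:", s_prime.mean())                # approx 1/2
print("simulated var: ", s_prime.var())                 # approx 0.0183
print("naive Binomial(nk, 1/2) var:", 1 / (4 * n * k))  # 0.0025, much smaller
```

With these sizes the simulated variance lands near $(n+2)/(12nk) \approx 0.0183$ (see the answer below), well above the naive $1/(4nk) = 0.0025$.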

1 Answer


Since the $p_i$ are IID and, conditional on $\mathbf{p} = (p_1, \ldots, p_k)$, the $X_{ij}$ are independent Bernoulli variables, this is quite straightforward. First we compute the conditional moments:

$$\begin{align} \mathbb{E}(S' | \mathbf{p}) &= \mathbb{E} \bigg( \frac{1}{nk} \sum_{i=1}^k \sum_{j=1}^n X_{ij} \bigg|\mathbf{p}\bigg) \\[6pt] &= \frac{1}{nk} \sum_{i=1}^k \sum_{j=1}^n \mathbb{E}(X_{ij}|p_i) \\[6pt] &= \frac{1}{nk} \sum_{i=1}^k \sum_{j=1}^n p_i \\[6pt] &= \frac{1}{k} \sum_{i=1}^k p_i, \\[12pt] \mathbb{V}(S' | \mathbf{p}) &= \mathbb{V} \bigg( \frac{1}{nk} \sum_{i=1}^k \sum_{j=1}^n X_{ij} \bigg| \mathbf{p}\bigg) \\[6pt] &= \frac{1}{n^2 k^2} \sum_{i=1}^k \sum_{j=1}^n \mathbb{V}(X_{ij}|p_i) \\[6pt] &= \frac{1}{n^2 k^2} \sum_{i=1}^k \sum_{j=1}^n p_i (1-p_i) \\[6pt] &= \frac{1}{n k^2} \sum_{i=1}^k p_i (1-p_i). \end{align}$$
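These conditional formulas are easy to check numerically by holding $\mathbf{p}$ fixed across simulations. A minimal sketch, assuming NumPy; the vector `p` below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims = 50, 200_000
p = np.array([0.2, 0.5, 0.9])   # an arbitrary fixed p vector, so k = 3
k = len(p)

# Holding p fixed across simulations estimates the *conditional* moments.
group_totals = rng.binomial(n, p, size=(n_sims, k))
s_prime = group_totals.sum(axis=1) / (n * k)

print("E(S'|p):", s_prime.mean(), "theory:", p.mean())
print("V(S'|p):", s_prime.var(),  "theory:", (p * (1 - p)).sum() / (n * k**2))
```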

Then we use the law of iterated expectation and the law of total variance to compute the unconditional moments:

$$\begin{align} \mathbb{E}(S') &= \mathbb{E}(\mathbb{E}(S' | \mathbf{p})) \\[6pt] &= \mathbb{E} \bigg( \frac{1}{k} \sum_{i=1}^k p_i \bigg) \\[6pt] &= \frac{1}{k} \sum_{i=1}^k \mathbb{E}(p_i) \\[6pt] &= \frac{1}{k} \sum_{i=1}^k \frac{1}{2} \\[6pt] &= \frac{1}{2}, \\[12pt] \mathbb{V}(S') &= \mathbb{E}(\mathbb{V}(S' | \mathbf{p})) + \mathbb{V}(\mathbb{E}(S' | \mathbf{p})) \\[6pt] &= \mathbb{E} \bigg( \frac{1}{n k^2} \sum_{i=1}^k p_i (1-p_i) \bigg) + \mathbb{V} \bigg( \frac{1}{k} \sum_{i=1}^k p_i \bigg) \\[6pt] &= \frac{1}{n k^2} \sum_{i=1}^k \mathbb{E}(p_i (1-p_i)) + \frac{1}{k^2} \sum_{i=1}^k \mathbb{V}(p_i) \\[6pt] &= \frac{1}{n k^2} \sum_{i=1}^k \bigg( \frac{1}{2} - \frac{1}{3} \bigg) + \frac{1}{k^2} \sum_{i=1}^k \frac{1}{12} \\[6pt] &= \frac{1}{n k^2} \sum_{i=1}^k \frac{1}{6} + \frac{1}{k^2} \sum_{i=1}^k \frac{1}{12} \\[6pt] &= \frac{1}{6 n k} + \frac{1}{12 k} \\[6pt] &= \frac{1}{12 k} \cdot \frac{n+2}{n}. \end{align}$$
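The uniform moments used above ($\mathbb{E}(p_i) = \frac{1}{2}$, $\mathbb{E}(p_i(1-p_i)) = \frac{1}{6}$, $\mathbb{V}(p_i) = \frac{1}{12}$) and the final simplification can be confirmed symbolically. A sketch, assuming SymPy:

```python
import sympy as sp

p, n, k = sp.symbols('p n k', positive=True)

# Moments of p ~ Uniform(0, 1) by direct integration over [0, 1].
E_p  = sp.integrate(p, (p, 0, 1))              # 1/2
E_pq = sp.integrate(p * (1 - p), (p, 0, 1))    # 1/2 - 1/3 = 1/6
V_p  = sp.integrate(p**2, (p, 0, 1)) - E_p**2  # 1/3 - 1/4 = 1/12

# V(S') = E(V(S'|p)) + V(E(S'|p)) = E_pq/(n*k) + V_p/k.
V_S = E_pq / (n * k) + V_p / k
print(sp.simplify(V_S - (n + 2) / (12 * n * k)))  # 0, confirming the final form
```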

As a sanity check, we observe that $\mathbb{V}(S') \rightarrow 0$ as $k \rightarrow \infty$, consistent with the law of large numbers. Note that increasing $n$ alone does not suffice: for fixed $k$, $\mathbb{V}(S') \rightarrow \frac{1}{12k} > 0$ as $n \rightarrow \infty$, since the variability of the $p_i$ never averages out within groups.
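As a cross-check along the lines of Sycorax's comment: marginally, each group total $\sum_{j} X_{ij}$ is beta-binomial with parameters $(n, \alpha=1, \beta=1)$, and summing $k$ independent group totals reproduces the variance derived above. A sketch, assuming SciPy's `scipy.stats.betabinom`; the parameter values are arbitrary:

```python
from scipy.stats import betabinom

n, k = 20, 5  # arbitrary illustrative sizes

# Marginally, each group total is BetaBinomial(n, 1, 1) because Uniform(0,1)
# is Beta(1,1).  S' sums k independent group totals and divides by n*k,
# so V(S') = k * Var(BetaBin) / (n*k)**2.
var_group = betabinom(n, 1, 1).var()      # equals n*(n+2)/12
var_S = k * var_group / (n * k) ** 2
print(var_S, (n + 2) / (12 * n * k))      # both 0.018333...
```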

Ben