Proof to obtain pooled variance equation

Question

I was checking the definition of pooled variance, and although it makes sense intuitively, I was wondering how one can derive that estimator. For a single group, I understand that the MLE of the variance, assuming i.i.d. Gaussian samples, is a biased estimate: computing its expected value shows the bias, which we can correct by dividing by $n-1$ instead of $n$. However, I have not been able to find a way to arrive at the pooled variance equation:

$s_p^2=\frac{\sum_i (n_i-1)s_i^2}{\sum_i (n_i-1)}$, where $i$ is the index of the groups.

How could I obtain that equation?

Thanks!

Clarification: are you looking for the proof of this formula, or just asking whether this formula is correct (which it is)? — Gregg H, Mar 30 '18 at 12:33
I am looking for the proof. I was already able to prove that the estimate is unbiased. I changed the title, thanks! — Roger Trullo, Mar 30 '18 at 12:48
Though I haven't tried to work this out, my first thought is that it would be best to view it from a multiple regression framework (where you have $k$ groups and thus $k-1$ dummy variables in the regression model). Then the pooled variance is just the error variance of the regression. — Gregg H, Mar 30 '18 at 12:57
Thanks for the lead, I am not sure I am following, could you send me a reference where they do something similar? — Roger Trullo, Mar 30 '18 at 13:34
Very nearly the same question is addressed at https://stats.stackexchange.com/questions/43159. A general answer that applies directly here is given at https://stats.stackexchange.com/questions/51622. — whuber, Mar 30 '18 at 21:46
"After computing the Expected value": what do you mean? A detailed explanation may help us understand your problem clearly. — , Apr 01 '18 at 14:23
@subhashc.davar I meant what Gregg H did in his answer, that is, showing that the estimate is unbiased. My question was more along the lines of what whuber suggested in his comment. — Roger Trullo, Apr 01 '18 at 14:48
I would appreciate it if you edited the question. Also, please edit your tags. The hypothesis testing and anova tags seem to be a misfit. — , Apr 01 '18 at 15:04
I think the question is clear, also the concept of pooled variance is widely used in ANOVA and hypothesis testing. — Roger Trullo, Apr 03 '18 at 07:23
What is the source of your formula, and what does n represent? Your question seems to be about how to deal with bias as well as how to combine estimates. — , Apr 03 '18 at 09:33
It is in the link that I use in the question. The $n_i$ indicates the number of elements in group $i$. I just thought it was standard notation. — Roger Trullo, Apr 03 '18 at 09:48

score 0 · Answer 1 · answered Mar 30 '18 at 15:00

I'm taking a stab at this, as I think it is just a weighted average of the group variances. Assume every group has the same variance $\sigma^2$, so that each $s_i^2$ is unbiased for it, i.e. $E[s_i^2]=\sigma^2$. Then:

$$\begin{align}
E[s_p^2] & = E\left[\frac{\sum_i (n_i-1)s_i^2}{\sum_i (n_i-1)}\right] \\
& = \frac{1}{\sum_i (n_i-1)}E\left[\sum_i (n_i-1)s_i^2\right] \\
& = \frac{1}{\sum_i (n_i-1)}\left(\sum_i (n_i-1)E[s_i^2]\right) \\
& = \frac{1}{\sum_i (n_i-1)}\left(\sum_i (n_i-1)\sigma^2\right) \\
& = \frac{1}{\sum_i (n_i-1)}\left(\sigma^2\sum_i (n_i-1)\right) \\
& = \sigma^2
\end{align}$$

Sorry about the comment re: multiple regression... I think this is just using the rules for expectations.
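The unbiasedness argument can also be checked numerically. What follows is only a rough sketch, not part of the original derivation: the helper names `pooled_variance` and `simulate`, the group sizes, the group means, and the value of `sigma` are all made up for illustration. It draws Gaussian groups with different means but a common variance, computes $s_p^2$ for each replication, and averages the results, which should land close to $\sigma^2$.

```python
import random
import statistics


def pooled_variance(groups):
    """Pooled variance: sum((n_i - 1) * s_i^2) / sum(n_i - 1)."""
    # statistics.variance is the sample variance (divides by n_i - 1).
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator


def simulate(sizes, means, sigma, n_reps, seed=0):
    """Average s_p^2 over n_reps simulated data sets (hypothetical helper)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Each group has its own mean but the same standard deviation sigma.
        groups = [
            [rng.gauss(mu, sigma) for _ in range(n)]
            for n, mu in zip(sizes, means)
        ]
        estimates.append(pooled_variance(groups))
    return statistics.fmean(estimates)


# Illustrative values only: three groups, common sigma = 2, so sigma^2 = 4.
avg = simulate(sizes=[5, 8, 12], means=[0.0, 3.0, -2.0], sigma=2.0, n_reps=20000)
print(avg)  # should be close to 4.0
```

Note that the group means differ on purpose: because each $s_i^2$ is computed around its own group mean, the pooled estimate is unaffected by differences between the means, and only the common-variance assumption matters.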
