
I was checking the definition of pooled variance, and although I think it makes sense intuitively, I was wondering how one can obtain that estimator. For the case of a single group, I understand the MLE of the variance assuming Gaussian iid samples, which gives a biased estimate: after computing its expected value, we can see that it is actually biased, and we can correct the estimate by dividing by $n-1$ instead of $n$. However, I have not been able to find a way to arrive at the pooled variance equation:

$$s_p^2=\frac{\sum_i (n_i-1)s_i^2}{\sum_i (n_i-1)}$$ where $i$ indexes the groups, and $n_i$ and $s_i^2$ are the number of elements and the sample variance of group $i$.
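To make the formula concrete, here is a minimal sketch in Python/NumPy (the function names and the three toy groups are made up for illustration). It computes $s_p^2$ directly from the formula and, as a cross-check, as the within-group sum of squares divided by $N-k$, since $(n_i-1)s_i^2$ is just the sum of squared deviations in group $i$ and $\sum_i (n_i-1) = N-k$.

```python
import numpy as np

def pooled_variance(groups):
    """s_p^2 = sum_i (n_i - 1) s_i^2 / sum_i (n_i - 1)."""
    # ddof=1 gives the unbiased per-group sample variance s_i^2
    weighted = [(len(g) - 1) * np.var(g, ddof=1) for g in groups]
    dfs = [len(g) - 1 for g in groups]
    return sum(weighted) / sum(dfs)

def within_group_ss_over_df(groups):
    """The same quantity written as SS_within / (N - k)."""
    ss_within = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups)
    n_total = sum(len(g) for g in groups)
    return ss_within / (n_total - len(groups))

# Hypothetical toy data: k = 3 groups of sizes 3, 2 and 4
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([1.0, 3.0]),
          np.array([10.0, 12.0, 14.0, 16.0])]
print(pooled_variance(groups))          # 4.0
print(within_group_ss_over_df(groups))  # 4.0
```

Here the per-group sums of squares are 2, 2 and 20, so both expressions give $24/6 = 4$.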

How could I obtain that equation?
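One standard route, sketched here under the assumption of independent Gaussian samples $x_{ij}\sim N(\mu_i,\sigma^2)$ with a separate mean per group but a common variance $\sigma^2$ (the symbols $x_{ij}$, $\mu_i$, $k$ and $N=\sum_i n_i$ are introduced for this sketch), mirrors the one-group argument: maximize the joint likelihood, then correct the bias of $\hat\sigma^2$.

$$\begin{align}
\ell(\mu_1,\dots,\mu_k,\sigma^2) &= -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2 \\
\hat\mu_i = \bar x_i, \qquad \hat\sigma^2_{\text{MLE}} &= \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar x_i)^2 = \frac{1}{N}\sum_{i=1}^{k} (n_i-1)s_i^2 \\
E\left[\hat\sigma^2_{\text{MLE}}\right] &= \frac{1}{N}\sum_{i=1}^{k} (n_i-1)\sigma^2 = \frac{N-k}{N}\,\sigma^2 \\
s_p^2 = \frac{N}{N-k}\,\hat\sigma^2_{\text{MLE}} &= \frac{\sum_i (n_i-1)s_i^2}{\sum_i (n_i-1)}
\end{align}$$

As in the one-group case, the MLE divides by the total sample size $N$, and removing its bias replaces $N$ by the residual degrees of freedom $N-k=\sum_i (n_i-1)$, because one mean is estimated per group.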

Thanks!

Roger Trullo
  • Clarification: are you looking for the proof of this formula, or just whether this formula is correct? (It is.) – Gregg H Mar 30 '18 at 12:33
  • I am looking for the proof. I was already able to prove that the estimate is unbiased. I changed the title, thanks! – Roger Trullo Mar 30 '18 at 12:48
  • Though I haven't tried to work this out, my first thought is that it would be best to view it from a multiple regression framework (where you have $k$ groups and thus $k-1$ dummy variables in the regression model). Then the pooled variance is just the error variance of the regression. – Gregg H Mar 30 '18 at 12:57
  • Thanks for the lead. I am not sure I am following; could you send me a reference where something similar is done? – Roger Trullo Mar 30 '18 at 13:34
  • Very nearly the same question is addressed at https://stats.stackexchange.com/questions/43159. A general answer that applies directly here is given at https://stats.stackexchange.com/questions/51622. – whuber Mar 30 '18 at 21:46
  • "After computing the Expected value": what do you mean? A detailed explanation may help us understand your problem clearly. –  Apr 01 '18 at 14:23
  • @subhashc.davar I meant what Gregg H did in his answer, that is, showed that the estimate is unbiased. My question was more along the lines of what whuber suggested in his comment. – Roger Trullo Apr 01 '18 at 14:48
  • I would appreciate it if you edited the question accordingly. Also, please edit your tags. The hypothesis testing and anova tags seem to be a misfit. –  Apr 01 '18 at 15:04
  • I think the question is clear, also the concept of pooled variance is widely used in ANOVA and hypothesis testing. – Roger Trullo Apr 03 '18 at 07:23
  • What is the source of your formula, and what is reflected by $n$? Your question seems to ask how to deal with bias as well as how to combine estimates. –  Apr 03 '18 at 09:33
  • It is in the link that I used in the question. $n_i$ is the number of elements in group $i$; I just thought it was standard notation. – Roger Trullo Apr 03 '18 at 09:48
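Gregg H's regression suggestion can be checked numerically. The sketch below is one hypothetical reading of it: regress all observations on an intercept plus $k-1$ group dummies (treatment coding with the first group as reference is an assumption made here), then compute the error variance as RSS divided by $N$ minus the number of fitted parameters. Because the fitted values are the group means, the residuals are the within-group deviations and the result matches $s_p^2$.

```python
import numpy as np

def regression_error_variance(groups):
    """Error variance of y ~ intercept + (k - 1) group dummies: RSS / (N - p)."""
    y = np.concatenate(groups)
    labels = np.concatenate([np.full(len(g), i) for i, g in enumerate(groups)])
    k = len(groups)
    # Column 0 is the intercept; columns 1..k-1 are 0/1 dummies for groups 1..k-1
    # (group 0 acts as the reference level).
    X = np.column_stack([np.ones(len(y))] +
                        [(labels == i).astype(float) for i in range(1, k)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    n, p = X.shape  # p == k estimated parameters
    return residuals @ residuals / (n - p)

# Hypothetical toy data: k = 3 groups of sizes 3, 2 and 4
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([1.0, 3.0]),
          np.array([10.0, 12.0, 14.0, 16.0])]
print(regression_error_variance(groups))  # approximately 4.0
```

For these groups the formula in the question gives $s_p^2 = (2 + 2 + 20)/6 = 4$, so the regression's error variance agrees up to floating-point error.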

1 Answer


I'm taking a stab at this, as I think it is just a weighted average. Assuming all groups share a common variance $\sigma^2$, so that $E[s_i^2]=\sigma^2$ for every $i$: $$\begin{align}E[s_p^2] & = E\left[\frac{\Sigma(n_i-1)s_i^2}{\Sigma(n_i-1)}\right] \\ & = \frac{1}{\Sigma(n_i-1)}E\left[\Sigma(n_i-1)s_i^2\right] \\ & = \frac{1}{\Sigma(n_i-1)}\left(\Sigma(n_i-1)E[s_i^2]\right) \\ & = \frac{1}{\Sigma(n_i-1)}\left(\Sigma(n_i-1)\sigma^2\right) \\ & = \frac{1}{\Sigma(n_i-1)}\left(\sigma^2\Sigma(n_i-1)\right) \\ & = \sigma^2 \end{align}$$ Sorry about the comment re: multiple regression... I think this only uses linearity of expectation, since the $n_i$ are fixed constants.
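The unbiasedness argument can be sanity-checked by simulation. This is a minimal Monte Carlo sketch, assuming Gaussian groups with different means but a shared $\sigma$; the group sizes, means, $\sigma$, seed and repetition count are arbitrary illustrative choices. The average of $s_p^2$ over many simulated datasets should land close to $\sigma^2$.

```python
import numpy as np

def pooled_variance(groups):
    """s_p^2 = sum_i (n_i - 1) s_i^2 / sum_i (n_i - 1)."""
    dfs = sum(len(g) - 1 for g in groups)
    return sum((len(g) - 1) * np.var(g, ddof=1) for g in groups) / dfs

def simulate_mean_pooled_variance(sizes, means, sigma, reps=10000, seed=0):
    """Average s_p^2 over many simulated datasets sharing a common sigma."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        # Each group has its own mean but the same standard deviation sigma
        groups = [rng.normal(mu, sigma, size=n) for n, mu in zip(sizes, means)]
        estimates[r] = pooled_variance(groups)
    return estimates.mean()

sizes, means, sigma = [3, 5, 10], [0.0, 2.0, -1.0], 1.5
print(simulate_mean_pooled_variance(sizes, means, sigma))  # close to sigma**2 = 2.25
```

For comparison, dividing by $N=18$ instead of $\sum_i(n_i-1)=15$ would center the estimates near $\tfrac{15}{18}\cdot 2.25 = 1.875$.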

Gregg H