
If we have a set of samples $x_1, \dots, x_N$ and we denote by $\bar x = \frac{1}{N} \sum_i x_i$ their average, then the sample variance is defined as

$s^2=\frac{1}{N-1} \sum_i (x_i - \bar x)^2$

(see [1], for example).
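For concreteness, here is a minimal Python sketch of this definition. The function name `sample_variance` and the toy data are illustrative choices, not taken from [1]. It is checked against the standard library's `statistics.variance`, which also divides by $N-1$:

```python
import statistics


def sample_variance(xs):
    """Sample variance s^2 with the 1/(N-1) normalization (illustrative helper)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Toy data chosen for easy hand-checking: mean 5, squared deviations sum to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sample_variance(data))      # 32/7, about 4.5714
print(statistics.variance(data))  # same value from the standard library
```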

However, I found a source stating that the sample estimate of the variance is $s^2=\frac{1}{N^2}\sum_i \sum_{i'} (x_i - x_{i'})^2$

(see formula (14.27) in [2]).

"Sample variance" and "sample estimate of the variance" should be the same thing, right? However, I can't see how the first formula equals the second.

Does anyone have an idea? Thanks!

[1] https://onlinecourses.science.psu.edu/stat414/node/66

[2] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). http://statweb.stanford.edu/~tibs/ElemStatLearn/

  • The second equation appears to be variance across strata – Jon May 17 '17 at 16:23
  • The second estimate involves a double summation and it is not clear exactly how $x_i$ differs from $x_{i'}$. What ranges are the summations taken over? Finally, the choice of $N$ versus $N-1$ in the denominator in the case of equation (1) just depends on whether or not you want to use the unbiased estimator. – Michael R. Chernick May 17 '17 at 16:29
  • @MichaelChernick the double summation is over the same vector twice. – AdamO May 17 '17 at 16:54
  • Maybe you missed a "2" when you copied formula (14.27) of [2] (2*var_j). – user158565 May 17 '17 at 17:59
  • See https://stats.stackexchange.com/questions/225734/why-isnt-variance-defined-as-the-difference-between-every-value-following-each/225758#225758 – Glen_b May 18 '17 at 00:49

1 Answer


The expression is missing a factor of 2: the normalized double sum equals twice the variance, not the variance itself.
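A quick numerical sanity check of this claim, written as a minimal Python sketch. The helper names and the toy data are illustrative assumptions, not from the question or [2]. It compares the double-sum expression with the divide-by-$N$ variance and with $s^2$:

```python
def pairwise_mean_sq_diff(xs):
    """(1/N^2) * sum_i sum_i' (x_i - x_i')^2, the expression quoted in the question."""
    n = len(xs)
    return sum((a - b) ** 2 for a in xs for b in xs) / n ** 2


def variance_n(xs):
    """Variance with the 1/N normalization."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def sample_variance(xs):
    """Sample variance s^2 with the 1/(N-1) normalization."""
    n = len(xs)
    return variance_n(xs) * n / (n - 1)


# Toy data: mean 5, squared deviations sum to 32, N = 8.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(pairwise_mean_sq_diff(data))  # 8.0
print(variance_n(data))             # 4.0, so the ratio is exactly 2
# Against s^2 the ratio is 2 * (N - 1) / N, i.e. 1.75 here (up to rounding).
print(pairwise_mean_sq_diff(data) / sample_variance(data))
```

With the $1/N$ normalization the ratio is 2; against $s^2$ it is $2(N-1)/N$, which tends to 2 as $N$ grows.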

Use the ol' add-and-subtract trick to bring in the term you want, $\bar{x}$:

\begin{eqnarray} \sum_{i=1}^n \sum_{i^\prime=1}^n \left( x_i - x_{i^\prime} \right)^2 &=& \sum_{i=1}^n \sum_{i^\prime=1}^n \left( x_i - \bar{x} + \bar{x} - x_{i^\prime} \right)^2 \end{eqnarray}

Then expand the square:

\begin{eqnarray} \sum_{i=1}^n \sum_{i^\prime=1}^n \left( x_i - \bar{x} + \bar{x} - x_{i^\prime} \right)^2 &=& \sum_{i=1}^n \sum_{i^\prime=1}^n \left( \left( x_i - \bar{x} \right)^2 + \left(x_{i^\prime} - \bar{x} \right)^2 - 2 \left(x_{i^\prime} - \bar{x} \right)\left(x_{i} - \bar{x} \right) \right) \end{eqnarray}

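The cross-product term sums to 0: its double sum factors into the product of $\sum_{i=1}^n (x_i - \bar{x})$ and $\sum_{i^\prime=1}^n (x_{i^\prime} - \bar{x})$, and each of these sums is 0 by the definition of $\bar{x}$.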

\begin{equation} \sum_{i=1}^n \sum_{i^\prime=1}^n \left( \left( x_i - \bar{x} \right)^2 + \left(x_{i^\prime} - \bar{x} \right)^2 \right) = n \sum_{i=1}^n \left( x_i - \bar{x} \right)^2 + n \sum_{i^\prime=1}^n \left(x_{i^\prime} - \bar{x} \right)^2 = 2n \sum_{i=1}^n \left( x_i - \bar{x} \right)^2 \end{equation}
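Since the double sum equals $2n \sum_{i=1}^n (x_i - \bar{x})^2$, dividing by $n^2$ as in the question's formula makes the missing factor explicit:

\begin{equation} \frac{1}{n^2} \sum_{i=1}^n \sum_{i^\prime=1}^n \left( x_i - x_{i^\prime} \right)^2 = 2 \cdot \frac{1}{n} \sum_{i=1}^n \left( x_i - \bar{x} \right)^2 = 2 \cdot \frac{n-1}{n} \, s^2 \end{equation}

So the double-sum expression is twice the divide-by-$n$ variance, and it differs from $2 s^2$ only by the factor $(n-1)/n$ that separates the divide-by-$n$ and divide-by-$(n-1)$ estimators.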

AdamO