Calculation of Variance — Difference between results by hand and R

Question

When I calculate the variance by hand, I get a different result than in RStudio.

The guy in the video, however, calculated it the same way I did, which seems to be wrong. Why is that?

My calculations:

Observations $=1,2,3,4,5,6,7,8,9$

Calculation by hand:
$$\begin{aligned}
E[X] &= \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{45}{9} = 5 \\
\text{or}\quad E[X] &= \sum x_i \cdot \frac{1}{9} = 5 \\
Var(X) &= E[(X-\mu)^2] = \sum (x_i - \mu)^2 \cdot \frac{1}{9} = \frac{20}{3} \\
\text{or}\quad Var(X) &= \left(\frac{1}{n}\sum x_i^2\right) - \mu^2 = \frac{20}{3}
\end{aligned}$$

However, when I use the following code in R, I get different results.

a <- c(1:9)
mean(a) # = 5
var(a)  # = 7.5

Questions:

What is happening here, and why are the results different?
Are the formulas I used for the calculation by hand correct?

The `var` function in R estimates the sample variance, which is calculated as $$Var(X)=\dfrac{1}{n-1}\sum_i (x_i-\bar{x})^2$$. — user2974951, Feb 18 '19 at 14:16
The formula you presented looks like it is estimating the *population variance*, which is similar except that it divides by $n$ instead of $n-1$ and uses the population mean $\mu$ rather than the sample mean $\bar{x}$. Which one you choose depends on whether you have a sample or population data. As $n$ gets larger this becomes less important (in general). — user2974951, Feb 18 '19 at 14:20
I see. So $\frac{1}{n}$ for sample data and $\frac{1}{n-1}$ for the whole population data? — Jürgen Erhardt, Feb 18 '19 at 14:24
@user2974951, "sample variance" is jargon. It [isn't](https://stats.stackexchange.com/a/16987/3277) "variance in the sample"; it is [estimating population variance](https://stats.stackexchange.com/a/17893/3277). The "n-1" denominator makes this estimator unbiased (given that the mean we use is a sample mean). The "n" denominator makes the estimator biased; it is called the maximum likelihood estimate of the population variance. — ttnphns, Feb 18 '19 at 14:43
@ttnphns Yes thank you for the corrections, my answers were written hastily and I did not express myself exactly. Also I did not want to bother OP too much about matters of bias. — user2974951, Feb 18 '19 at 14:46

score 1 · Answer 1 · answered Feb 19 '19 at 07:41

The formula suggests the author is treating the data as a whole population, whose variance is defined as $$Var(X)=\dfrac{1}{n}\sum_i x_i^2-\mu^2$$

However, if all you have is a sample from a population, then the unbiased estimator of the population variance is $$Var(X)=\dfrac{1}{n-1}\sum_i (x_i-\bar{x})^2$$

Notice the difference: $n-1$ instead of $n$, and the sample mean $\bar{x}$ rather than the population mean $\mu$.
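As a worked check on the observations from the question ($1,\dots,9$, with mean $5$), both divisors can be applied to the same sum of squared deviations, which accounts for the two different results:

$$\begin{aligned}
\sum_i (x_i-\bar{x})^2 &= 16+9+4+1+0+1+4+9+16 = 60 \\
\frac{1}{n}\sum_i (x_i-\bar{x})^2 &= \frac{60}{9} = \frac{20}{3} \approx 6.67 \\
\frac{1}{n-1}\sum_i (x_i-\bar{x})^2 &= \frac{60}{8} = 7.5
\end{aligned}$$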

The R function `var` estimates the variance using the second formula, since in statistics you are almost always dealing with a sample rather than a population.
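To see both results side by side, here is a minimal sketch in base R using the data from the question. The variable names (`ss`, `pop_var`, `samp_var`) are only illustrative; the rescaling by `(n - 1) / n` is one simple way to get the divide-by-$n$ value from `var`, not a built-in option of the function.

```r
a <- 1:9
n <- length(a)

# Sum of squared deviations from the mean
ss <- sum((a - mean(a))^2)   # 60

# Divide by n: the hand calculation from the question
pop_var <- ss / n            # 6.666667 (= 20/3)

# Divide by n - 1: the formula var() uses
samp_var <- ss / (n - 1)     # 7.5

# var() agrees with the n - 1 version
all.equal(samp_var, var(a))                 # TRUE

# Rescaling var() recovers the divide-by-n value
all.equal(pop_var, var(a) * (n - 1) / n)    # TRUE
```

So the hand calculation and R are both doing the arithmetic correctly; they simply divide the same sum of squares by different numbers.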
