
I'm particularly interested in why N appears in the denominator of the formula for the population variance, but (n-1) appears in the denominator for the sample variance, where N is the total number of elements in the population and n is the total number of elements in the simple random sample.

Basically, the (n-1) seems less intuitive to me.

I'd love to know the reasoning behind this, and to see any mathematical derivations that back up that reasoning.

EthanT
  • See http://stats.stackexchange.com/questions/51237/population-variance-and-sample-variance and http://stats.stackexchange.com/questions/74420/standard-deviation-and-variance-in-sample-and-population-formulas-for-all/74422#74422 and https://en.wikipedia.org/wiki/Bessel%27s_correction – Glen_b Apr 05 '17 at 16:58
  • It is just because it is common to use the unbiased estimate of variance for the sample estimate. – Michael R. Chernick Apr 05 '17 at 17:24
  • @Glen_b Thanks, that was exactly what I was looking for, especially the linked pdf: https://maxwell.ict.griffith.edu.au/sso/biased_variance.pdf One thing in that paper is unclear to me: I am not familiar with the "iid assumption", so why does it mean that $E\{x_i x_j\} = \mu_x^2$? – EthanT Apr 05 '17 at 17:43
  • "iid" means "independent and identically distributed". The expectation result is a direct consequence of that independence. See https://en.wikipedia.org/wiki/Independence_(probability_theory)#Expectation_and_covariance – Glen_b Apr 05 '17 at 17:47
  • Well, a couple more questions. The pdf starts by assuming a bias exists. How generally true is this assumption? Are all samples guaranteed to be biased in such a fashion that the (n-1) is a good corrector in ALL cases? From what I know in engineering, biases come in all "shapes and sizes". Could the same be said about biases found within simple random sampling? Or, regardless of origin, do all biases resulting from random sampling simply result in a shift of the mean away from the population mean, thereby making this simple correction of (n-1) sufficient? – EthanT Apr 05 '17 at 17:51
  • The n-denominator version of variance is too small (biased) in such a way that its expectation is always $(n-1)/n$ times the population variance. See here: http://stats.stackexchange.com/questions/100041/how-exactly-did-statisticians-agree-to-using-n-1-as-the-unbiased-estimator-for/100050#100050 ... it's not to do with "shifting the mean away from the population mean". It has to do with the fact that the sample mean is closer to the data (in the way that variance measures it) than the population mean is. – Glen_b Apr 05 '17 at 17:58
  • Understood. I mentioned the shift away from the population mean, as that seems to be the way the pdf paper mathematically and colloquially stated it. See eq. 7 and corresponding wording. Is that statement not generally true? – EthanT Apr 05 '17 at 21:13
  • It also seems, based on the variance property $V(x) = V(x+a)$, that an overall shift of the mean (at least one caused by adding a constant bias to all the data) would have no effect on the variance, and therefore none on the standard deviation. – EthanT Apr 05 '17 at 21:22
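
The claims in the comments above can be checked numerically: that the n-denominator variance has expectation $(n-1)/n$ times the population variance, and that the sample mean sits closer to the data than the population mean does. The following is only a rough sketch, not taken from the linked paper. It assumes iid normal draws, and the function names, sample size, and seed are all hypothetical choices made for illustration.

```python
import random


def sum_sq_about(sample, center):
    """Sum of squared deviations of the sample about an arbitrary center."""
    return sum((x - center) ** 2 for x in sample)


def average_variance_estimates(n=5, trials=20000, mu=10.0, sigma=2.0, seed=0):
    """Average the n-denominator and (n-1)-denominator variance estimates
    over many iid normal samples (the normal model is an assumption here)."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        ss = sum_sq_about(sample, sample_mean)
        biased_total += ss / n          # divides by n
        unbiased_total += ss / (n - 1)  # divides by n - 1 (Bessel's correction)
    return biased_total / trials, unbiased_total / trials


if __name__ == "__main__":
    n, sigma = 5, 2.0
    biased, unbiased = average_variance_estimates(n=n, sigma=sigma)
    print("population variance:      ", sigma ** 2)
    print("(n-1)/n * population var.:", (n - 1) / n * sigma ** 2)
    print("mean of n-denominator:    ", biased)
    print("mean of (n-1)-denominator:", unbiased)
```

In a run of this sketch, the averaged n-denominator estimate should land near $(n-1)/n$ times $\sigma^2$ and the averaged (n-1)-denominator estimate near $\sigma^2$, up to simulation noise. The helper `sum_sq_about` can also be called with the population mean as the center: for any single sample, the sum of squares about the sample mean is never larger than the sum about any other center, which is the "closer to the data" point made above.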
