First of all, there is the bad habit of writing $N(\mu, \sigma^2)$ (which is inconsistent with the multivariate Gaussian $N(\mu, \Sigma)$ [without a square]) that might cause trouble, so let us just speak of $\text{Var}(X)$ as the variance of a random variable and $\text{sd}(X) = \sqrt{\text{Var}(X)}$ as its standard deviation. These concepts are completely general and have nothing to do with normal variables/normal distributions. Let us assume that $X$ and $Y$ are two independent normally distributed random variables. Then the rules are (see https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables):
The sum $X+Y$ is again normally distributed and we have
$$\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$$
and, for any constant $\alpha \neq 0$,
$$\text{Var}\left(\frac{X}{\alpha}\right) = \frac{\text{Var}(X)}{\alpha^2}$$
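If you want to convince yourself of these two rules numerically, here is a minimal sketch (assuming Python with NumPy; the parameters and sample size are arbitrary illustrative choices, not anything from the original question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # large sample so the empirical variances are close to the true ones

# two independent normal variables with arbitrary illustrative parameters
X = rng.normal(loc=0.0, scale=2.0, size=n)   # Var(X) = 4
Y = rng.normal(loc=1.0, scale=3.0, size=n)   # Var(Y) = 9
alpha = 5.0

print(np.var(X + Y))      # ~13 = Var(X) + Var(Y)
print(np.var(X / alpha))  # ~0.16 = Var(X) / alpha**2
```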
The process as he describes it is as follows: assume that we are given an infinite supply of iid random variables $X_1, X_2, ...$, all $N(0,\sigma^2)$ distributed. We choose a single $n \in \mathbb{N}$ and 'sample often', i.e. for $k=1,...,K$ (where $K$ is a large number and all samples are still independent of each other) we draw samples $x_1^{(k)}, ..., x_n^{(k)}$, form the numbers $\mu^{(k)} = \frac{\sum_{i=1}^n x_i^{(k)}}{n}$, and 'draw' the distribution of the values $\mu^{(k)}$ (for example, by drawing histograms with small bar widths). Then we will see that this distribution tends to become a normal distribution $N(0, [\sigma/\sqrt{n}]^2)$. Since the samples are still independent of each other, we can also describe this process more simply as forming the new random variable
$$Z = \frac{\sum_{i=1}^n X_i}{n}$$
(taking the mean of $n$ samples of the original population). Since the $X_i$ are iid normally distributed, we can apply the rules above and get that $Z$ is again normally distributed and that
\begin{align*}
\text{Var}(Z) &= \frac{\text{Var}\left(\sum_{i=1}^n X_i\right)}{n^2} \\
&= \frac{\sum_{i=1}^n \text{Var}(X_i)}{n^2} \\
&= \frac{\sum_{i=1}^n \sigma^2}{n^2} \\
&= \frac{n \sigma^2}{n^2} \\
&= \frac{\sigma^2}{n}
\end{align*}
so that also $\text{sd}(Z) = \sqrt{\text{Var}(Z)} = \frac{\sigma}{\sqrt{n}}$. That is the explanation for the observation the author describes.

What do we do with this? Suppose we are given a sample $y_1, ..., y_n$ of size $n$ of iid random variables $Y_1, ..., Y_n$, suppose we know that these random variables follow a normal distribution $N(\mu, \sigma^2)$, and suppose we know $\sigma^2$ but not the parameter $\mu$, which we want to figure out from the data. Of course we form $\tilde{\mu} = \sum_{i=1}^n y_i / n$, but then we want to know: how good is this 'guess' of the 'true value' $\mu$? I.e. how far is $\mu$ away from $\tilde{\mu}$, or -- reformulated -- how often does it happen that $\mu$ is this or that far away from $\tilde{\mu}$? We form $x_i = y_i-\mu$ and $X_i = Y_i - \mu$ so that the $X_i$ match the setup above. Then we know that $\hat{\mu} = \sum_{i=1}^n x_i/n$ is a sample from the variable $Z$ that we considered above. So we know where it sits (roughly 95% of the time it is no further away from $0$ than two times the standard deviation $\sigma/\sqrt{n}$ of $Z$). But this is precisely our measure of quality, because
$$\hat{\mu} = \frac{\sum_{i=1}^n x_i}{n} = \frac{\sum_{i=1}^n (y_i - \mu)}{n} = \frac{\sum_{i=1}^n y_i}{n} - \mu = \tilde{\mu} - \mu$$
I.e. "$\hat{\mu}$ is this and that often close to zero" translates into "our guess $\tilde{\mu}$ is this and that often close to the true value $\mu$". Since the "variation" (standard deviation) of $\hat{\mu}$ (i.e. of the distance between our guess and the true value) equals $\sigma/\sqrt{n}$ and hence goes to zero as $n$ grows, we know that with an increasing number of samples we get closer and closer to the true value, and we can even say how close we get in terms of the sample size.
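The whole argument can also be checked by simulation. The following sketch (again assuming Python with NumPy; the values of $\sigma$, $K$ and the list of sample sizes $n$ are arbitrary choices for illustration) repeats the "draw $n$ values and take the mean" experiment $K$ times and compares the empirical standard deviation of the means with $\sigma/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0     # known standard deviation of the X_i
K = 100_000     # number of repetitions of the experiment

for n in [4, 16, 64, 256]:
    # K independent experiments, each drawing n samples from N(0, sigma^2)
    samples = rng.normal(loc=0.0, scale=sigma, size=(K, n))
    means = samples.mean(axis=1)                 # the mu^(k) from above
    print(n, means.std(), sigma / np.sqrt(n))    # empirical sd vs sigma/sqrt(n)
```

The two printed numbers should agree to a couple of decimal places for every $n$, and a histogram of `means` would show the narrowing normal shape described above.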
Now to the second case: remember that we assumed we know the initial $\sigma$? What about the case where we do not know $\sigma$ either? Then we need to write down a guess for it as well (just like $\tilde{\mu}$ was our guess for $\mu$). The first version is
$$\tilde{\sigma}^2 = \frac{\sum_{i=1}^n (y_i - \tilde{\mu})^2}{n}$$
but it turns out that this is not good: it is a biased estimator that on average underestimates the true variance (https://en.wikipedia.org/wiki/Bias_of_an_estimator). One needs to multiply it by the factor
$$\frac{n}{n-1}$$
(which is the reciprocal of the factor you wrote down in the question, but the relationship is the same). This gives the corrected estimator
$$\hat{\sigma}^2 = \frac{n}{n-1}\,\tilde{\sigma}^2 = \frac{\sum_{i=1}^n (y_i - \tilde{\mu})^2}{n-1}$$
These factors do not coincide at all because they were designed for two very different things: the first one describes how good our guess of the mean is, while the second is a correction factor applied to our guess when we want to estimate the variance rather than the mean.
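To see the effect of the $\frac{n}{n-1}$ correction, one can simulate once more (a sketch under the same Python/NumPy assumptions; the deliberately small $n$ is an arbitrary choice that makes the bias clearly visible):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0    # true standard deviation, so the true variance is 4
n = 5          # deliberately small sample size so the bias is visible
K = 200_000    # number of simulated samples

samples = rng.normal(loc=0.0, scale=sigma, size=(K, n))
mu_tilde = samples.mean(axis=1, keepdims=True)

biased = ((samples - mu_tilde) ** 2).sum(axis=1) / n          # divide by n
unbiased = ((samples - mu_tilde) ** 2).sum(axis=1) / (n - 1)  # divide by n-1

print(biased.mean())    # ~3.2 = (n-1)/n * sigma^2, systematically too small
print(unbiased.mean())  # ~4.0 = sigma^2
```

The version that divides by $n$ comes out too small on average by exactly the factor $\frac{n-1}{n}$, which is why multiplying by $\frac{n}{n-1}$ repairs it.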