Suppose you have a population and some measurement that you could make on each member of the population (e.g. the population could be all the people in the world, and the measurement could be height). One can regard this measurement as a random variable $X$ on the population, with some mean $\mu$ and variance $\sigma^2$; $\mu$ is known, while $\sigma^2$ may or may not be known.
Now suppose you have a subset of the population, a sample of size $N$, and you wish to know whether these people are significantly different from the overall population with respect to this measurement. You can measure them and compute the mean $\bar{x}$ and variance $s^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$, where the $x_i$ are the individual measurements of the people in your sample. One way to determine the significance of your measurements is the following:
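For concreteness, here is a minimal NumPy sketch of these sample statistics (the height values are made up for illustration); note the $\frac{1}{N}$ divisor, matching the definition above:

```python
import numpy as np

# Hypothetical sample of N = 5 height measurements (values made up)
x = np.array([172.0, 165.5, 180.2, 158.9, 169.4])
N = len(x)

x_bar = x.mean()                      # sample mean \bar{x}
s2 = np.sum((x - x_bar) ** 2) / N     # variance with the 1/N divisor used above
# np.var(x) with its default ddof=0 computes the same quantity
print(x_bar, s2)
```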
Let $X_i \sim_{\mathrm{iid}} X$ for $i = 1, 2, \dots, N$ and let $Y = \frac{1}{N}\sum_{i=1}^{N}X_i$. Estimate a distribution for $Y$. Based on this estimate, compute the probability $P(|Y - E[Y]| > |\bar{x} - E[Y]|)$, and if this probability is smaller than some predetermined threshold (the significance level), reject the null hypothesis (which in this case roughly captures the claim that your sample is not different from the overall population).
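A small sketch of this procedure, assuming for the moment that $Y$ is approximated as normal with mean $\mu$ and variance $\frac{\sigma^2}{N}$ (the function name and the example numbers are hypothetical):

```python
import numpy as np
from scipy import stats

def two_sided_p(x_bar, mu, sigma2, N):
    """P(|Y - E[Y]| > |x_bar - E[Y]|) under the approximation
    Y ~ Normal(mu, sigma2 / N), where E[Y] = mu."""
    se = np.sqrt(sigma2 / N)
    z = abs(x_bar - mu) / se
    return 2 * stats.norm.sf(z)   # sf(z) = 1 - cdf(z); doubled for two tails

# Example with made-up numbers: reject the null at level alpha = 0.05
# exactly when p < alpha.
p = two_sided_p(x_bar=169.2, mu=170.0, sigma2=81.0, N=5)
print(p)
```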
My questions are about the estimated distribution for $Y$. The Central Limit Theorem says that if $N$ is large, then we may treat $Y$ as approximately normally distributed. But if $N$ is small, we're supposed to use Student's t-distribution. [Disclaimer: I'm sure it's more complicated than that, but this is what I'm supposed to teach my students, so I need to know why this might be a reasonable thing to teach them.] So my first (multi-part) question is: What is the conventional cutoff between small $N$ and large $N$, why is that cutoff conventionally accepted, and why wouldn't we just always use Student's t-distribution, even for large $N$?
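This isn't an answer, but a quick SciPy comparison makes the question concrete: the two-sided critical values of $t_{N-1}$ approach the normal ones as $N$ grows, so the choice only matters numerically for small $N$.

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)              # ~1.96, independent of N
for N in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(1 - alpha / 2, df=N - 1)   # t critical value, N-1 dof
    print(f"N={N:5d}  t_crit={t_crit:.4f}  z_crit={z_crit:.4f}")
```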
Once we know what kind of distribution to use, we still need to know its parameters. It's not hard to see that $Y$ has mean $\mu$ and variance $\frac{\sigma^2}{N}$. Now if $\sigma^2$ isn't known, we estimate it by $\hat{s}^2 = \frac{N}{N-1}s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$. So my next (multi-part) question is: Why precisely the factor $\frac{N}{N-1}$, and is there ever a case where we would use $\hat{s}^2$ even if $\sigma^2$ were known?
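As a sanity check rather than an answer, here is a Monte Carlo sketch (assuming a normal population, with made-up parameter values) comparing the two estimators; $s^2$ comes out low on average by roughly the factor $\frac{N-1}{N}$, which is what the correction compensates for:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, N, trials = 0.0, 4.0, 5, 200_000

# trials independent samples of size N from a Normal(mu, sigma2) population
samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))

s2 = samples.var(axis=1, ddof=0)       # the 1/N estimator s^2
s2_hat = samples.var(axis=1, ddof=1)   # the 1/(N-1) estimator \hat{s}^2

print(s2.mean())      # close to (N-1)/N * sigma2 = 3.2 (biased low)
print(s2_hat.mean())  # close to sigma2 = 4.0 (unbiased)
```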