For example, since approximately 68% of the probability falls within ±1 standard deviation of the mean for a normal distribution, could you just calculate the standard deviation of the data, and if about 68% of those data points fell within that range of the mean, would you then be x% (68%?) confident that the data are normally distributed? If you did this for all such intervals and got results that aligned relatively closely with the actual percentages, would this be an accurate test for normality? What thresholds around these percentages would still yield an acceptable normal distribution or allow you to assume normality with a high confidence?
-
What do you mean by "*accurate*" here? Are you referring to achieving a type I error rate close to the chosen rate or are you referring to having good power relative to other tests of goodness of fit of the normal distribution? – Glen_b Dec 26 '19 at 06:16
-
@Glen_b-ReinstateMonica yes, precisely. – spencer741 Dec 26 '19 at 06:23
-
That was intended as a binary choice. Could you clarify a little please? – Glen_b Dec 26 '19 at 06:33
-
i.e. Did you intend one, or the other or both, or something else? – Glen_b Dec 26 '19 at 06:48
-
@Glen_b-ReinstateMonica sorry, I misread your comment. I meant achieving a type I error rate close to the chosen rate. – spencer741 Dec 26 '19 at 16:10
-
The empirical rule applies *exactly* to many non-normal distributions and to a good approximation to a huge number of distributions, both theoretical and empirical: therein lies its value. Thinking of it as merely describing normal distributions misses the point. – whuber Dec 26 '19 at 19:17
1 Answer
Before we get into it, it's important to keep in mind that normality is an assumption about a population, not the data itself. Data are not themselves "normal"; they might conceivably have been drawn from a population that is normally distributed.
For example, since approximately 68% of the probability falls within ±1 standard deviation of the mean for a normal distribution, could you just calculate the standard deviation of the data, and if about 68% of those data points fell within that range of the mean, would you then be x% (68%?) confident that the data are normally distributed?
Failing to reject a null doesn't make you confident that the null is true. For example, note that many distributions might have the same or very close to the same proportion of observations within 1 s.d. of the mean.
It's also important to understand that the behaviour in the extreme tail of the distribution (even outside 3 standard deviations) can have a large impact on the properties of things you might want to do with samples from such a population.
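To see both points at once, you can compare the proportion within 1 standard deviation and the proportion beyond 3 standard deviations across a few symmetric distributions. Here is a minimal sketch in Python (the choice of distributions is illustrative, and scipy is assumed to be available):

```python
from scipy import stats

# Compare P(within 1 sd of the mean) and P(beyond 3 sd of the mean) for a
# few symmetric distributions; the central proportion varies little, while
# the tail proportion varies by a factor of several.
dists = {
    "normal":    stats.norm(),
    "t (10 df)": stats.t(df=10),
    "logistic":  stats.logistic(),
    "laplace":   stats.laplace(),
}

for name, d in dists.items():
    m, s = d.mean(), d.std()
    within1 = d.cdf(m + s) - d.cdf(m - s)
    beyond3 = d.cdf(m - 3 * s) + d.sf(m + 3 * s)
    print(f"{name:9s}  within 1 sd: {within1:.3f}   beyond 3 sd: {beyond3:.5f}")
```

All four are close in the central proportion (roughly 0.68 to 0.76), but the mass beyond 3 standard deviations differs by about a factor of five between the normal and the Laplace.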
If you did this for all intervals and got results that aligned relatively close to the actual percentages, would this be an accurate test for normality?
There are a couple of things to note:
- it's best to avoid looking at the same information (the same observations) repeatedly (e.g. the proportion within 2 standard deviations also includes the proportion within 1 standard deviation; if you're already looking at the 1-standard-deviation proportion, it's better to exclude it from the second count).
- consequently, using the "empirical rule" plus the symmetry of the normal would tend to suggest looking at proportions within the disjoint intervals between consecutive integer multiples of one standard deviation on either side of the mean.
Taking the "more accurate" version of the empirical rule given at Wikipedia (68.27%, 95.45% and 99.73% within 1, 2 and 3 standard deviations of the mean respectively), and using the symmetry of the normal distribution, this tells us that there's a proportion of (1-0.9973)/2 = 0.00135 to the left of 3 standard deviations below the mean (and to the right of 3 standard deviations above the mean, (0.9973 - 0.9545)/2 = 0.0214 between 2 and 3 standard deviations below the mean (and similarly that far above it), (0.9545 - 0.6827)/2 = 0.1359 between 1 and 2 standard deviations below the mean (and similarly above it) and 0.6827/2 = 0.34135 between 1 standard deviation below the mean and the mean itself (and similarly above it).
(See the image at the above link)
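As a quick check, these band proportions can be computed directly from the standard normal CDF; a minimal sketch in Python (assuming scipy is available):

```python
from scipy.stats import norm

# One-sided band proportions implied by the empirical rule,
# computed from the standard normal CDF.
print(norm.cdf(-3))                 # below -3 sd:        ~0.00135
print(norm.cdf(-2) - norm.cdf(-3))  # between -3 and -2:  ~0.0214
print(norm.cdf(-1) - norm.cdf(-2))  # between -2 and -1:  ~0.1359
print(norm.cdf(0) - norm.cdf(-1))   # between -1 and 0:   ~0.3413
```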
- The most commonly used test for that situation (proportions in different intervals) would be a chi-squared goodness of fit test, which looks at the proportions in some set of bins. For example, yours could use the bins "more than 1 standard deviation below the mean", "within 1 standard deviation of the mean" and "more than 1 standard deviation above the mean"; you might include a number of additional intervals as described above, or you might split the distribution at other values using normal tables rather than keeping to the empirical rule itself. The test then combines all this information into a single measure of how far off the observed proportions are from what you'd expect if the population were truly normal (a sketch follows this list).
- when you fit parameters based on the original data (e.g. estimating the population mean and standard deviation in order to place the bins), you have to watch out for the impact that has on the distribution of the test statistic: it's no longer chi-squared. You need to account for this when doing the test, but it's complicated enough that I won't make an already lengthy answer considerably longer by dealing with it here. There is some discussion of it under a couple of other answers already on site.
- With "accurate" defined as attaining the correct type I error rate, you can construct tests that come fairly close to the correct type I error rate, but you wouldn't do it by placing bins at integer multiples of the standard deviation as above; for several reasons you want to have bins with close to equal proportions; even then there can be distributions other than the one in your null (in this case, other than the normal) with lower rejection rate (i.e. that you're less likely to reject than you are for the normal itself); this is called bias in hypothesis testing.
- While you can get close to the desired type I error rate with a large sample size and a suitable choice of bins (and accounting for the parameter-estimation issue already mentioned), there's a larger problem: power. The power of a chi-squared goodness of fit test for this purpose (at least against the sorts of alternatives people are usually most interested in) is quite low; if you must do a goodness of fit test, there are better choices. See our goodness-of-fit posts.
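To make the last few points concrete, here is a minimal sketch (an illustration, not a prescription from this answer) of a chi-squared goodness-of-fit test for normality with equal-probability bins, where the reference distribution is obtained by simulation under the null precisely because the mean and standard deviation are estimated from the data. The sample, bin count and number of simulations are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def chisq_normality_stat(x, k=10):
    # Chi-squared GOF statistic for normality with k equal-probability bins.
    # Mean and sd are estimated from x itself, which is exactly why the plain
    # chi-squared reference distribution no longer applies.
    n = len(x)
    mu, sd = x.mean(), x.std(ddof=1)
    # k-1 interior bin edges at fitted-normal quantiles: expected count n/k per bin.
    edges = mu + sd * norm.ppf(np.arange(1, k) / k)
    observed = np.bincount(np.searchsorted(edges, x), minlength=k)
    expected = n / k
    return ((observed - expected) ** 2 / expected).sum()

rng = np.random.default_rng(1)
x = rng.normal(size=200)   # hypothetical data; replace with your own sample
stat = chisq_normality_stat(x)

# Calibrate by simulation under the null: draw normal samples of the same
# size, re-estimate and re-test each one, and see how often the simulated
# statistic is at least as large as the observed one.
sims = np.array([chisq_normality_stat(rng.normal(size=len(x)))
                 for _ in range(5000)])
print("statistic:", stat, "  simulated p-value:", (sims >= stat).mean())
```

The simulation works here because, with bins placed at fitted quantiles, the statistic doesn't depend on the true mean and standard deviation under the null, so standard normal samples suffice.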
What thresholds around these percentages would still yield an acceptable normal distribution or allow you to assume normality with a high confidence?
In relation to the first question (thresholds), you'd need to define what's acceptable for your purposes: what you're trying to do with the data, and what departures would cause problems for it. This particular approach is not especially informative in any case -- it's a very coarse instrument.
In relation to the second (assuming normality with high confidence), you can't be "confident" you have a normal distribution; you might be able to reasonably assume normality, but testing for it (even with a better test) is not usually a useful way to decide.
An important question here is why you want to test for normality. Often there are better options.
