If you sample one case many times:

$$ X \sim N(\mu, \sigma)$$

If you sample $n$ cases many times:

$$ \bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) $$

If you know that $\sigma$ is at least $1.5$, then to ensure that 99.7% of the time your $\bar{X}$ is off from $\mu$ by less than 0.1, you need to ensure that the standard deviation of your sample distribution is off only by 0.1 scaled down, so $3\frac{\sigma}{\sqrt{n}} = \frac{0.1}{\sigma}$ and $n = \left(\frac{3\sigma^2}{0.1}\right)^2$, or that your sample size is at least 4557. If you only need to be off by less than 1, then you should have about 46 samples.

It seems to me that for many cases the rule of 30 is much too small. Am I missing something here?

Firebug
  • Could you explain what you mean by the rule of 30? – Michael M Sep 12 '16 at 20:43
  • Are you referring to the fact that many use 30 as the minimum sample size when you can assume $\bar{X}$ is Normally distributed, regardless of the distribution of $X$? – Matt Brems Sep 12 '16 at 20:44
  • @MattBrems yes. – Molossus Spondee Sep 12 '16 at 20:44
  • The "rule of 30" is a rule of thumb about how large a sample has to be for the distribution of the sample estimates of the mean to be close to a normal distribution, not about how close the estimates are to the true parameter $\mu$. Also, see the answers to the linked question, where it's shown that the rule can be misleading and even fallacious. – Firebug Sep 12 '16 at 20:56
  • Could you explain exactly what you mean when you say "that the standard deviation of your sample distribution is off only by 0.1 scaled down" and how this leads to your conclusion that $n\geq 4557$? – Matt Brems Sep 12 '16 at 21:34
  • Please quote the rule you're trying to apply (and credit its source) to help us know exactly what you mean, and as far as possible make it clear how this rule relates to your question. – Glen_b Sep 12 '16 at 23:48

1 Answer

What you describe here reflects a relatively common confusion between the shape and the spread of a distribution.

Think about hypothesis tests and confidence intervals as you generally use them. Many of the most common tests and intervals rely on an assumption of Normality - that is, your data need to follow a Normal distribution in order for these tests to apply. (These might be a $z$-test or a confidence interval that uses $z^*$ as its critical value.)

Consider a quantitative variable $X$. The Central Limit Theorem states that, regardless of the distribution of $X$ (provided it has finite variance), if you repeatedly draw samples of size $n$ from that distribution, then as $n$ gets larger the distribution of $\bar{X}$ more closely resembles a Normal distribution. (This is called the sampling distribution of the mean.) If $X$ is Normally distributed, then $\bar{X}$ is Normally distributed for every $n$. If $X$ is not Normally distributed, the distribution of $\bar{X}$ only approaches a Normal distribution as $n$ gets larger.

Relying on the Central Limit Theorem, various references state that a minimum sample size of 30 (you may also see 20 or 25, but we'll assume 30 here) is necessary for the distribution of $\bar{X}$ to be close enough to a Normal distribution; this is what you refer to here as the "Rule of 30." If your sample size is at least 30, it's reasonable to assume that $\bar{X}$ follows a Normal distribution, and then you can construct hypothesis tests or confidence intervals that rely on the Normality of $\bar{X}$.
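To make the "shape" part concrete, here is a small simulation sketch. It assumes $X$ follows an Exponential(1) distribution (my choice of a skewed example, not something from the question) and uses hypothetical helper names `skewness` and `sample_means`. It tracks how the skewness of the sampling distribution of $\bar{X}$ shrinks toward 0, the value for a Normal distribution, as $n$ grows.

```python
import numpy as np

def skewness(values):
    """Sample skewness: the third standardized moment (0 for a symmetric shape)."""
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3

def sample_means(rng, n, repetitions=20_000):
    """Draw `repetitions` samples of size n from Exponential(1); return each sample's mean."""
    return rng.exponential(scale=1.0, size=(repetitions, n)).mean(axis=1)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Skewness of the simulated sampling distribution of the mean, per sample size.
skew_by_n = {n: skewness(sample_means(rng, n)) for n in (1, 5, 30, 200)}

for n, s in skew_by_n.items():
    # For Exponential(1), the mean of n draws has theoretical skewness 2 / sqrt(n).
    print(f"n = {n:>3}: skewness = {s:.2f} (theory: {2 / np.sqrt(n):.2f})")
```

For this particular distribution the skewness at $n = 30$ is still around $2/\sqrt{30} \approx 0.37$, so how "close enough" 30 is depends on how skewed $X$ was to begin with.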

The "Rule of 30" matters because it lets us use hypothesis tests or confidence intervals that rely on $\bar{X}$ having the shape of a Normal distribution. It is good to know that the standard error of $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$, but the necessary condition is that the distribution of $\bar{X}$ is Normal in shape; otherwise, inferences from your tests and intervals are invalid.

Now, for practical reasons, perhaps a confidence interval with small $n$ is not of much use to you. All else being equal, a larger $n$ is better than a smaller one. But the "Rule of 30" simply lets you know that a confidence interval of the form $\bar{x}\pm z^*\frac{\sigma}{\sqrt{n}}$ (or a hypothesis test requiring a Normal distribution) is valid if $n\geq30$.
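The "spread" question can be handled separately. The sketch below assumes $\bar{X}$ is already approximately Normal with standard error $\frac{\sigma}{\sqrt{n}}$ and that $\sigma = 1.5$ is known, then solves $z^*\frac{\sigma}{\sqrt{n}} \leq E$ for $n$, giving $n \geq \left(\frac{z^*\sigma}{E}\right)^2$. The function names and the symbol $E$ for the margin of error are illustrative choices of mine; $z^* = 3$ matches the 99.7% figure in the question.

```python
import math

def margin_of_error(sigma, n, z_star=3.0):
    """Half-width of the interval x_bar +/- z* * sigma / sqrt(n)."""
    return z_star * sigma / math.sqrt(n)

def required_sample_size(sigma, margin, z_star=3.0):
    """Smallest integer n with z* * sigma / sqrt(n) <= margin."""
    return math.ceil((z_star * sigma / margin) ** 2)

sigma = 1.5  # assumed known population standard deviation

# How wide the interval is at exactly n = 30.
print(f"margin at n = 30: {margin_of_error(sigma, n=30):.3f}")

# How large n must be for two target margins.
for margin in (1.0, 0.1):
    print(f"margin {margin}: n >= {required_sample_size(sigma, margin)}")
```

Under these assumptions a margin of 1 needs 21 observations and a margin of 0.1 needs roughly 2,025. These differ from the figures in the question because the question sets $3\frac{\sigma}{\sqrt{n}}$ equal to $\frac{0.1}{\sigma}$ rather than to $0.1$. Either way, the $n$ needed for a given precision is a different quantity from the $n$ the "Rule of 30" is about.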

Matt Brems
  • You seem to agree implicitly that this "Rule of 30" is valid. But do you really? The OP suggests it is not universally applicable and appears to be asking for information about that. – whuber Sep 12 '16 at 21:26
  • My interpretation, unless I'm missing something (quite possible!), is that the OP is concerned about the spread of the distribution rather than the shape of the distribution. – Matt Brems Sep 12 '16 at 21:35