
I took a sample of 30 units from a population. The sample distribution turned out to be normal. Can I state that the population distribution is normal too? If so, with what level of confidence?

gung - Reinstate Monica
Nikolai Felix
  • Thanks for such a quick response! The sample I refer to was 30 readings of force. Since the data are continuous, I assumed they could be modeled with a normally distributed random variable. Additionally, I used Minitab to run the statistics summary report, and it presented, among other things, the Anderson-Darling normality test results, with a p-value of 0.812. – Nikolai Felix Aug 21 '14 at 18:47
  • One more thing: what I'm actually trying to do is test the hypothesis that the two population means aren't too different, using a paired t-test. But, if I'm not mistaken, I first have to make sure the populations are normally distributed, and this is why I thought I might infer that from the sample. Sorry for any obvious errors. – Nikolai Felix Aug 21 '14 at 18:49
  • While you need normality for the t-test statistic to have a t-distribution under the null, it's not especially sensitive to that. Further, a hypothesis test isn't an especially good way to assess normality, because it answers the wrong question ("can you detect non-normality?" rather than "is it big enough to matter?" - it tends to reject in large samples, exactly when it matters least). You may benefit from searching on the various keywords in your question. – Glen_b Aug 22 '14 at 00:00
  • After reading your answers I think I definitely messed up all these concepts. Thanks for the clarification! – Nikolai Felix Aug 23 '14 at 12:31
  • The fault is unlikely to be yours. The approach you wanted to take is often suggested to beginning students in a variety of subject areas. There's a widespread culture of general advice and rules of thumb which is at best not especially useful, and which seems to have been turned into unnecessarily proscriptive recipes. – Glen_b Aug 23 '14 at 22:22

1 Answer


We will need to clarify some ideas here. Your sample, being finite, cannot possibly be normal, which is infinite. Also, this quote seems relevant.


$30$ is a fairly small sample, & the Anderson-Darling test is not the most powerful test of normality to start with. You may believe that the population is normal as a result, but it certainly isn't proven. For more information on the underlying topics here, it may help you to read these:


Regarding the issue of verifying your assumptions for a $t$-test, what needs to be normal for a paired $t$-test are the differences, not the original data. Your data are probably close enough, and the $t$-test is pretty robust anyway. However, the check-then-test strategy has been criticized. If you are concerned that the test may not be appropriate, it is generally better to simply use a test that doesn't rely on that assumption, in this case the Wilcoxon signed-rank test.
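To make the point about differences concrete, here is a minimal sketch using only the Python standard library. The force readings are made up for illustration, and the function names `paired_t_statistic` and `wilcoxon_signed_rank_exact` are hypothetical helpers, not part of any package. The first function shows that the paired $t$ statistic is built entirely from the within-pair differences; the second computes an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign patterns, which is only practical for small samples.

```python
import math
import statistics
from itertools import product


def paired_t_statistic(before, after):
    """Return (t, df) for a paired t-test, computed from the differences only."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


def wilcoxon_signed_rank_exact(before, after):
    """Return (W_plus, exact two-sided p-value) for the signed-rank test.

    Zero differences are dropped and tied absolute differences get average
    ranks. The p-value enumerates all 2**n sign patterns, so keep n small.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W_plus under the null
    observed_dev = abs(w_plus - expected)

    # Under the null, each rank is equally likely to carry a + or - sign.
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            as_extreme += 1
    return w_plus, as_extreme / 2 ** n


# Hypothetical paired force readings (illustrative values, not real data).
before = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
after = [12.4, 11.9, 12.9, 12.6, 11.7, 12.8, 12.9, 12.5]

t_stat, df = paired_t_statistic(before, after)
w_plus, p_exact = wilcoxon_signed_rank_exact(before, after)
print(f"paired t = {t_stat:.3f} on {df} df")
print(f"Wilcoxon W+ = {w_plus}, exact two-sided p = {p_exact:.4f}")
```

In practice one would normally reach for a statistics package rather than hand-rolled code, for example `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` in SciPy; the sketch is only meant to show that both tests operate on the differences.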

gung - Reinstate Monica
  • I found your initial sentence confusing. Any sample from a normal distribution "will be finite". The issue is that if the *possible* range of the observed variable is finite (like test scores for example) then it can't be exactly normal. In this case (force), it presumably will be bounded above and below, so is arguably not exactly normal for that reason. – Glen_b Aug 21 '14 at 23:57
  • @Glen_b, if the *population* is bounded, then it cannot be truly normal (although it may be close enough), but a *sample* can never be truly normal, b/c samples are always bounded. The only meaningful questions to ask about samples are 'is this close enough for my purposes?', & 'is it reasonable to imagine the population from which this was drawn is normal?', but the sample is simply not ever normal. There is a (common) confusion here between the properties of a sample & inferences about the underlying population. – gung - Reinstate Monica Aug 22 '14 at 00:15
  • Samples only have discrete distributions (they only take $n$ different values after all); they nevertheless might be effectively samples from continuous distributions (leaving rounding of their values to a finite number of figures aside). When people say "sample A has distribution F" one generally presumes they intend "sample A comes from a population with distribution F" -- because without that presumption the assertion would make no sense. The boundedness of the *sample* doesn't enter into that. ...(ctd) – Glen_b Aug 22 '14 at 00:27
  • (ctd)... When using boundedness to assert non-normality, it's the boundedness of the *possible* values of the variable, not the observed sample from it, that's the primary issue. A sample of size two only contains two values, but that doesn't mean those two values are the limit of the possible values for the variable being observed. – Glen_b Aug 22 '14 at 00:30
  • @Glen_b, '[w]hen people say "sample A has distribution F" one generally presumes they intend "sample A comes from a population with distribution F"' is perfectly reasonable. But consider that the question asks '[t]he sample distribution resulted to be normal... [c]an I state that the population distribution is normal too?'. That isn't consistent w/ your presumption. It is clear that there is some conceptual confusion here. Also, be aware that the top 2 portions of my answer were adapted from comments & I also subsequently edited the Q. – gung - Reinstate Monica Aug 22 '14 at 00:35
  • I appreciate fully that the OP has conceptual confusion that needs to be addressed. But I'm concerned about (at least by implication) seeming to confirm the assertion that for a sample from some distribution, there's a meaningful "distribution of the sample" that is seemingly neither the ECDF (which is only discrete) nor the underlying distribution of the population from which the sample was drawn (which may be unbounded, at least conceptually). If we're not talking about either of those, what are we discussing? – Glen_b Aug 22 '14 at 00:39
  • @Glen_b, I don't understand your comment here. I did not mean to assert that the distribution of the sample differed from the ECDF. – gung - Reinstate Monica Aug 22 '14 at 00:46
  • Hmm. In that case I definitely find some of your earlier phrasing puzzling, but I think it's best if I leave it and come back another time and see if maybe I'm making more of it than is really here; it may well be I'm taking something in a way other than you intend. Thanks for all your kind responses. – Glen_b Aug 22 '14 at 01:02
  • Maybe what confuses me more is that the normal density function involves two population parameters, the mean and the standard deviation, that are treated as known. How could those be known if the population is infinite? – Nikolai Felix Aug 24 '14 at 05:36
  • In practice we never *know* those parameter values; we use our estimates. When mathematical statisticians are working on the underlying theory, they use the symbols $\mu$ & $\sigma^2$ to represent the unknown constants, or just assume that the parameter values are known for the sake of what they're working on. Either way, no sample is ever normal; the test (e.g., AD) assesses the probability of getting a sample as far or further from normality as yours if the sample were drawn from a true normal. The more important question is: 'is it close enough?', & the answer is probably 'yes'. – gung - Reinstate Monica Aug 24 '14 at 12:39