The other day I wanted to demonstrate that a given data set is normally distributed, and a chi-squared goodness-of-fit test seemed appropriate. I took as my null hypothesis that the data set was normally distributed, calculated a chi-squared statistic, and from that a p-value of about 0.5. This is well above any sane significance level, so I fail to reject the null hypothesis. Job done, right?
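For concreteness, here is roughly how the test was set up, as a sketch in Python. The data below are simulated stand-ins (my real potato measurements aren't included), and the choice of ten equal-count bins is just one reasonable option, not necessarily the best one:

```python
import numpy as np
from scipy import stats

# Simulated stand-in for the potato sizes (the real measurements aren't shown here).
rng = np.random.default_rng(0)
sizes = rng.normal(loc=150.0, scale=7.0, size=200)

# Estimate the normal parameters from the sample, bin the observations,
# and compare observed bin counts with the counts that normal would predict.
mu, sigma = sizes.mean(), sizes.std(ddof=1)
edges = np.quantile(sizes, np.linspace(0, 1, 11))   # 10 roughly equal-count bins
observed, _ = np.histogram(sizes, bins=edges)
expected = len(sizes) * np.diff(stats.norm.cdf(edges, loc=mu, scale=sigma))
expected *= observed.sum() / expected.sum()          # make the totals match exactly

chi2 = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1 - 2   # lose 2 extra degrees of freedom for the estimated mean and sd
p_value = stats.chi2.sf(chi2, dof)
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.2f}")
```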
But I want to look a bit more closely at that p-value of 0.5. I'm told this means that, if the population underlying my data set were indeed normally distributed, 0.5 would be the probability of observing data that depart from normality at least as much as mine do. But what if I had calculated a p-value of, say, 0.2? That's still a long way above any sensible significance level, but it's also far from 0.5. Would the case for the normality of the data be a bit weaker if the p-value were only 0.2? What about if it were 0.9?
The context for the above question was this: I'm trying to work out how much the sizes of potatoes harvested from a single field will vary. So I did the following:
- I gathered the data for all the potatoes harvested from a specific field.
- I carried out a chi-squared test to examine the normality of the data ($p \approx 0.5$).
- I calculated a coefficient of variation for the data ($\approx 4.5\%$).
- I made a hypothesis, to be tested against data from other fields, that 95% of the potatoes in a given field will fall in the size range $[0.91\mu, 1.09\mu]$, where $\mu$ is the mean size for that field (this comes from $\mu \pm 1.96\sigma$ with $\sigma \approx 0.045\mu$, since $1.96 \times 0.045 \approx 0.09$; the arithmetic is sketched below this list).
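Here is that last step spelled out, again on the simulated stand-in data from the first snippet (with my real data, $\text{CV} \approx 4.5\%$ gives $1.96 \times 0.045 \approx 0.09$, hence the $\pm 9\%$ range):

```python
import numpy as np

# Same simulated stand-in data as in the first snippet.
rng = np.random.default_rng(0)
sizes = rng.normal(loc=150.0, scale=7.0, size=200)

mu = sizes.mean()
sigma = sizes.std(ddof=1)
cv = sigma / mu                                  # coefficient of variation

# Under normality, ~95% of sizes should lie within mu +/- 1.96*sigma,
# i.e. within mu * (1 +/- 1.96*cv) in relative terms.
lower, upper = mu * (1 - 1.96 * cv), mu * (1 + 1.96 * cv)
inside = np.mean((sizes >= lower) & (sizes <= upper))

print(f"CV = {cv:.1%}")
print(f"95% range = [{1 - 1.96 * cv:.2f} mu, {1 + 1.96 * cv:.2f} mu]")
print(f"fraction of this sample inside the range: {inside:.1%}")
```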
Have I committed any grave sins against statistics in the above reasoning?