
I used RStudio to perform the Shapiro-Wilk test for normality. On the following data set, which I call $D$, the test gives p-value = 0.7698.

34.681 26.291 33.280 36.169 41.471 31.528 25.502 43.211 35.330 30.447

If I take the log transform of $D$ (i.e. the log of each element of $D$) and run the Shapiro-Wilk test on the transformed data set $\log(D)$, I get p-value = 0.7707.

In particular, both $D$ and $\log(D)$ appear to follow a normal distribution, since the Shapiro-Wilk test did not reject either of them.

$Q$: How should I resolve this contradiction? Note that $D$ and $\log(D)$ cannot both be normal.

user45765
  • There's no contradiction in these results: they merely show that the logarithm does little to change the distribution of a set of data with a small CV. – whuber Jan 27 '21 at 14:26
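
The comment about a small CV can be checked numerically. The following is a minimal sketch in Python (an assumption of convenience; the question used RStudio), using only the standard library. It computes the coefficient of variation of $D$ and the correlation between $D$ and $\log(D)$. The helper name `pearson` is introduced here for illustration only.

```python
import math
import statistics

# Data set D, copied from the question.
D = [34.681, 26.291, 33.280, 36.169, 41.471,
     31.528, 25.502, 43.211, 35.330, 30.447]
log_D = [math.log(x) for x in D]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient (illustrative helper)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Coefficient of variation: sample standard deviation divided by the mean.
cv = statistics.stdev(D) / statistics.fmean(D)

# How close log(D) is to a straight-line function of D.
r = pearson(D, log_D)

print(f"CV = {cv:.3f}")
print(f"correlation between D and log(D) = {r:.4f}")
```

The CV comes out at about 0.17, and the correlation is very close to 1. Over such a narrow range the logarithm is nearly a linear function, and the Shapiro-Wilk statistic does not change under linear transformations of the data, so near-identical p-values for $D$ and $\log(D)$ are what one would expect.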

1 Answer


This is an excellent example of a non-rejection not implying that the null hypothesis is true, and I will use it the next time someone asserts that it does.

You resolve the contradiction by noting that you only have ten observations, so the test has limited power to detect non-normality. Neither the data nor the log of the data look distinctly non-normal, but this does not mean that they are normal.

In each case, the test is shrugging its shoulders: “I can’t tell, folks; I can’t tell.”
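
To get a feel for how little power a normality test has at $n = 10$, here is a rough simulation sketch in Python, standard library only. It does not implement Shapiro-Wilk itself. It swaps in a closely related probability-plot correlation statistic (in the style of Shapiro-Francia) with Monte Carlo critical values, so the numbers are only indicative. The seed, the simulation counts, the log-normal alternative with `sigma = 0.17` (chosen to roughly match the spread of the data), and the helper names `ppcc` and `p_value` are all assumptions made for illustration.

```python
import math
import random
import statistics

N = 10
ALPHA = 0.05
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Approximate expected normal order statistics (Blom's plotting positions).
_nd = statistics.NormalDist()
SCORES = [_nd.inv_cdf((i - 0.375) / (N + 0.25)) for i in range(1, N + 1)]


def ppcc(sample):
    """Squared correlation between the sorted sample and the normal scores.

    Values near 1 mean the normal Q-Q plot is close to a straight line.
    """
    xs = sorted(sample)
    mx = statistics.fmean(xs)
    sxy = sum((x - mx) * s for x, s in zip(xs, SCORES))  # scores are centred
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum(s * s for s in SCORES)
    return sxy * sxy / (sxx * sss)


# Null distribution of the statistic for normal samples of size N.
null_stats = sorted(ppcc([rng.gauss(0, 1) for _ in range(N)])
                    for _ in range(5000))
critical = null_stats[int(ALPHA * len(null_stats))]  # reject below this value


def p_value(sample):
    """Monte Carlo p-value: share of null statistics at or below the observed one."""
    w = ppcc(sample)
    return sum(s <= w for s in null_stats) / len(null_stats)


# Data set D, copied from the question.
D = [34.681, 26.291, 33.280, 36.169, 41.471,
     31.528, 25.502, 43.211, 35.330, 30.447]
p_raw = p_value(D)
p_log = p_value([math.log(x) for x in D])

# Power against a log-normal alternative (mu = 3.5, sigma = 0.17 are assumed).
trials = 2000
rejections = sum(
    ppcc([rng.lognormvariate(3.5, 0.17) for _ in range(N)]) < critical
    for _ in range(trials)
)
power = rejections / trials

print(f"p-value for D: {p_raw:.2f}, for log(D): {p_log:.2f}")
print(f"estimated power at n={N}: {power:.3f}")
```

Under these assumptions, the rejection rate for truly log-normal samples of size 10 should come out only slightly above the 5% level, so a non-rejection says very little about which of the two distributions generated the data.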

Dave
  • So is there any way out of this dilemma? – user45765 Jan 27 '21 at 14:09
  • There is no dilemma. We should back up: what you are doing is similar to procedures that are known to be invalid: that is what we should be concerned about. What statistical problem do you hope your testing will solve? – whuber Jan 27 '21 at 14:27
  • @whuber I mean, should I use a different test for the hypothesis testing? If so, which one? That is basically my question. I only know the Shapiro-Wilk test for normality. According to wiki, it has the best overall performance. – user45765 Jan 27 '21 at 15:22
  • @whuber Essentially, I want to test whether the data follow a normal distribution, which will tell me which estimators I should use. – user45765 Jan 27 '21 at 15:36
  • That's usually not a good idea: instead, you should choose your estimators based on what you want to estimate and on your model of the data. – whuber Jan 27 '21 at 18:20