
Suppose I am testing whether there is a difference between the means of two normally distributed variables X and Y, where X ~ N(u_1, 1) and Y ~ N(u_2, 1). Is it theoretically correct to compute the p-value as P(X - Y > delta_xy) if the observed delta_xy is positive, and P(X - Y < delta_xy) if the observed delta_xy is negative? Here delta_xy is the difference between the sample means of X and Y. In other words, can I do a one-tailed test based on the direction that has already been observed?

Note that if I compute the p-value this way, the "worst" p-value I can get is 0.5, instead of 1. How can this be explained?
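A quick simulation makes both points concrete (a sketch assuming single draws X and Y with the stated unit variances, so X - Y ~ N(0, 2) under the null; the function names are mine):

```python
import math
import random

def post_hoc_p(delta):
    """One-tailed p-value taken in the direction of the observed difference."""
    z = delta / math.sqrt(2.0)                    # X - Y has variance 1 + 1 = 2
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z > z) for Z ~ N(0, 1)
    return upper if delta > 0 else 1.0 - upper

random.seed(1)
ps = []
for _ in range(100_000):
    x = random.gauss(0.0, 1.0)   # null is true: both means are 0
    y = random.gauss(0.0, 1.0)
    ps.append(post_hoc_p(x - y))

print(max(ps))                               # never exceeds 0.5
print(sum(p < 0.05 for p in ps) / len(ps))   # ~0.10, double the nominal 0.05
```

The cap at 0.5 appears because the tail probability is always taken on the side the statistic already fell on, so it can never cover more than half the distribution; under the null this p-value is uniform on (0, 0.5), which is also why the false-positive rate at the 0.05 cutoff comes out near 0.10 rather than 0.05.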

Following up on whuber's comment about "post hoc testing", I give an example, represented in the boxplot: once we plot the data and see a difference in the means, most likely we will test mean(OJ) > mean(VC) instead of mean(OJ) != mean(VC). Is this an abuse in the sense of "post hoc testing"?
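The OJ/VC situation can be sketched the same way (made-up samples with known unit variances, not the actual data behind the boxplot): choosing the tail after looking at the plot is equivalent to halving the two-sided p-value.

```python
import math
import random

def z_stat(xs, ys):
    """Two-sample z-statistic, assuming each observation has variance 1."""
    nx, ny = len(xs), len(ys)
    delta = sum(xs) / nx - sum(ys) / ny
    return delta / math.sqrt(1.0 / nx + 1.0 / ny)

def sf(z):
    """P(Z > z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

random.seed(0)
oj = [random.gauss(0.3, 1.0) for _ in range(30)]   # hypothetical samples
vc = [random.gauss(0.0, 1.0) for _ in range(30)]

z = z_stat(oj, vc)
p_two = 2 * sf(abs(z))                         # direction fixed in advance
p_post_hoc = sf(z) if z > 0 else 1 - sf(z)     # direction chosen after plotting

print(math.isclose(p_post_hoc, p_two / 2))     # True: the post-hoc one-sided
                                               # p is half the two-sided one
```

So reporting the data-driven one-sided p-value silently doubles the evidence against the null compared with the honest two-sided test, which is exactly the post-hoc abuse the comments describe.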

[boxplot comparing the OJ and VC groups]

yliueagle
  • This isn't a good question for our site, because (1) the answer simply is "no" and (2) you can find many explanations in existing threads by means of a site search. Sometimes the search terms aren't self-evident though, so I have chosen a duplicate found by searching "post hoc hypothesis." – whuber Jun 03 '19 at 20:03
  • Thank you whuber. I added another example in the description and will be grateful if you can clarify. – yliueagle Jun 03 '19 at 20:33
  • 1
    That's definitely *post hoc*; the direction in the hypothesis came *after you looked at the data*. Also see https://en.wikipedia.org/wiki/Testing_hypotheses_suggested_by_the_data ... Your explicit hypotheses should be in place before you even collect the data. – Glen_b Jun 04 '19 at 05:16

0 Answers