
If I test two hypotheses, one whose null is *in fact* false and one whose null is *in fact* true, I want to know the probability that the first test obtains a smaller p-value than the second, given the other parameters: delta, sigma, and sample size. I am not interested in whether one, the other, both, or neither is above some threshold; I am interested in the probability that one p-value is smaller than the other.

I can simulate the situation in R and get a reasonable estimate that way, but I want to know how to compute the answer exactly.

Using R:

    a <- numeric(1000)  # p-values from the test whose null is false
    b <- numeric(1000)  # p-values from the test whose null is true
    for (i in 1:1000) {
      a[i] <- t.test(rnorm(108, mean = 7, sd = 20))$p.value
      b[i] <- t.test(rnorm(108))$p.value
    }
    mean(a < b)  # proportion of runs in which the false null has the smaller p-value
    # [1] 0.978

So I find that approximately 98% of the time, the test whose null is false yields the lower p-value. How can I derive this result? I want to prove it mathematically.
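To judge how precise a simulated estimate like this is, here is a sketch of the same simulation that also reports the binomial Monte Carlo standard error. It assumes the question's setup (n = 108; true mean 7 and sd 20 for the false null; standard normal data for the true null); the seed and the number of replications are arbitrary choices, not part of the question.

```r
# Monte Carlo estimate of P(a < b):
# a = p-value of a one-sample t-test whose null (mean 0) is false (true mean 7, sd 20)
# b = p-value of a one-sample t-test whose null is true (mean 0, sd 1)
set.seed(1)   # arbitrary seed, only for reproducibility
n_sim <- 5000 # number of simulated pairs of tests (assumption)
n <- 108      # sample size per test, as in the question

a <- replicate(n_sim, t.test(rnorm(n, mean = 7, sd = 20))$p.value)
b <- replicate(n_sim, t.test(rnorm(n))$p.value)

est <- mean(a < b)                   # estimated P(a < b)
se  <- sqrt(est * (1 - est) / n_sim) # binomial Monte Carlo standard error
c(estimate = est, std_error = se)
```

With a probability this close to 1, a thousand replications leave an uncertainty of several tenths of a percentage point, which is worth keeping in mind when comparing a simulation with an exact value.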

Thanks for your help,

Nick

@Nick Sabbe What I think you are saying is that if the p-value from the test with the false null equals a, then the probability that the p-value from the test with the true null is less than a is a. I get that, but what if I don't know the p-value from the false-null test? If I have two p-values and I know that exactly one of them comes from a test whose null is *in fact* false, what is the probability that it is the lower of the two? Assume all I know are the population parameters: delta, sigma, and N.

nick

2 Answers


The p-values for the true null hypothesis (Hb) should be uniformly distributed (see amongst others q10613). If your two tests are independent (which they seem to be from your example), then, given that the p-value of Ha (the false null) is a, the probability that the p-value of Hb is smaller than a is simply a.
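The uniformity claim is easy to check by simulation. A sketch, assuming the true-null test from the question (one-sample t-test, n = 108, standard normal data); the fixed value 0.3 stands in for a hypothetical p-value of the false-null test, and the seed is arbitrary.

```r
# Under a true null with a continuous test statistic, p-values are Uniform(0, 1),
# so for any fixed value a, P(b < a) = a.
set.seed(2)                                        # arbitrary seed
b <- replicate(5000, t.test(rnorm(108))$p.value)   # true null: population mean is 0

a_fixed <- 0.3           # hypothetical fixed p-value of the false-null test
mean(b < a_fixed)        # empirical P(b < a_fixed); should be close to 0.3
ks.test(b, "punif")      # Kolmogorov-Smirnov check against Uniform(0, 1)
```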

So, if you know the distribution of a, you may be able to integrate this out to find an analytical solution. But this depends upon your alternative, and upon which test you are using (for the false null hypothesis).

Extending the comment by @Henry, and writing $f_A$ for the density of $a$ under the alternative:

$P(a<b) = \int_0^1 P(b>a \mid a)\, f_A(a)\, da = \int_0^1 (1-a)\, f_A(a)\, da = E(1-a) = 1 - E(a)$
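For the example in the question, $E(1-a)$ can be computed exactly. Since $a$ lies in $[0, 1]$, $E(1-a) = \int_0^1 P(a < x)\,dx$, and for a two-sided one-sample t-test $P(a < x) = P(|T| > c_x)$, where $c_x$ is the upper $x/2$ quantile of a central t with $N-1$ degrees of freedom and $T$ follows a noncentral t with noncentrality $\delta\sqrt{N}/\sigma$. A sketch, assuming the question's parameters ($\delta = 7$, $\sigma = 20$, $N = 108$); the helper name `cdf_a` is hypothetical. It also checks the result against a simulated mean of $1 - a$, which shows that the true-null p-value never needs to be simulated.

```r
# Exact P(a < b) = E(1 - a) for a two-sided one-sample t-test whose null is false.
n     <- 108
delta <- 7
sigma <- 20
nu    <- n - 1                   # degrees of freedom
ncp   <- delta * sqrt(n) / sigma # noncentrality of the t statistic under the alternative

# Hypothetical helper: CDF of the false-null p-value, P(a < x) = P(|T| > crit)
cdf_a <- function(x) {
  crit <- qt(x / 2, nu, lower.tail = FALSE)  # two-sided critical value for level x
  pt(crit, nu, ncp, lower.tail = FALSE) + pt(-crit, nu, ncp)
}
exact <- integrate(cdf_a, 0, 1)$value        # = integral of P(a < x) over (0, 1)

# Simulation check of E(1 - a): only the false-null p-values are needed
set.seed(3)                                  # arbitrary seed
a <- replicate(5000, t.test(rnorm(n, mean = delta, sd = sigma))$p.value)
c(exact = exact, simulated = mean(1 - a))
```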

Nick Sabbe

Found the answer. Seems to work well except when power is low.

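$$ P(X\le Y) = \int_0^1 \Phi\!\left(\frac{\delta\sqrt{N}}{\sigma} - \Phi^{-1}(x)\right)dx $$

Here $X$ is the p-value from the test whose null is false, $Y$ is the p-value from the test whose null is true, $\Phi$ is the standard normal CDF (`pnorm` in R), and $\Phi^{-1}$ is its quantile function (`qnorm`).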
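The formula can be evaluated directly and also has a closed form. A sketch, taking $\delta = 7$, $\sigma = 20$ and $N = 108$ from the question's example: if $x$ is Uniform(0, 1) then $W = \Phi^{-1}(x)$ is standard normal, so the integral equals $P(Z + W < \delta\sqrt{N}/\sigma)$ for independent standard normals $Z$ and $W$, i.e. $\Phi\!\left(\delta\sqrt{N}/(\sigma\sqrt{2})\right)$. This is exactly what a one-sided z-test with known $\sigma$ gives, not the two-sided t-test simulated in the question, which is one reason the formula is only approximate there.

```r
# Evaluate the integral from the answer and compare it with its closed form.
delta  <- 7
sigma  <- 20
N      <- 108
lambda <- delta * sqrt(N) / sigma   # standardized effect, delta * sqrt(N) / sigma

integrand      <- function(x) pnorm(lambda - qnorm(x))
by_integration <- integrate(integrand, 0, 1)$value

# qnorm(x) ~ N(0, 1) when x ~ U(0, 1), so the integral is P(Z + W < lambda)
# with Z + W ~ N(0, 2):
closed_form <- pnorm(lambda / sqrt(2))
c(by_integration = by_integration, closed_form = closed_form)
```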

Thanks for your help guys, couldn't have gotten there without you.

nick (answer edited by whuber)
  • There are two problems with this: first, the notation is so vague as to be meaningless (what is the variable of integration?). Second, no conceivable interpretation of this is correct for the example given in the problem, which requires integration of a product of a central and a noncentral t distribution. – whuber May 19 '11 at 18:56
  • @whuber: As I describe in the question, delta, sigma, and N are known, so the variable of integration is x. I didn't know how to write this in notation, but it should be integrated from 0 to 1. As for whether it is correct, try it yourself in R: T… – nick May 19 '11 at 19:44
  • Sorry, those 499s should be 107s: they are the degrees of freedom, not that it really matters with a sample size this large. Also, delta is 7 in this example; my mistake. – nick May 19 '11 at 19:52
  • I put in the limits of integration and the variable of integration. Note that this is an approximate expression, which is why it does not exactly agree with the R code you supplied. The function you gave in your comment is not correct: you need to use a noncentral t for the alternative distribution. – whuber May 19 '11 at 20:49
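The exact calculation described in these comments (integrating the central t density of the true-null statistic against a noncentral t tail probability) can be written out directly. A sketch, assuming the question's example ($\delta = 7$, $\sigma = 20$, $N = 108$, two-sided tests); the helper name `tail_T` is hypothetical. For two-sided tests with the same degrees of freedom, $a < b$ exactly when $|T| > |T_0|$, where $T$ is the noncentral statistic of the false-null test and $T_0$ the central statistic of the true-null test. The result is compared with the normal approximation from the answer.

```r
# Exact P(a < b) as an integral of a central t density times a noncentral t tail.
n     <- 108
delta <- 7
sigma <- 20
nu    <- n - 1                   # degrees of freedom of both tests
ncp   <- delta * sqrt(n) / sigma # noncentrality under the false null

# Hypothetical helper: P(|T| > |t|) for T ~ noncentral t(nu, ncp)
tail_T <- function(t) {
  pt(abs(t), nu, ncp, lower.tail = FALSE) + pt(-abs(t), nu, ncp)
}

# T0 ~ central t(nu) is symmetric, so integrate over (0, Inf) and double
exact <- 2 * integrate(function(t) dt(t, nu) * tail_T(t), 0, Inf)$value

# Normal approximation from the answer (one-sided z-test, known sigma)
normal_approx <- integrate(function(x) pnorm(ncp - qnorm(x)), 0, 1)$value
c(exact_t = exact, normal_approx = normal_approx)
```

The gap between the two numbers comes mostly from the one-sided versus two-sided distinction and, to a lesser extent, from using normal rather than t distributions.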