I am running a simulation in R to illustrate that, with repeated testing and no fixed sample size, you are guaranteed to eventually reject the null hypothesis even when it is true. The procedure is:
- Start with two draws from a $N(0,1)$ distribution
- Run a t-test on the data set with the null hypothesis being $\mu = 0$
- If the p-value is less than 0.05, return the number of samples in the data
- Otherwise, add another draw from $N(0,1)$ to the data set and repeat.
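
In other words, with $X_1, X_2, \ldots \overset{\text{iid}}{\sim} N(0,1)$ and $p_n$ denoting the two-sided one-sample t-test p-value computed from the first $n$ draws, the quantity I record is the stopping time

$$N = \min\{\, n \ge 2 : p_n < 0.05 \,\}.$$
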
Here's the code.
    generate <- function() {
      data <- rnorm(n = 1, mean = 0, sd = 1)                  # first draw from N(0, 1)
      p <- 1
      while (p > 0.05) {
        data <- append(data, rnorm(n = 1, mean = 0, sd = 1))  # add another draw
        p <- t.test(data)$p.value                             # test H0: mu = 0 on the current data
      }
      return(length(data))                                    # number of samples at first rejection
    }
Sometimes I get a small number (between 10 and 1,000), sometimes a larger one (~50,000), and other times the function seems to loop forever.
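
To get an empirical look at the distribution without the risk of an infinite loop, I can run a capped variant of the function; the sketch below is just for illustration, and the name `generate_capped`, the cutoff `maxn`, and the number of replications are arbitrary choices of mine.

    # Capped variant: maxn is an arbitrary cutoff so the loop cannot run forever.
    generate_capped <- function(maxn = 1e5) {
      data <- rnorm(2)                      # start with two draws from N(0, 1)
      p <- t.test(data)$p.value             # test H0: mu = 0
      while (p > 0.05 && length(data) < maxn) {
        data <- c(data, rnorm(1))           # add one more draw
        p <- t.test(data)$p.value           # re-test on the enlarged data set
      }
      length(data)                          # stopping sample size (maxn if never rejected)
    }

    # Empirical distribution of the stopping time (slow for large maxn or many replications)
    stops <- replicate(200, generate_capped())
    hist(stops, breaks = 50)
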
Is there an analytic way to understand this distribution of the stopping sample size?