This inspiring answer describes a variant of hypothesis testing, and I want to analyze its properties further. Basically, it considers a two-sided test and interprets the $p$-value as a measure of how strong the evidence is that our estimate has the correct direction (i.e., a positive or negative effect). It also indicates that the $p$-value has something to do with the sample's signal-to-noise ratio, which is where this question comes from.
Its methodology is summarized below (I attempt a code sketch of it right after the list):
- Form a point null hypothesis with a two-tailed alternative, e.g. $H_0: \mu = 42$ vs. $H_1: \mu \ne 42$.
- Collect some data and merge it into the sample at hand, if any.
- Calculate the $p$-value.
- See if the $p$-value is less than a threshold, say 0.05.
- If it is, reject $H_0$ and conclude the direction indicated by the sample;
- Otherwise,
- Go back to the data-collection step above if you are willing to continue investigating;
- Otherwise, stop and declare that you can't conclude the direction without collecting more data.
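To make sure I've understood it, here is a minimal sketch of one run of the procedure as I read it, assuming a one-sample $t$-test on normal data; the batch size, threshold, cap, and the name `sequential_direction_test` are all my own choices, not the answer's:

```python
import numpy as np
from scipy import stats

def sequential_direction_test(rng, true_mu, mu0=42.0, sigma=1.0,
                              batch=20, alpha=0.05, max_batches=50):
    """One run of the procedure: keep appending batches of data and
    re-running the two-sided test of H0: mu = mu0 until p < alpha,
    or give up after max_batches batches.

    Returns +1 or -1 for the concluded sign of mu - mu0, or None if
    we stopped without a conclusion.
    """
    sample = np.empty(0)
    for _ in range(max_batches):
        # "Collect some data and merge it into the sample at hand."
        sample = np.concatenate([sample, rng.normal(true_mu, sigma, batch)])
        # Two-sided one-sample t-test against mu0.
        p = stats.ttest_1samp(sample, mu0).pvalue
        if p < alpha:
            # Reject H0; conclude the direction indicated by the sample.
            return 1 if sample.mean() > mu0 else -1
    # Not willing to continue: no directional conclusion.
    return None
```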
From a Neyman-Pearson perspective, this is horrible, as the type I error rate would be 100%: by the law of the iterated logarithm, the running test statistic exceeds any fixed threshold infinitely often under $H_0$, so sampling long enough guarantees a rejection. However, this is arguably fine, because a point null hypothesis on a continuous RV is almost surely false, so the error rate conditional on it is meaningless. Instead, this approach looks Fisherian, because the steps are decided a posteriori and the procedure is iterative.
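A quick simulation seems consistent with this (reusing `sequential_direction_test` from the sketch above; the run counts are again my own choices). Note that the rejection rate only reaches 100% in the limit, so the numbers below merely trend upward as more "looks" are allowed:

```python
import numpy as np

rng = np.random.default_rng(0)
runs = 300
# true_mu equals mu0 exactly, so every directional conclusion is a
# type I error; watch the rate climb as more looks are allowed.
for max_batches in (1, 5, 25, 125):
    rejections = sum(
        sequential_direction_test(rng, 42.0, max_batches=max_batches) is not None
        for _ in range(runs)
    )
    print(f"max_batches={max_batches:4d}: rejected {rejections}/{runs}")
```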
Now, I want to learn more about its properties. Its author points out that, by following it, we avoid drawing a conclusion from a sample whose signal-to-noise ratio is too low. Intuitively that makes sense, but I'm not really familiar with the concept of "signal-to-noise ratio", so I'd appreciate an explanation of it in this context (I sketch my own tentative understanding after the list). For example,
- What's its definition? (Wikipedia says there are alternative definitions.)
- How is it related to the $p$-value?
- How is it related to the sample size?
- What does a high S/N imply?
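For what it's worth, my tentative reading (the decomposition below is mine, not the answer's) is that in a one-sample $t$-test the statistic factors into a sample-size term and a per-observation signal-to-noise term:

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \sqrt{n}\cdot\frac{\bar{x} - \mu_0}{s}, $$

so a small $p$-value can come from a large per-observation S/N $(\bar{x} - \mu_0)/s$, a large $n$, or both. I'd appreciate confirmation that this matches the intended meaning.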
As a bonus, I'm curious about the probability of concluding the wrong direction using this method, but feel free to skip this one if I'm asking too many questions.
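In case it helps, here is my own rough attempt at the bonus question via simulation, again reusing `sequential_direction_test` from the first sketch; the small positive effect sizes are arbitrary values I picked:

```python
import numpy as np

rng = np.random.default_rng(1)
runs = 1000
for true_mu in (42.05, 42.1, 42.2):  # true direction is positive throughout
    results = [sequential_direction_test(rng, true_mu) for _ in range(runs)]
    wrong = sum(r == -1 for r in results)        # concluded "negative"
    undecided = sum(r is None for r in results)  # gave up without a call
    print(f"true_mu={true_mu}: wrong direction {wrong}/{runs}, "
          f"undecided {undecided}/{runs}")
```

I'd still like an analytical treatment, if one exists.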