Consider an NHST, for example a one-sample t-test for $H_0:\mu\leq0$. The test statistic is $T(x) = \frac{\bar{x} \sqrt{n}}{s(x)}$, and $T(X)$ has a $t(n-1)$ distribution at the boundary $\mu = 0$. Now I make an observation $x^*$, for which I'd like to have a small p-value. The p-value using test statistic $T$ would be $P(x) = 1 - F(T(x))$, where $F$ is the CDF of $t(n-1)$.

Now suppose I find $P(x^*)$ too large, e.g., it is larger than a threshold provided by the journal I'd like to publish in (typically $0.05$). So I change my test statistic to $T'$, where $T'(x) = 100000$ if $x=x^*$ and $T'(x) = T(x)$ otherwise. The p-value of $x^*$ with respect to $T'$ will be virtually zero, while all other properties of $T$ are maintained or approximately maintained. In particular, $T(X)$ and $T'(X)$ will have the same CDF.

What kind of requirement on the test statistic forbids such nonsense? I'm looking for some rigorous mathematical property, not just "don't do it, since it's unreasonable."

Addendum: To be more concrete, let's look at a binomial test instead. Say we have an urn with red and white balls, $p$ is the fraction of red balls in the urn, and $H_0 : p \leq 0.5$. We draw $n$ balls and count the number of red ones in the sample; call this number $X$. The typical test statistic would be $T(x)=x$, so that $T(X)$ has a binomial distribution.

Assume there are zero red balls in our particular sample, which under normal circumstances is all in favor of $H_0$. Then we use the test statistic $T'(x) = 101$ if $x=0$ and $T'(x)=x$ otherwise. This has a slightly modified binomial distribution: we merely shift the probability mass of $0.5^n$ from $0$ to $101$.

The p-value for a right-tailed test such as ours with respect to a test statistic $S$ of an observation $x$ is the probability of the test statistic having a value of $S(x)$ or larger. Denote by $P$ the p-value for $T$ and by $P'$ the p-value for $T'$. For $x \geq 1$ we have $P'(x) = P(x) + 0.5^n$, which is only a minute change if $n$ is large. However, $P(0) = 1$, whereas $P'(0) = 0.5^n$.
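
The identities above can be checked numerically. The following is a minimal sketch using only the Python standard library. The function names (`binom_pmf`, `p_value`, `T_prime`) and the choice $n = 20$ are illustrative assumptions, the computation is done only at the boundary $p = 0.5$, and it presumes $n < 101$ so that the value $101$ lies outside the usual range of $T$.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_value(stat, x_obs, n, p):
    """Right-tailed tail probability P_p(stat(X) >= stat(x_obs)), by enumeration."""
    t_obs = stat(x_obs)
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if stat(k) >= t_obs)

def T(x):
    """Usual statistic: the number of red balls."""
    return x

def T_prime(x):
    """Modified statistic: the observation x = 0 is relabelled as 101."""
    return 101 if x == 0 else x

n, p0 = 20, 0.5  # illustrative sample size; boundary of H0
for x in (0, 1, 5, 10, 15):
    P = p_value(T, x, n, p0)
    P_mod = p_value(T_prime, x, n, p0)
    print(f"x={x:2d}  P={P:.6f}  P'={P_mod:.6f}  P'-P={P_mod - P:+.2e}")
```

For every $x \geq 1$ the printed difference should agree with $0.5^n$ up to floating-point error, while for $x = 0$ the two values are $1$ and $0.5^n$.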

Correction: We have to consider the distribution of $T(X)$ for all parameters in $H_0 : p \leq p_0$, not just the case $p=p_0$. If we look at $p$ close to $0$, shifting all mass from $0$ to $101$ is a big change. This clarifies it for the binomial case, but the normal case is still open.
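
The effect of the correction can be sketched the same way, by evaluating the tail probability over a grid of null parameters $p \leq 0.5$ rather than only at $p = 0.5$. This is a minimal illustration, not a definitive treatment: the grid, $n = 20$, and the helper names (`tail_prob`, `sup_tail`) are assumptions, and taking the supremum over the composite null as the p-value is one common convention that the question itself does not spell out.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def tail_prob(stat, x_obs, n, p):
    """P_p(stat(X) >= stat(x_obs)), computed by enumeration."""
    t_obs = stat(x_obs)
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if stat(k) >= t_obs)

def sup_tail(stat, x_obs, n, grid):
    """Largest tail probability over the grid of null parameters."""
    return max(tail_prob(stat, x_obs, n, p) for p in grid)

def T(x):
    return x

def T_prime(x):
    return 101 if x == 0 else x

n = 20                                 # illustrative sample size (n < 101)
grid = [i / 100 for i in range(51)]    # p = 0.00, 0.01, ..., 0.50

for x in (0, 15):
    print(f"x={x:2d}  sup P={sup_tail(T, x, n, grid):.6f}  "
          f"sup P'={sup_tail(T_prime, x, n, grid):.6f}")
```

On this grid the supremum for $T'$ is attained at $p = 0$, where $X = 0$ with probability one, so the tail probability of $T'$ comes out as $1$ for every observation. Under this convention the modification never yields a small p-value.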

  • (1) Your modified test is not a test at all, because it is ill-defined: it depends on what you mean by "too large." (2) Regardless, even if you fix up that problem, your calculation of the p-value for your modified test is incorrect, because whatever statistic you wind up with will not have a $t$ distribution. I think you're right to call this "nonsense." – whuber Oct 31 '17 at 20:17
  • Regarding (2): why does it have to have a $t$ distribution? I added an example for a binomial test. Yes, $T'(X)$ does not have a binomial distribution, but why can't we use it as a test statistic? – Lasse Kliemann Oct 31 '17 at 20:52
  • The modified binomial test is more interesting. It's unclear where the $0.5$ in "probability mass of $0.5^n$" comes from--that number appears to have no relation to $H_0$--nor is your change "minute" or only "slightly modified" in case $p$ is close to zero, which is included in $H_0$. Regardless, the modified test is *inadmissible* with respect to any reasonable loss function (including the 0-1 loss used by NHSTs): it's provably worse than the usual test in some cases and never better. If you're unfamiliar with admissibility, then perhaps this is the concept you're after? – whuber Oct 31 '17 at 21:20
  • The binomial example also cites an interesting and oft-made misinterpretation of p-values as "the probability of the test statistic having a value of $S(x)$ or larger." This is correct for many of the better known tests but is not generally true. See https://stats.stackexchange.com/a/130772/919 for my discussion of this point (from the Neyman-Pearson perspective). – whuber Oct 31 '17 at 21:23
  • Returning to this question after a long time, I realized that $T(X)$ and $T'(X)$ in fact have the same distribution, since I only modified $T$ at one point. For this case, the only reason so far for not doing this remains that the procedure is just self-delusion. – Lasse Kliemann Jan 02 '18 at 20:39
