
I am trying to confirm that the pitch recorded by a microphone can be considered accurate. I have a sample of pitch values recorded by the microphone from a constant input. The tone played was about 196 Hz, but it was intentionally set slightly lower, by about 0.25 Hz. I set up a one-sample t-test with a theoretical mean of 196 Hz, but the resulting p-value is extremely low.

How can I introduce a tolerance into the t-test that allows for some difference between the means while still treating them as similar? What is an appropriate way to set this tolerance?

D. Cohen
  • If you intentionally lowered the tone, shouldn't your theoretical mean be 195.75 Hz? More importantly, I don't think a p-value is a good measure of similarity. If the test is not significant, you still have not demonstrated that it is similar. Perhaps a confidence interval would be more informative. If it is not too late to run a different experiment, you could raise and lower the pitch and use regression analysis to show whether the pitch reported by the microphone changes accordingly. Right now you have only demonstrated that it reports about 196 Hz when you play 196 Hz. – Frans Rodenburg Feb 08 '18 at 05:04

1 Answer


You might be interested in two one-sided tests for equivalence (TOST; Schuirmann, 1987), which let you pose a null hypothesis of a difference of at least some tolerance (let's say, $\Delta$) at a preferred Type I error rate $\alpha$, and reject this null in favor of evidence of equivalence (i.e., similarity).

If the typical one-sample t test for difference has $H^{+}_{0}: \mu - 196 = 0$ (the superscript '+' means 'positivist null hypothesis') with the alternative $H^{+}_{A}: \mu - 196 \ne 0$, then the corresponding 'negativist null hypothesis' is $H^{-}_{0}: |\mu - \mu_{0}| \ge \Delta$, where $\mu_{0} = 196$ and $\Delta$ is the smallest difference between the mean $\mu$ of your sample's parent distribution and 196 Hz that you would find meaningful. ($\Delta$ is your choice as the researcher.) The alternative to $H^{-}_{0}$ is $H^{-}_{A}: |\mu - \mu_{0}| < \Delta$. Because of the absolute value bars, this general negativist null hypothesis can be split into two specific nulls:

$H^{-}_{01}: \mu- \mu_{0} \ge \Delta$; with $H^{-}_{A1}: \mu- \mu_{0} < \Delta$

$H^{-}_{02}: \mu- \mu_{0} \le -\Delta$; with $H^{-}_{A2}: \mu- \mu_{0} > -\Delta$

If you reject $H^{-}_{01}$, you conclude that $\mu - 196$ is less than $\Delta$. If you reject $H^{-}_{02}$, you conclude that $\mu - 196$ is greater than $-\Delta$. If you reject both $H^{-}_{01}$ and $H^{-}_{02}$, you conclude that $\mu - 196$ lies in the interval between $-\Delta$ and $\Delta$; that is, the difference is not big enough for you to care about.

The test statistic for $H^{+}_{0}$ is the familiar $t = (\bar{x} - 196)/s_{\bar{x}}$. You can construct two corresponding t test statistics for $H^{-}_{01}$ and $H^{-}_{02}$ thus:

$t_{1} = \frac{\left[\Delta - \left(\bar{x} - 196\right)\right]}{s_{\bar{x}}}$

$t_{2} = \frac{\left[\left(\bar{x} - 196\right) + \Delta \right]}{s_{\bar{x}}}$
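To make the arithmetic concrete, here is a purely hypothetical example: suppose $\bar{x} = 195.75$ (matching the deliberate 0.25 Hz offset described in the question), $s_{\bar{x}} = 0.05$, and you choose $\Delta = 0.5$ Hz. These numbers are assumptions for illustration only, not results from the question's data.

$t_{1} = \frac{0.5 - \left(195.75 - 196\right)}{0.05} = \frac{0.75}{0.05} = 15$

$t_{2} = \frac{\left(195.75 - 196\right) + 0.5}{0.05} = \frac{0.25}{0.05} = 5$

Both statistics are large and positive, so both upper-tail p-values will be small. With a tighter $\Delta = 0.2$ Hz, however, $t_{2} = (-0.25 + 0.2)/0.05 = -1$, whose upper-tail probability exceeds 0.5, so $H^{-}_{02}$ would not be rejected.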

I like to construct (and teach) the t test statistics this way so that the direction of the tail when looking up p-values is unambiguous: both of these t test statistics use upper-tail probabilities with $\nu = n-1$ degrees of freedom:

$p_{1} = P\left(T_{\nu} \ge t_{1}\right)$

$p_{2} = P\left(T_{\nu} \ge t_{2}\right)$

Each of $H^{-}_{01}$ and $H^{-}_{02}$ is tested at the $\alpha$ (not $\alpha/2$) level. You reject $H^{-}_{0}$ only if you reject both $H^{-}_{01}$ and $H^{-}_{02}$, in which case you conclude, at the $\alpha$ level of significance, that the difference lies in the interval between $-\Delta$ and $\Delta$ for your chosen relevance/equivalence threshold $\Delta$.
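The whole procedure (the two statistics, their upper-tail p-values, and the decision at level $\alpha$) fits in a short function. The sketch below is a minimal, standard-library-only version under stated assumptions: `t_sf` and `tost_one_sample` are hypothetical helper names, `t_sf` approximates the Student t upper tail by numerical integration (if SciPy is available, `scipy.stats.t.sf(t, df)` gives the same probability), and the readings are invented numbers, not data from the question.

```python
import math
import statistics


def t_sf(t: float, df: int, steps: int = 2000) -> float:
    """Upper-tail probability P(T_df >= t) for Student's t (numerical sketch).

    Substituting x = sqrt(df) * tan(theta) turns the t density into
    k * cos(theta)**(df - 1), a smooth integrand that Simpson's rule
    handles well for any t. `steps` must be even.
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    # Simpson's rule: endpoints weight 1, odd interior points 4, even interior points 2.
    total = 1.0 + math.cos(upper) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = k * total * h / 3  # P(0 <= T <= |t|)
    p = 0.5 - area if t >= 0 else 0.5 + area  # symmetry: P(T >= 0) = 0.5
    return min(1.0, max(0.0, p))  # guard against tiny rounding excursions


def tost_one_sample(sample, mu0: float, delta: float, alpha: float = 0.05):
    """Two one-sided tests of H0-: |mu - mu0| >= delta (hypothetical helper).

    Returns (p1, p2, equivalent); `equivalent` is True only if both
    one-sided nulls are rejected, each at level alpha (not alpha / 2).
    """
    n = len(sample)
    df = n - 1
    diff = statistics.mean(sample) - mu0
    se = statistics.stdev(sample) / math.sqrt(n)  # s_xbar
    t1 = (delta - diff) / se  # tests H01: mu - mu0 >= delta
    t2 = (diff + delta) / se  # tests H02: mu - mu0 <= -delta
    p1 = t_sf(t1, df)
    p2 = t_sf(t2, df)
    return p1, p2, (p1 <= alpha and p2 <= alpha)


# Invented readings (Hz) from a tone set about 0.25 Hz below 196 Hz.
readings = [195.70, 195.80, 195.75, 195.72, 195.78]
p1, p2, equivalent = tost_one_sample(readings, mu0=196, delta=0.5)
print(p1, p2, equivalent)
```

With these made-up readings, a tolerance of 0.5 Hz leads to a conclusion of equivalence, whereas a tolerance of 0.1 Hz does not, because the observed offset of about 0.25 Hz is larger than 0.1 Hz.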

One final thing: looking for evidence in only a single direction invites confirmation bias. So the hella dope thing to do is to look both for evidence of difference, with the classical one-sample t test, and for evidence of equivalence, with TOST, and to combine the results in your conclusions. This is called relevance testing, and it gives four possible results:

  • Reject $H^{+}_{0}$ and fail to reject $H^{-}_{0}$: conclude relevant difference.

  • Fail to reject $H^{+}_{0}$ and reject $H^{-}_{0}$: conclude equivalence.

  • Reject $H^{+}_{0}$ and reject $H^{-}_{0}$: conclude trivial difference (yes, there's a difference, but a priori you said you don't care about differences that small... in other words your test for difference is over-powered for your preferred effect size of $\Delta$, which may speak to the concern you mentioned about the very small p-value in the test for difference).

  • Fail to reject $H^{+}_{0}$ and fail to reject $H^{-}_{0}$: conclude indeterminate (your tests are too underpowered to draw a conclusion one way or the other).
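The four cases above reduce to two yes/no decisions, so they can be encoded as a small lookup. This is a minimal sketch assuming you already have the two-sided p-value of the ordinary one-sample t test and the TOST p-value for $H^{-}_{0}$, taken here as the larger of the two one-sided p-values, $\max(p_{1}, p_{2})$; `relevance_conclusion` is a hypothetical name.

```python
def relevance_conclusion(p_difference: float, p_equivalence: float,
                         alpha: float = 0.05) -> str:
    """Combine a test for difference with a TOST test for equivalence.

    p_difference:  two-sided p-value of the ordinary one-sample t test (H0+).
    p_equivalence: TOST p-value for H0-, i.e. max(p1, p2).
    """
    reject_positivist = p_difference <= alpha   # evidence of a difference
    reject_negativist = p_equivalence <= alpha  # evidence of equivalence
    if reject_positivist and not reject_negativist:
        return "relevant difference"
    if not reject_positivist and reject_negativist:
        return "equivalence"
    if reject_positivist and reject_negativist:
        return "trivial difference"
    return "indeterminate"


# A tiny p-value for difference plus a small TOST p-value: a trivial difference.
print(relevance_conclusion(1e-6, 1e-4))
```

The situation in the question (a very small p-value for difference) would land in the "trivial difference" case if TOST also rejects for your chosen $\Delta$, and in "relevant difference" if it does not.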

References

Schuirmann, D. A. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15(6):657–680.

Alexis