Questions tagged [tost]

TOST, an acronym for Two One-Sided Tests, is a straightforward way of constructing a test of the "negativist" null hypothesis that two population parameters differ by at least a small researcher-selected equivalence threshold. Rejecting this null hypothesis provides evidence that the parameters are equivalent within that threshold.

Hypothesis tests are most commonly framed in terms of null hypotheses of no difference (e.g. two parameters are equal, so that their difference is zero or their ratio is one). In such a case an extreme enough test statistic (i.e. one which is improbable if the null hypothesis is true) provides evidence that the null hypothesis is false, and one concludes that one has evidence that the parameters are different.

By contrast, one may wish to frame a null hypothesis of a difference at least as large as a given level. Here an extreme enough test statistic provides evidence to reject this null hypothesis, and one concludes that one has evidence the parameters are equivalent within the given tolerance. With respect to t and z type tests, the general form of the negativist null hypothesis is $\text{H}^{-}_{0}\text{: }|\theta| \ge \Delta$, which decomposes into two one-sided null hypotheses: $\text{H}^{-}_{01}\text{: }\theta \ge \Delta$ or $\text{H}^{-}_{02}\text{: }\theta \le -\Delta$. If one rejects $\text{H}^{-}_{01}$, one concludes that $\theta$ is less than $\Delta$, and if one rejects $\text{H}^{-}_{02}$, one concludes that $\theta$ is greater than $-\Delta$. If one rejects both of these one-sided null hypotheses, one concludes that $-\Delta < \theta < \Delta$.

The two t test statistics corresponding to these specific null hypotheses are (the corresponding z test statistics would naturally use $\sigma_{\theta}$):

  1. $t_{1} = \frac{\Delta - \theta}{s_{\theta}}$

  2. $t_{2} = \frac{\theta + \Delta}{s_{\theta}}$

The rejection regions for both statistics are in the right tail, and both null hypotheses must be rejected in order to conclude equivalence. The overall probability of a Type I error is held at $\alpha$ by conducting each test at the $\alpha$ level, rather than the $\alpha/2$ level, because the regions described by the two null hypotheses do not overlap: at most one of them can be true.
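The procedure above can be sketched in a few lines of Python. This is a minimal illustration of the z-test variant (known $\sigma_{\theta}$), chosen only so that the example needs nothing beyond the standard library; the function name `tost_z` and the input numbers are hypothetical and not taken from any particular package.

```python
from statistics import NormalDist


def tost_z(theta_hat, sigma_theta, delta, alpha=0.05):
    """Two one-sided z tests for equivalence within (-delta, delta).

    theta_hat   : estimated difference (hypothetical input)
    sigma_theta : standard error of the difference, assumed known
    delta       : equivalence threshold, in the units of theta_hat
    """
    z1 = (delta - theta_hat) / sigma_theta  # tests H01: theta >= delta
    z2 = (theta_hat + delta) / sigma_theta  # tests H02: theta <= -delta

    std_normal = NormalDist()
    p1 = 1.0 - std_normal.cdf(z1)  # right-tail p-value for H01
    p2 = 1.0 - std_normal.cdf(z2)  # right-tail p-value for H02

    # Equivalence is concluded only if BOTH one-sided nulls are rejected,
    # each at level alpha (not alpha / 2).
    equivalent = max(p1, p2) < alpha
    return z1, z2, p1, p2, equivalent


# Illustrative numbers only: estimate 0.3, standard error 0.4, threshold 1.0.
z1, z2, p1, p2, equivalent = tost_z(theta_hat=0.3, sigma_theta=0.4, delta=1.0)
print(f"z1={z1:.2f}, z2={z2:.2f}, p1={p1:.4f}, p2={p2:.4f}, equivalent={equivalent}")
```

For the t-test version one would replace the normal distribution with a t distribution on the appropriate degrees of freedom and use $s_{\theta}$ in place of $\sigma_{\theta}$.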

The equivalence threshold $\Delta$ is expressed in the same units as the measures being tested. However, it may be desirable to express equivalence in units of the test statistic itself, which can be done using $\varepsilon$, where $\varepsilon=\Delta/s_{\theta}$. The meaning of the equivalence threshold when using $\varepsilon$ is then "how far past the rejection boundary for the test for difference $\text{H}^{+}_{0}$ a test statistic needs to be to be considered relevant." In this case, $\text{H}^{-}_{0}\text{: }|T| \ge \varepsilon$, so that $\text{H}^{-}_{01}\text{: }T \ge \varepsilon$ and $\text{H}^{-}_{02}\text{: }T \le -\varepsilon$, with test statistics $t_{1} = \varepsilon - t$ and $t_{2} = t + \varepsilon$, where $t=\theta/s_{\theta}$. Note that if $\varepsilon \le t_{1-\alpha}$, then it is not possible to reject $\text{H}^{-}_{0}$: because $t_{1} + t_{2} = 2\varepsilon$, at least one of $t_{1}$ or $t_{2}$ will be less than or equal to $\varepsilon$, and hence no greater than the critical value $t_{1-\alpha}$.
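The remark about small values of $\varepsilon$ can be checked numerically. The sketch below assumes a standard normal reference distribution in place of the t distribution (so the critical value is about 1.645 at $\alpha = 0.05$), and the helper name `tost_epsilon` is hypothetical. It scans a grid of test statistics and confirms that no value rejects both one-sided nulls when $\varepsilon$ equals the critical value, whereas a larger $\varepsilon$ leaves room for rejection.

```python
from statistics import NormalDist


def tost_epsilon(t, epsilon, alpha=0.05):
    """Return True if both one-sided nulls are rejected for statistic t.

    Uses a normal critical value as a stand-in for t_{1-alpha} (assumption).
    """
    crit = NormalDist().inv_cdf(1.0 - alpha)
    t1 = epsilon - t  # tests H01: T >= epsilon
    t2 = t + epsilon  # tests H02: T <= -epsilon
    return t1 > crit and t2 > crit


crit = NormalDist().inv_cdf(0.95)
grid = [i / 100 for i in range(-500, 501)]  # t from -5.00 to 5.00

# With epsilon equal to the critical value, no t on the grid can reject both.
any_reject_small = any(tost_epsilon(t, epsilon=crit) for t in grid)

# With a larger epsilon (3.0 here, purely illustrative), some t values can.
any_reject_large = any(tost_epsilon(t, epsilon=3.0) for t in grid)

print(any_reject_small, any_reject_large)
```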

36 questions
54 votes, 4 answers

Why do statisticians say a non-significant result means "you can't reject the null" as opposed to accepting the null hypothesis?

Traditional statistical tests, like the two sample t-test, focus on trying to eliminate the hypothesis that there is no difference between a function of two independent samples. Then, we choose a confidence level and say that if the difference of…
41 votes, 8 answers

How to test hypothesis of no group differences?

Imagine you have a study with two groups (e.g., males and females) looking at a numeric dependent variable (e.g., intelligence test scores) and you have the hypothesis that there are no group differences. Question: What is a good way to test…
Jeromy Anglim
13 votes, 2 answers

Is there a simple equivalence test version of the Kolmogorov–Smirnov test?

Has two one-sided tests for equivalence (TOST) been framed for the Kolmogorov–Smirnov test to test the negativist null hypothesis that two distributions differ by at least some researcher-specified level? If not TOST, then some other form of…
Alexis
12 votes, 1 answer

Null hypothesis of equivalence

Suppose $X_1, X_2, \, ... \, , X_n$ are a simple random sample from a Normal$(\mu,\sigma^2)$ distribution. I'm interested in doing the following hypothesis test: $$ H_0: | \mu| \le c \\ H_1: |\mu| > c, $$ for a given constant $c > 0$. I was thinking…
Vic101
11 votes, 2 answers

Can we accept the null in noninferiority tests?

In a usual t-test of means, using the usual hypothesis testing methods, we either reject the null or fail to reject the null but we never accept the null. One reason for this is that if we got more evidence, the same effect size would become…
Peter Flom
9 votes, 1 answer

Intuitive explanation of differences between TOST and UMP tests for equivalence

Hypothesis tests for equivalence differ from the more common hypothesis tests for difference. In tests for difference, the null hypothesis is some form of "separate quantities are the same", and extreme enough evidence prompts rejection in favor of…
Alexis
9 votes, 3 answers

Equivalence tests for non-normal data?

I have some data that I can't necessarily assume to be drawn from normal distributions, and I would like to conduct tests of equivalence between groups. For normal data, there are techniques like TOST (two one-sided t-tests). Is there anything…
9 votes, 2 answers

Equivalence testing - tost method - why CI of 90%?

In testing for equivalence via the two one-sided test approach with confidence intervals, a (1–2α) × 100% confidence interval is calculated to check for equivalence. I assume this is because you calculate a CI for mean of group a and mean of group…
00schneider
8 votes, 1 answer

Can you use the Kolmogorov-Smirnov test to directly test for equivalence of two distributions?

There has been talk on other questions of how one might use the Two One-Sided Tests (TOST) approach for the Kolmogorov-Smirnov (KS) test, but I was wondering whether it was possible to directly use the test statistic to show that two distributions…
7 votes, 1 answer

equivalence test - why isn't it more common?

Often in frequentist hypothesis testing, the null hypothesis is of the form: $H_0: \theta = 0$ I've seen many posts about how the p-value when doing tests against this null hypothesis is just a measure of sample-size in some sense, since in reality…
7 votes, 3 answers

What is the best method of reporting multiple tests of equivalence?

I am doing a study which will involve multiple tests of equivalence. Is there a standard table for reporting such results? EDIT with more detail: It is a longitudinal study with 5 time points. Our hypothesis is that there is a change from…
Peter Flom
6 votes, 1 answer

Use of Wilcoxon test for non-normal data akin to Two One Sided T-test

I'm analysing paired data for equivalency and it's not normally distributed, i.e., the difference of the paired result is not normally distributed due to, amongst other things, outliers. If it were normally distributed I would use a Two One Sided…
6 votes, 1 answer

Equivalence test for binominal data

I want to apply an equivalence test on my sample to infer whether they are equivalent or not. Since my data are bionominal [0,1] I don’t know whether the TOST procedure (tost() in R) can handle my problem or not. My data consists of two groups (G1…
nahid khosh
5 votes, 1 answer

Point hypothesis and equivalence hypothesis at once

Sometimes researchers (especially in collaborations) have two opposite but sound theories: There is a difference between two groups or there is only negligible difference. Now they ask their statistician. How should he approach the situation? If he…
Horst Grünbusch
4 votes, 1 answer

TOST and its two null hypotheses

Background I have a device which is used to size potatoes. I want to use statistics to assess how accurate this device is. To that end, I've collected two data sets, $X$ and $Y$, where $X$ is the set of measurements collected by the device for a…
Tom Hosker