TOST and its two null hypotheses

Question

Background

I have a device which is used to size potatoes. I want to use statistics to assess how accurate this device is.

To that end, I've collected two data sets, $X$ and $Y$, where $X$ is the set of measurements collected by the device for a single one-ton box of potatoes, and $Y$ is the set of measurements taken by hand for the same box. Note that, while $X$ and $Y$ pertain to the same box, the two data sets are not paired in the sense that one cannot say, for each $x_i$, that it corresponds to a given $y_i$; indeed, there are around 10% more elements in $X$ than in $Y$.

In assessing the accuracy of the device, I am very concerned with how well it reports the size profile of a given box, and not particularly concerned with how well it measures any one potato. Both $X$ and $Y$ are roughly normally distributed, with a moderate positive skew.

One component of the device is known to introduce an error, which I'll call $\pm \delta$. I would like to be able to demonstrate that the device is accurate within that margin of error.

My Progress So Far

In response to a related question, someone suggested that a Two One-Sided T-test (TOST) procedure would be useful here. So that's what I've tried to do...

My Attempt at TOST

Null hypothesis: $\bar{x} - \bar{y} < -\delta$ or $\bar{x} - \bar{y} > \delta$.

Null hypothesis, first half: $\bar{x} - \bar{y} < -\delta$.

$t_0 = \frac{\bar{x} - \bar{y} + \delta}{s_{x, y}}$

Where $s_{x, y} = \sqrt{\frac{s_x^2 + s_y^2}{n}}$

Using my numbers, I get $t_0 \approx 10$. Given that the degrees of freedom is in the thousands, I guess this means we reject the null hypothesis?

Null hypothesis, second half: $\bar{x} - \bar{y} > \delta$.

$t_1 = \frac{\bar{x} - \bar{y} - \delta}{s_{x, y}}$

Again, using my numbers, I get $t_1 \approx 30$. Given that the degrees of freedom is in the thousands, I guess this means we reject the null hypothesis?

We rejected both halves of the null hypothesis, so I guess this means we can reject the null hypothesis overall?

What I Don't Understand

While there are probably a number of things I'm not grasping properly, the thing that I don't understand and which I know I don't understand is the sidedness of the t-test. I was always taught that the null hypothesis for a t-test was $\mu_a = \mu_b$. I don't understand how you can swap that equality for an inequality, and, once we make that swap, how we know that our t-test proves that $\mu_a - \mu_b < \delta$, and not $\mu_a - \mu_b > \delta$.

I hope my answer helps you, but I do want to point out that hypothesis tests are of population parameters, not sample statistics. This is why my answer used $\mu_x$ and $\mu_y$ instead of $\bar{x}$ and $\bar{y}$. — Dave, Dec 09 '20 at 23:26

Dave · Accepted Answer · 2020-12-10T16:11:53.327

$H_0: \vert\mu_x - \mu_y\vert = \delta$

$H_a: \vert\mu_x - \mu_y\vert < \delta$

In English, we want to show that the means of $X$ and $Y$ are not more than $\delta$ from one another.

Let's break $H_0$ and $H_a$ into two one-sided hypothesis tests.

$H_{0,1}: \mu_x - \mu_y = \delta$ (first null hypothesis)

$H_{a,1}: \mu_x - \mu_y < \delta$ (first alternative hypothesis)

$H_{0,2}: \mu_x - \mu_y = -\delta$ (second null hypothesis)

$H_{a,2}: \mu_x - \mu_y > -\delta$ (second alternative hypothesis)

(In order for $H_a$ to be true, both $H_{a,1}$ and $H_{a,2}$ must be true, and if both $H_{a,1}$ and $H_{a,2}$ are true, then $H_a$ is true.)

By rejecting $H_{0,1}$ in favor of $H_{a,1}$, you are saying that you believe $\mu_x - \mu_y < \delta$, so $\mu_x - \mu_y\in (-\infty, \delta)$.

By rejecting $H_{0,2}$ in favor of $H_{a,2}$, you are saying that you believe $\mu_x - \mu_y > -\delta$, so $\mu_x - \mu_y \in (-\delta, \infty)$.

Since you believe that $\mu_x - \mu_y\in (-\infty, \delta)$ and $\mu_x - \mu_y\in (-\delta, \infty)$, you believe that $\mu_x - \mu_y\in (-\infty, \delta)\cap (-\delta, \infty) = (-\delta, \delta)$. In other words, you believe that the means of $X$ and $Y$ differ by no more than $\delta$.

$\square$
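As a sketch of how the two one-sided tests differ mechanically, assume independent samples and write $SE$ for the standard error of $\bar{x} - \bar{y}$. The symbols $SE$, $T_1$, $T_2$, $\nu$ (degrees of freedom) and $\alpha$ (significance level) are notation introduced here for illustration, not taken from the question.

$$ T_1 = \dfrac{(\bar{x} - \bar{y}) - \delta}{SE}, \qquad \text{reject } H_{0,1} \text{ when } T_1 < -t_{1-\alpha,\,\nu} \quad \text{(lower tail)} $$

$$ T_2 = \dfrac{(\bar{x} - \bar{y}) - (-\delta)}{SE}, \qquad \text{reject } H_{0,2} \text{ when } T_2 > t_{1-\alpha,\,\nu} \quad \text{(upper tail)} $$

Both statistics use the same observed difference and the same standard error. Only the boundary value subtracted in the numerator ($+\delta$ versus $-\delta$) and the tail used for the p-value change.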

I really like a quote by gung. The bracketed parts are mine.

Very briefly, you select an interval within which you would consider that the true mean difference might as well be 0 for all you could care, then you perform a one-sided test to determine if the observed value is less than the upper bound of that interval [$H_{0,1}$ vs $H_{a,1}$], and another one-sided test to see if it is greater than the lower bound [$H_{0,2}$ vs $H_{a,2}$]. If both of these tests are significant, then you have rejected the hypothesis that the true value is outside the interval you care about. If one (or both) are non-significant, you fail to reject the hypothesis that the true value is outside the interval.
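The procedure in the quote can be sketched in Python for two unpaired samples of unequal size. This is only an illustration under assumptions: it uses Welch's unequal-variance standard error and the Welch-Satterthwaite degrees of freedom, the function name `tost_welch` and the default `alpha=0.05` are made up here, and the data in the usage lines are simulated, not the potato measurements.

```python
import numpy as np
from scipy import stats


def tost_welch(x, y, delta, alpha=0.05):
    """Two one-sided Welch t-tests for |mu_x - mu_y| < delta (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    vx, vy = x.var(ddof=1), y.var(ddof=1)

    diff = x.mean() - y.mean()
    se = np.sqrt(vx / nx + vy / ny)

    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx / nx + vy / ny) ** 2 / (
        (vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1)
    )

    # Test against the upper bound: H0 diff = +delta, Ha diff < +delta (lower tail)
    t_upper = (diff - delta) / se
    p_upper = stats.t.cdf(t_upper, df)

    # Test against the lower bound: H0 diff = -delta, Ha diff > -delta (upper tail)
    t_lower = (diff + delta) / se
    p_lower = stats.t.sf(t_lower, df)

    # Equivalence is claimed only if BOTH one-sided tests are significant
    return {
        "t_upper": t_upper,
        "p_upper": p_upper,
        "t_lower": t_lower,
        "p_lower": p_lower,
        "equivalent": bool(max(p_upper, p_lower) < alpha),
    }


# Simulated data standing in for device (x) and hand (y) measurements
rng = np.random.default_rng(0)
x = rng.normal(50.0, 5.0, size=1100)
y = rng.normal(50.2, 5.0, size=1000)
result = tost_welch(x, y, delta=2.0)
print(result)
```

Reporting the larger of the two p-values is a common convention, since both tests have to reject for the overall conclusion to hold.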

EXAMPLE

We have $X\sim N(\mu_x, 1)$ and $Y\sim N(\mu_y, 1)$. We want to show that $\vert \mu_x - \mu_y \vert < 2 $.

$H_0: \vert\mu_x - \mu_y\vert = 2$

$H_a: \vert\mu_x - \mu_y\vert < 2$

$H_{0,1}: \mu_x - \mu_y = 2$ (first null hypothesis)

$H_{a,1}: \mu_x - \mu_y < 2$ (first alternative hypothesis)

$H_{0,2}: \mu_x - \mu_y = -2$ (second null hypothesis)

$H_{a,2}: \mu_x - \mu_y > -2$ (second alternative hypothesis)

We collect $36$ observations from $X$ and $49$ observations from $Y$, so $n_x=36$ and $n_y=49$. The sample means are $\bar{x} = 3$ and $\bar{y} = 4$. Since we know the variance, we use a z-test for each one-sided test. Let's do the first test.

$H_{0,1}: \mu_x - \mu_y = 2$

$H_{a,1}: \mu_x - \mu_y < 2$

$$ Z = \dfrac{(3 - 4) - 2} { \sqrt{ \frac{1}{36} + \frac{1}{49} } } \approx \dfrac{-3}{0.22} \approx -13.6 $$

Since this is a "less than" hypothesis test, we find the lower tail probability.

scipy.stats.norm.cdf(-13.6) $\approx 0$

From this, we conclude that $\mu_x - \mu_y < 2$.

Let's do the second test.

$H_{0,2}: \mu_x - \mu_y = -2$

$H_{a,2}: \mu_x - \mu_y > -2$

$$ Z = \dfrac{(3 - 4) - (-2)} { \sqrt{ \frac{1}{36} + \frac{1}{49} } } \approx \dfrac{1}{0.22} \approx 4.55 $$

Since this is a "greater than" hypothesis test, we find the upper tail probability.

1-scipy.stats.norm.cdf(4.55) $\approx 0$

From this, we conclude that $\mu_x - \mu_y > -2$.

Combining both tests, if $\mu_x - \mu_y$ has to be greater than $-2$ and has to be less than $2$, then $\vert \mu_x - \mu_y \vert <2$.
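The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch using the numbers above ($\bar{x}=3$, $\bar{y}=4$, 36 and 49 observations, known variance 1, $\delta=2$); the variable names are just for illustration. Because it does not round the standard error to 0.22, the z-values come out slightly different from the hand calculation.

```python
from math import sqrt

from scipy.stats import norm

# Summary statistics from the worked example
x_bar, y_bar = 3.0, 4.0
n_x, n_y = 36, 49
sigma2 = 1.0   # known variance of both X and Y
delta = 2.0    # equivalence margin

diff = x_bar - y_bar
se = sqrt(sigma2 / n_x + sigma2 / n_y)  # same standard error for both tests

# First test: H0 diff = +2 vs Ha diff < +2, so use the LOWER tail
z1 = (diff - delta) / se
p1 = norm.cdf(z1)

# Second test: H0 diff = -2 vs Ha diff > -2, so use the UPPER tail
z2 = (diff + delta) / se
p2 = norm.sf(z2)  # sf(z) = 1 - cdf(z), computed more accurately in the tail

print(f"se = {se:.4f}")
print(f"test 1: z = {z1:.2f}, p = {p1:.3g}")
print(f"test 2: z = {z2:.2f}, p = {p2:.3g}")
```

Both p-values are far below any conventional significance level, which matches the conclusion that $\vert \mu_x - \mu_y \vert < 2$.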

Your answer is helpful, but it doesn't get to the *heart* of what I don't understand, which is how the **sidedness** of each of the two t-tests makes the procedure slightly different for each one. As you say, $H_{0,1} : \mu_x - \mu_y = \delta$ and $H_{0,2} : \mu_x - \mu_y = \delta$, i.e. they're the same. Surely the *alternative* hypothesis shouldn't make any difference to how the test is carried out, only in how the result is interpreted, but then how do the two t-tests not end up being identical? — Tom Hosker, Dec 10 '20 at 01:03
For my own personal benefit, it would be really helpful if you could point me to a thorough worked example of TOST - I've definitely struggled to find one so far - or you could even carry out an example procedure in your answer. I think something like that would sweep away all my misconceptions in one stroke. (Sorry for bombarding you with unsolicited advice!) — Tom Hosker, Dec 10 '20 at 01:09
The sidedness checks if the difference is less than $\delta$ or greater than $-\delta$. The first one-sided test leads us to believe $\mu_x - \mu_y$ to be less than $\delta$, so the values above $\delta$ are ruled out as possibilities. The second one-sided test leads us to believe $\mu_x - \mu_y$ to be greater than $-\delta$, so the values below $-\delta$ are ruled out as possibilities. If we believe both of those, then we believe $-\delta < \mu_x - \mu_y < \delta$. — Dave, Dec 10 '20 at 15:18
I've always understood the **purpose** of the two tests. I've never understood how the **procedure** differs between the two. As I said above, a brief but thorough worked example would probably be the least painful means of clearing up my misconceptions. — Tom Hosker, Dec 10 '20 at 15:44
I do not understand what you mean about the procedure. You do each test as you would do any other one-sided test. — Dave, Dec 10 '20 at 15:52
