Equivalence testing and difference-in-difference measures

Question

I have some data from a recent experiment that requires slightly more sophisticated testing than I'm used to. Any advice would be most appreciated!

The setup: There are control (C) and treatment (T) samples. For each sample, approximately half was measured for how much they trust a certain category of people 'A' (call this measure '$ta$') and the other half for how much they trust another type 'B' ($tb$). The measures are discrete, in an equally spaced step function in eight segments.

One social theory has suggested that under such a treatment, trust measures should go down towards both A and B. So one should find that $ta(T)$ is less than $ta(C)$ and $tb(T)$ is less than $tb(C)$. Moreover, the difference towards the two groups in the treated, $ta(T)-tb(T)$, should be larger than the difference $ta(C)-tb(C)$. Each observation only has one measure; i.e. a treated individual only has either $ta(T)$ or $tb(T)$.

I hope to disprove these claims. A t-test or Mann-Whitney of (e.g.) $ta(T)$ versus $ta(C)$ fails to reject the null (where $H_0: ta(T)=ta(C)$). But as this could just be an underpowered result, I want to test against a null that $ta(C)$ is greater than $ta(T)$ (and then the same for $tb$). That is: $H_0^{(a)}: ta(C)> ta(T)$, $H_0^{(b)}: tb(C)> tb(T)$. So my first question: how would I perform this test of equivalence? I tried this with the ttost package, but my problem is that there is no prior data to give an idea of what an acceptable effect size delta would be. Because this is a study measuring trust, it's harder to come up with a sensible magnitude of our measure (as compared I guess to a pharmacological test).

My second question relates to the difference-in-difference problem. How would I go about testing either equivalence or non-equivalence for the difference in the gap between ta and tb, for each sample? For the more routine case where $H_0: ta(C)-tb(C)=ta(T)-tb(T)$, I ran a regression on the trust variable with an interaction regressor that was the product of the dummy for A and B, and the dummy for C or T, and tested its significance. Is that correct? For the more difficult test of equivalence, how would I run a test with the null $H_0: ta(C)-tb(C)>ta(T)-tb(T)$, given that each individual falls into only one of these four groups, the groups are of different sizes, and I still can't use a prior estimate for $|\Delta|$? (As mentioned, the control samples and treatment samples are broken roughly into half with respect to A and B).

Sorry for the long question, and also if anything is unclear. It's my first post! :)

David, would you be so kind as to draw out the equations for the test statistic of your positivist null hypothesis (i.e. $H^{+}_{0}: (ta_{T} - tb_{T}) - (ta_{C} - tb_{C}) = 0$)? I would be happy to provide an answer for testing your negativist null hypothesis (e.g. $H^{-}_{0}: |(ta_{T} - tb_{T}) - (ta_{C} - tb_{C})| \ge \Delta$), but want to have the test statistic down pat first. Also: one needs to be careful with nonparametric tests for equivalence: one can either define the equivalence threshold in terms of rank sums (which is meaningless), or in terms of the test statistic. — Alexis, Mar 09 '16 at 18:55
Thanks for the (very polite!) comment. I've tried to clarify and also made things a bit neater with tex language. — David Smerdon, Mar 09 '16 at 20:02
You are very welcome. What is the test statistic you used to test for difference (i.e. when you say "fails to reject," what test statistic? AFAIU DID can be estimated in different ways, so I want to speak to how you are estimating and testing DID). — Alexis, Mar 09 '16 at 20:31
For the direct comparison of sample means: $H_0:ta(C)=ta(T)$: the t-statistic for $\hat{x}_C-\hat{x}_T$ is 1.18 (dof: 221). $H_0:tb(C)=tb(T)$: the t-statistic for $\hat{x}_C-\hat{x}_T$ is 0.70 (dof: 235). For the DID, I ran a regression with dummies for C/T, for A/B, for the interaction (1 if T, 1 if A) and about a dozen covariates. The t-statistic on the interaction was 1.21, with $P>|t|=0.228$. Is that what you meant? — David Smerdon, Mar 10 '16 at 14:11
Sure is! One more thing, and I will write an answer: can you provide your regression command or your regression equation? — Alexis, Mar 10 '16 at 20:01
The Stata command is: `reg SEND pt partner town group i.age gender i.educ i.occ i.ind i.fam born_AUS years_local i.religion if role==1`. `SEND` is the trust measure, an individual in T has $town=1$, an individual sending to B has $partner=1$, and $pt=town*partner$. The other regressors are normal exogenous controls. The t-stat of 1.21 I mention above was for $pt$. — David Smerdon, Mar 11 '16 at 09:57
I should mention that `SEND` is both left- and right-censored. For the moment I didn't take that into account, though I guess I'll eventually have to if I want to deal with the normal ANOVA assumptions of normality, right? — David Smerdon, Mar 11 '16 at 10:08

Alexis · Accepted Answer · 2020-12-13T03:40:12.750

As I understand it, your model is:

$$\text{SEND} = \beta_{0} + \beta_{pt}pt + \beta_{partner}partner + \beta_{town}town + \mathbf{B}_{controls}\mathbf{controls} + \varepsilon$$
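To see why the interaction coefficient carries the difference-in-difference, here is a sketch of the four cell means implied by that model. The notation is my own shorthand: $\mu_{g,s}$ is the expected SEND towards partner type $g$ in sample $s$, and $c$ is the contribution of the controls, held fixed. I am assuming $partner=1$ means B and $town=1$ means T, as described in the comments.

$$\begin{aligned}
\mu_{A,C} &= \beta_{0} + c \\
\mu_{B,C} &= \beta_{0} + \beta_{partner} + c \\
\mu_{A,T} &= \beta_{0} + \beta_{town} + c \\
\mu_{B,T} &= \beta_{0} + \beta_{partner} + \beta_{town} + \beta_{pt} + c \\
(\mu_{B,T} - \mu_{A,T}) - (\mu_{B,C} - \mu_{A,C}) &= (\beta_{partner} + \beta_{pt}) - \beta_{partner} = \beta_{pt}
\end{aligned}$$

With this coding $\beta_{pt}$ is $(tb_{T} - ta_{T}) - (tb_{C} - ta_{C})$, which has the opposite sign to $(ta_{T} - tb_{T}) - (ta_{C} - tb_{C})$. Symmetric tests for difference and equivalence are unaffected, but the direction of the estimate should be read accordingly.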

So your estimated effect of treatment on SEND is given by $\widehat{\beta}_{pt}$. Tests for difference are reported in the vanilla output for linear regression in Stata:

To the right of $\widehat{\beta}_{pt}$ in the Stata output is "Std. Err.", or $\widehat{\sigma}_{\beta_{pt}}$.
To the right of the standard error of the estimate $\widehat{\beta}_{pt}$ is a t test statistic $\left(t= \frac{\widehat{\beta}_{pt}}{\widehat{\sigma}_{\beta_{pt}}} \right)$
To the right of the $t$ statistic is the corresponding p-value—$P\left(|T|\ge |t_{\nu}|\right)$—where the degrees of freedom $\nu=n-$no. of parameter estimates (including $\beta_{0}$).
(To the right of all these is the 95% CI.)

You can formulate an equivalence test for $\beta_{pt}$ (or any of the parameter estimates) in two ways: in units of the parameter (e.g. the slope of $pt$ vs. $SEND$), or in units of the $t$ distribution. Using Stata's `tostti` command (see the `tost` package) you specify units of the parameter using the `eqvtype(delta)` option, and specify units of the $t$ distribution using the `eqvtype(epsilon)` option.

#Formulating a test for equivalence in terms of $\Delta$:

The general negativist null hypothesis is $H^{^{–}}_{0}: |\beta_{pt}| \ge \Delta$ (i.e. $\beta_{pt}$ differs from $0$ by at least $\Delta$), with $H^{^{–}}_{\text{A}}: |\beta_{pt}| < \Delta$ (i.e. $\beta_{pt}$ is equivalent to $0$ within an equivalence threshold of $\Delta$). The corresponding specific null hypotheses for two one-sided tests are:

$H^{^{–}}_{01}: \beta_{pt} \ge \Delta$, with $H^{^{–}}_{\text{A}1}: \beta_{pt} < \Delta$, and
$H^{^{–}}_{02}: \beta_{pt} \le -\Delta$, with $H^{^{–}}_{\text{A}2}: \beta_{pt} > -\Delta$

The corresponding test statistics for these two null hypotheses are:

$t_{1} = \frac{\Delta - \widehat{\beta}_{pt}}{\widehat{\sigma}_{\beta_{pt}}}$, and
$t_{2} = \frac{\widehat{\beta}_{pt} + \Delta}{\widehat{\sigma}_{\beta_{pt}}}$

These test statistics are both constructed to be upper tail tests, so:

$p_{1} = P(T>t_{1\nu})$, and
$p_{2} = P(T>t_{2\nu})$.

You reject $H_{0}^{^{–}}$ only if both $p_{1}\le \alpha$ and $p_{2} \le \alpha$, and if you did, you would conclude that you found evidence that $\beta_{pt}$ is equivalent to $0$ within $\pm \Delta$ at the $\alpha$ level of significance.

You can conduct this test for equivalence using `tostti` in Stata: `tostti #obs #mean #sd 0, eqvtype(delta) eqvlevel(#)`, where:

#obs is $n-$no. of variables in your regression model (I think I count 13 in your case?)... basically it's the degrees of freedom+1.
#mean is the estimate $\widehat{\beta}_{pt}$
#sd is $\widehat{\sigma}_{\beta_{pt}}\times\sqrt{\#obs}$ (`tostti` expects the SD, not the SE; as far as I understand, it recovers the SE as SD$/\sqrt{\#obs}$ the way `ttesti` does, so use the same #obs here)
The # in the `eqvlevel()` option is your value of $\Delta$ (I am assuming you want a symmetrical equivalence region; if not, check out the help file's `uppereqvlevel()` option). See my remarks on specific values of $\Delta$ below.
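If you want to check the arithmetic of the $\Delta$ version outside Stata, here is a minimal Python sketch of the formulas above. It is not what `tostti` does internally, just the two test statistics and their upper-tail p-values. The function names (`t_pdf`, `t_upper_tail`, `tost_delta`) are my own, the tail probability is computed by simple numerical integration so that nothing beyond the standard library is needed, and the input numbers are hypothetical rather than taken from your regression.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_upper_tail(t, nu, steps=2000):
    """P(T > t), by Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def tost_delta(beta_hat, se, delta, nu):
    """Two one-sided tests of H0-: |beta| >= delta, in units of the coefficient."""
    t1 = (delta - beta_hat) / se  # tests H01: beta >= delta
    t2 = (beta_hat + delta) / se  # tests H02: beta <= -delta
    return t1, t2, t_upper_tail(t1, nu), t_upper_tail(t2, nu)

# Hypothetical inputs (NOT from the question): estimate, SE, threshold, df, alpha
beta_hat, se, delta, nu, alpha = 0.30, 0.25, 1.0, 200, 0.05
t1, t2, p1, p2 = tost_delta(beta_hat, se, delta, nu)
equivalent = p1 <= alpha and p2 <= alpha  # reject H0- only if both tests reject
print(f"t1={t1:.2f} p1={p1:.4f}  t2={t2:.2f} p2={p2:.4f}  reject H0-: {equivalent}")
```

The same `t_upper_tail` helper gives the ordinary two-sided p-value for a test of difference as `2 * t_upper_tail(abs(t), nu)`.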

#Formulating a test for equivalence in terms of $\varepsilon$:

The general negativist null hypothesis is $H^{^{–}}_{0}: |t| \ge \varepsilon$ (i.e. $t$ differs from $0$ by at least $\varepsilon$), with $H^{^{–}}_{\text{A}}: |t| < \varepsilon$ (i.e. $t$ is equivalent to $0$ within an equivalence threshold of $\varepsilon$). The corresponding specific null hypotheses for two one-sided tests are:

$H^{^{–}}_{01}: t \ge \varepsilon$, with $H^{^{–}}_{\text{A}1}: t < \varepsilon$, and
$H^{^{–}}_{02}: t \le -\varepsilon$, with $H^{^{–}}_{\text{A}2}: t > -\varepsilon$

The corresponding test statistics for these two null hypotheses are:

$t_{1} = \varepsilon-t$, and
$t_{2} = t+\varepsilon$, where the $t$ for both these tests is the one reported to the right of $\widehat{\beta}_{pt}$ in the Stata output.

These test statistics are both constructed to be upper tail tests, so:

$p_{1} = P(T>t_{1\nu})$, and
$p_{2} = P(T>t_{2\nu})$.

You reject $H_{0}^{^{–}}$ only if both $p_{1}\le \alpha$ and $p_{2} \le \alpha$, and if you did, you would conclude that you found evidence that $\beta_{pt}$ is equivalent to $0$ within $\pm \varepsilon$ (in units of the $t$ distribution) at the $\alpha$ level of significance.

You can conduct this test for equivalence using `tostti` in Stata: `tostti #obs #mean #sd 0, eqvtype(epsilon) eqvlevel(#)`, where:

#obs is $n-$no. of variables in your regression model (I think I count 13 in your case?)... basically it's the degrees of freedom+1.
#mean is the estimate $\widehat{\beta}_{pt}$
#sd is $\widehat{\sigma}_{\beta_{pt}}\times \sqrt{\#obs}$ (`tostti` expects the SD, not the SE; as far as I understand, it recovers the SE as SD$/\sqrt{\#obs}$ the way `ttesti` does, so use the same #obs here)
The # in the `eqvlevel()` option is your value of $\varepsilon$ (I am assuming you want a symmetrical equivalence region; if not, check out the help file's `uppereqvlevel()` option). See my remarks on specific values of $\varepsilon$ below.
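For the $\varepsilon$ version the arithmetic is simple enough to do by comparing each statistic with the critical value $t_{\alpha\nu}$, since an upper-tail p-value is $\le \alpha$ exactly when the statistic is $\ge t_{\alpha\nu}$. A minimal Python sketch, where the function name `tost_epsilon` is my own, the $t$ of 1.21 is the one you reported for $pt$, and both the critical value (roughly $t_{.05,200}$) and the two $\varepsilon$ values are assumptions for illustration only:

```python
def tost_epsilon(t, eps, t_crit):
    """TOST in units of the t distribution.

    t      : t statistic reported for the coefficient
    eps    : equivalence threshold, in t units
    t_crit : upper-tail critical value t_{alpha, nu}
    Returns (t1, t2, reject); reject=True means H0-: |t| >= eps is rejected.
    """
    t1 = eps - t  # tests H01: t >= eps
    t2 = t + eps  # tests H02: t <= -eps
    # Each upper-tail test rejects when its statistic reaches the critical value
    reject = t1 >= t_crit and t2 >= t_crit
    return t1, t2, reject

t_stat = 1.21    # interaction t statistic reported in the comments
t_crit = 1.6525  # ASSUMED: approx. t_{0.05, 200}; use your regression's own df
for eps in (2.0, 3.0):  # hypothetical thresholds
    t1, t2, reject = tost_epsilon(t_stat, eps, t_crit)
    print(f"eps={eps}: t1={t1:.2f}, t2={t2:.2f}, reject H0-: {reject}")
```

Under these assumed numbers the smaller threshold leaves $t_{1}$ short of the critical value while the larger one does not, which shows how much the conclusion depends on the choice of $\varepsilon$.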

#Relevance testing

The hella cool application of tests for equivalence is to base inference off both a test for equivalence and a test for difference (this is termed "relevance testing"). Four results obtain:

Reject $H^{^{+}}_{0}$ & Not reject $H^{^{–}}_{0}$: conclude relevant difference
Not reject $H^{^{+}}_{0}$ & Reject $H^{^{–}}_{0}$: conclude equivalence
Reject $H^{^{+}}_{0}$ & Reject $H^{^{–}}_{0}$: conclude trivial difference (i.e. you found evidence of a difference that you have said a priori is too small to care about)
Not reject $H^{^{+}}_{0}$ & Not reject $H^{^{–}}_{0}$: conclude indeterminate (i.e. you have under-powered data for your test, and can say nothing about difference or equivalence)

You obtain relevance tests in `tostti` by including the `relevance` option.
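The four outcomes amount to a two-by-two lookup, which can be sketched in Python as follows. The function name `relevance_conclusion` is my own, the difference p-value of 0.228 is the one you reported for $pt$, and the two equivalence p-values are hypothetical placeholders:

```python
def relevance_conclusion(reject_difference, reject_equivalence):
    """Combine a test for difference (H0+) with a test for equivalence (H0-)."""
    if reject_difference and not reject_equivalence:
        return "relevant difference"
    if not reject_difference and reject_equivalence:
        return "equivalence"
    if reject_difference and reject_equivalence:
        return "trivial difference"
    return "indeterminate"

alpha = 0.05
p_difference = 0.228   # two-sided p-value for pt reported in the comments
p1, p2 = 0.030, 0.001  # HYPOTHETICAL p-values from the two one-sided tests

reject_difference = p_difference <= alpha
reject_equivalence = max(p1, p2) <= alpha  # both one-sided tests must reject
print(relevance_conclusion(reject_difference, reject_equivalence))
```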

#Specific values of $\Delta$ & $\varepsilon$

The value of either $\Delta$ or $\varepsilon$ is a researcher choice. As you point out, there's no a priori literature on what size a "relevant effect" of $\beta_{pt}$ is: if you define equivalence in terms of $\Delta$, then you are using units of $SEND/pt$. Defining equivalence/relevance in terms of $t$ is (a) perhaps a little easier to do in this situation, and (b) is a little more abstract. Some points about selecting a value of $\varepsilon$:

It is impossible to reject any $H_{0}^{^{–}}$ if $\varepsilon\le t_{\alpha\nu}$, so $\varepsilon$ should be thought of as $t_{\alpha\nu}+something$.
I like to think of something as "how much greater the magnitude of $t$ would have to be in order to be relevant".
Half a standard deviation seems a fairly liberal definition of the equivalence/relevance threshold: $\varepsilon = t_{\alpha\nu} + .5\sqrt{\nu/(\nu-2)}$ (the standard deviation of the $t$ distribution is $\sqrt{\nu/(\nu-2)}$, so I have added half of that to $t_{\alpha\nu}$). You can obtain $t_{\alpha\nu}$ in Stata with `di invttail(df,alpha)`, where df is your degrees of freedom (no. of observations $-$ no. of parameter estimates, including the _cons).

A strict definition of the equivalence/relevance threshold might use $.25\sqrt{\nu/(\nu-2)}$, and a very strict $.125\sqrt{\nu/(\nu-2)}$.
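As a rough illustration of these three thresholds, here is a minimal Python sketch. The function name `epsilon_threshold` is my own, and both the degrees of freedom (200) and the critical value (roughly $t_{.05,200}$, which you would get from `invttail`) are assumptions, not values from your regression:

```python
import math

def epsilon_threshold(t_crit, nu, fraction):
    """epsilon = t_{alpha,nu} + fraction * SD of the t distribution (needs nu > 2)."""
    sd_t = math.sqrt(nu / (nu - 2))
    return t_crit + fraction * sd_t

nu = 200         # ASSUMED degrees of freedom
t_crit = 1.6525  # ASSUMED: approx. t_{0.05, 200}
for label, fraction in (("liberal", 0.5), ("strict", 0.25), ("very strict", 0.125)):
    eps = epsilon_threshold(t_crit, nu, fraction)
    print(f"{label}: epsilon = {eps:.4f}")
```

Because $H_{0}^{^{–}}$ is rejected only when $\varepsilon - |t| \ge t_{\alpha\nu}$, the amount added to $t_{\alpha\nu}$ is also the largest $|t|$ at which you could still conclude equivalence, which may help when deciding how liberal to be.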

You can of course conduct equivalence tests (and relevance tests!) for any of the parameters estimated in your model. You might find useful my answer to Peter Flom's question to get an idea about presenting many equivalence tests in a regression context.

Note: linear regression makes no assumption of normality of the dependent or independent variables; rather, linear regression assumes the residuals are normally distributed (and normally distributed residuals do not require any particular distribution of dependent or independent variables).
