
I have difficulty interpreting an interaction term in a linear regression with regard to my hypothesis.

Consider this basic example:

H0: Better school grades lead to higher income and this relation is the same for men and for women.

DV: income

IV: school grade, sex

How do I interpret a non-significant, positive interaction term grade * sex? Would it be correct to accept H0?
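To make the hypothesis concrete, here is the regression model the question implies. Sex is assumed to be dummy-coded 0/1 (say, 0 = men, 1 = women); that coding is an assumption, not something the question states. $\beta_{3}$ is the grade*sex coefficient, written $\beta_{\text{grade}\times\text{sex}}$ in the answer.

$$E[\text{income} \mid \text{grade}, \text{sex}] = \beta_{0} + \beta_{1}\,\text{grade} + \beta_{2}\,\text{sex} + \beta_{3}\,(\text{grade}\times\text{sex})$$

$$\frac{\partial E[\text{income} \mid \text{grade}, \text{sex}]}{\partial\,\text{grade}} = \beta_{1} + \beta_{3}\,\text{sex} = \begin{cases}\beta_{1} & \text{sex}=0\\ \beta_{1}+\beta_{3} & \text{sex}=1\end{cases}$$

Under this coding, "the relation is the same for men and for women" corresponds to $\beta_{3}=0$, and "better grades lead to higher income" corresponds to a positive grade slope.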

Andy
SPi

1 Answer


"Accepting H$_{0}$" is always a logical fallacy (i.e., a non-significant result always means "failed to reject"). Interpretively, it means you did not find evidence of the grade*sex interaction.

The reason you can only state that you did not find evidence of X with tests for difference is twofold: these tests only tell you how likely you are to see a $\hat{\beta}_{\text{grade}\times\text{sex}}$ at least as extreme as yours if H$_{0}$ is true, and your test only has your desired power to reject H$_{0}$ for effects at least as large as one particular value under H$_{\text{A}}$, not for all (smaller) possible values.
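The power point can be made concrete with a short sketch. It approximates the power of the usual two-sided test of $\beta_{\text{grade}\times\text{sex}}=0$ for several hypothetical true values of the coefficient. It assumes a normal approximation to the $t$ distribution (reasonable for large residual degrees of freedom) and a hypothetical standard error of 0.05; neither comes from the question.

```python
from statistics import NormalDist


def approx_power(beta_true, se, alpha=0.05):
    """Approximate power of a two-sided test of H0: beta = 0.

    Assumes the test statistic beta_hat / se is roughly normal with
    mean beta_true / se and variance 1 (normal approximation to t).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = beta_true / se  # expected test statistic under this alternative
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


se = 0.05  # hypothetical standard error of the interaction estimate
for beta in (0.0, 0.02, 0.05, 0.10, 0.15):
    print(f"true beta = {beta:.2f}: power = {approx_power(beta, se):.2f}")
```

A test with high power for a true coefficient of 0.15 can still have low power for smaller non-zero values, which is why "not significant" cannot be read as "the interaction is zero".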

If you want to state that you found (or did not find) evidence of an absence of X, then you need to use, for example, tests for equivalence (say, two one-sided tests, TOST), where H$_{0}$ no longer takes the form H$_{0}^{+}\text{: }\theta=0$ but rather the form H$_{0}^{-}\text{: }|\theta|\ge\Delta$, where $\Delta$ is a researcher-specified value meaning "too small a difference to care about". (The '$+$' and '$-$' superscripts indicate null hypotheses for difference and for equivalence, respectively.)

To perform an equivalence test on grade*sex (i.e. you want to provide evidence that there is no interaction), you will need a few things:

  • $\theta$: the effect you are estimating for grade*sex (i.e. the coefficient $\hat{\beta}_{\text{grade}\times\text{sex}}$)
  • $\Delta$: an effect size that is too small to care about (e.g., we do not care about $-0.1 \leq \beta_{\text{grade}\times\text{sex}} \leq 0.1$). The value $\Delta=0.1$ is not magical; I use it here only as an illustrative value, and you need to decide on $\Delta$ yourself.
  • $s_{\theta}$: the standard error of your estimate (i.e. the standard error of $\hat{\beta}_{\text{grade}\times\text{sex}}$)
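One way to obtain $\theta$, $s_{\theta}$ and the residual degrees of freedom is an ordinary least squares fit of income on grade, sex and grade*sex. The sketch below does this in plain Python so it runs without extra packages. The data are simulated and hypothetical (no true interaction), sex is assumed to be coded 0/1, and the helper names `invert` and `fit_interaction` are made up for illustration. In practice any regression routine reports the same coefficient and standard error.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row of `a` with the matching row of the identity matrix.
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_interaction(grade, sex, income):
    """OLS fit of income = b0 + b1*grade + b2*sex + b3*grade*sex + error.

    Returns (coefficients, standard errors, residual degrees of freedom).
    """
    X = [[1.0, g, s, g * s] for g, s in zip(grade, sex)]
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, income)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, income)]
    df = n - k
    sigma2 = sum(e * e for e in resid) / df  # residual variance estimate
    se = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    return beta, se, df


# Hypothetical simulated data with no true interaction (b3 = 0).
random.seed(1)
n = 200
sex = [random.randint(0, 1) for _ in range(n)]
grade = [random.uniform(1, 6) for _ in range(n)]
income = [20 + 3 * g + 2 * s + random.gauss(0, 5) for g, s in zip(grade, sex)]

beta, se, df = fit_interaction(grade, sex, income)
theta, s_theta = beta[3], se[3]  # estimate and SE of the grade*sex coefficient
print(f"theta = {theta:.3f}, s_theta = {s_theta:.3f}, df = {df}")
```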

Given that, then:

H$_{0}^{-}\text{: }|\beta_{\text{grade}\times\text{sex}}| \ge \Delta$, which gives two one-sided null hypotheses:

H$_{01}^{-}\text{: }\beta_{\text{grade}\times\text{sex}} \ge \Delta$, and
H$_{02}^{-}\text{: }\beta_{\text{grade}\times\text{sex}} \le -\Delta$

The test statistics corresponding to both of these are:

$$t_{1} = \frac{\Delta - \hat{\beta}_{\text{grade}\times\text{sex}}}{s_{\hat{\beta}_{\text{grade}\times\text{sex}}}}$$ $$t_{2} = \frac{\hat{\beta}_{\text{grade}\times\text{sex}}+ \Delta}{s_{\hat{\beta}_{\text{grade}\times\text{sex}}}}$$

Both are upper-tail (right-sided) tests, so the p-values are as follows, where $T_{df}$ has a $t$ distribution with the residual degrees of freedom of the regression:

$p_{1}=\text{P}\left(T_{df} \ge t_{1}\right)$, and
$p_{2}=\text{P}\left(T_{df} \ge t_{2}\right)$
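The two test statistics and their upper-tail p-values can be computed directly from $\hat{\beta}_{\text{grade}\times\text{sex}}$, its standard error, $\Delta$ and the residual degrees of freedom. This sketch uses only the standard library, so the $t$ upper-tail probability is computed by Simpson's-rule integration of the $t$ density (a hand-rolled stand-in for a library routine such as `scipy.stats.t.sf`). The input numbers in the usage lines are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T_df >= t), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 0.5 - area) if t >= 0 else 0.5 + area


def tost(beta_hat, se, delta, df):
    """Two one-sided tests of H01: beta >= delta and H02: beta <= -delta."""
    t1 = (delta - beta_hat) / se
    t2 = (beta_hat + delta) / se
    return t1, t2, t_sf(t1, df), t_sf(t2, df)


# Hypothetical inputs: estimate, standard error, Delta, residual df.
t1, t2, p1, p2 = tost(beta_hat=0.02, se=0.03, delta=0.1, df=196)
print(f"t1 = {t1:.2f}, p1 = {p1:.4f}; t2 = {t2:.2f}, p2 = {p2:.4f}")
```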

If both H$_{01}^{-}$ and H$_{02}^{-}$ are rejected with $p\le\alpha$ (not $p \le \alpha/2$), then, taken together with the failure to reject H$_{0}^{+}$, you can conclude that you found evidence that the grade*sex interaction is equivalent to zero, given $\alpha$ and $\Delta$.

However, if you reject only one, or neither, of H$_{01}^{-}$ and H$_{02}^{-}$, then, taken together with the failure to reject H$_{0}^{+}$, you can't conclude anything: your results are indeterminate because your data are underpowered.
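The decision logic of the last two paragraphs can be written out as a small function. Its inputs are the two-sided p-value of the ordinary test of H$_{0}^{+}$ and the two one-sided p-values for H$_{01}^{-}$ and H$_{02}^{-}$. The two branches where H$_{0}^{+}$ *is* rejected are not discussed in this answer and are labeled only by what follows logically. The function name and return strings are illustrative. An equivalent check is whether the $(1-2\alpha)$ confidence interval for $\beta_{\text{grade}\times\text{sex}}$ lies entirely inside $(-\Delta, \Delta)$; this is where the TOST view connects with the confidence-interval view in the comments.

```python
def interpret(p_difference, p1, p2, alpha=0.05):
    """Combine the usual test of H0+: beta = 0 with TOST for H0-: |beta| >= delta.

    p_difference: two-sided p-value of the ordinary t test of the interaction.
    p1, p2: one-sided p-values for H01- (beta >= delta) and H02- (beta <= -delta).
    """
    reject_plus = p_difference <= alpha
    # Each one-sided test is carried out at alpha itself, not alpha / 2.
    reject_minus = p1 <= alpha and p2 <= alpha
    if not reject_plus and reject_minus:
        return "equivalent to zero (given alpha and delta)"
    if not reject_plus:
        return "indeterminate (underpowered)"
    # Cases below reject H0+ and are outside the two cases described above.
    if reject_minus:
        return "nonzero, but smaller than delta in magnitude"
    return "nonzero; |beta| >= delta not ruled out"


print(interpret(p_difference=0.40, p1=0.03, p2=0.04))
print(interpret(p_difference=0.40, p1=0.03, p2=0.20))
```

Note that one-sided p-values of 0.03 and 0.04 both count as rejections at $\alpha=0.05$, even though each exceeds $\alpha/2$.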

Alexis
  • Excellent answer. The hypothesis in the OP is even more complex, as a positive effect of "grade" also has to be demonstrated. – Michael M Sep 06 '14 at 16:40
  • +1 Right... of course, she could always simply pose a single one-sided test... – Alexis Sep 06 '14 at 16:42
  • Just corrected an error in $t_{2}$... was $\Delta - \beta$ in the numerator, when it should have been $\beta + \Delta$. – Alexis Sep 06 '14 at 16:46
  • A confidence interval for the effect of interest is valid no matter what the $P$-value. – Frank Harrell Sep 06 '14 at 17:16
  • This is good information. I wonder if you might change "and your test is based on only *one* not all possible values under H$_\text{A}$" to something like 'and your test only yields your desired power for *one* not all possible values under H$_\text{A}$'? – gung - Reinstate Monica Sep 06 '14 at 17:40
  • @FrankHarrell That's cool. But I am missing how that speaks to the OP's question? – Alexis Sep 06 '14 at 19:19
  • Ignore the $P$-value and use the confidence interval for the effect of interest. Since one cannot accept $H_0$ (i.e., large $P$-values do not allow one to draw any conclusion except that evidence was not established for an effect), estimation has many advantages over hypothesis testing. – Frank Harrell Sep 07 '14 at 13:10
  • @FrankHarrell I am aware of this perspective. I think my answer speaks far more directly to the OP's question about interpreting hypothesis tests. – Alexis Sep 07 '14 at 16:24
  • Yes and no; the original motivation for a hypothesis test was not necessarily on solid ground. – Frank Harrell Sep 08 '14 at 01:14
  • @FrankHarrell That's fair enough. :) I *still* like relevance testing, though. – Alexis Sep 08 '14 at 01:17
  • You might explain why, since statistical tests are non-informative if non-significant. – Frank Harrell Sep 08 '14 at 01:48
  • @FrankHarrell, last two paragraphs of my answer and more on the TOST tag info page... It is true that sometimes even relevance tests will be underpowered and one is left in an indeterminate or non-informative place, but so what? Sometimes data just don't say much because one is not looking closely enough. – Alexis Sep 08 '14 at 03:11
  • You seem to be a big fan of hypothesis tests so no further discussion will be helpful. – Frank Harrell Sep 08 '14 at 04:00
  • @FrankHarrell To be fair: I am also [a fan of CIs](http://stats.stackexchange.com/questions/114246/problems-estimating-cis-on-the-survival-function-s-t-in-a-logit-hazard-mod) and am having the dickens of a time trying to figure out the problem I am encountering in this link. :) – Alexis Sep 08 '14 at 04:29
  • Lots of good stuff in that link. Perhaps the bootstrap will help. Note that it is not adequate to compute overall confidence coverage, as many methods give you correct coverage by being wrong in both tails. – Frank Harrell Sep 08 '14 at 12:22