
Given the following model as an example:

$$Y=\beta_0+\beta_A\cdot A+\beta_B\cdot B+\beta_{AB}\cdot A \cdot B+\epsilon$$

In alternative notation:

$$Y\sim A + B + A: B$$

The main question:

When permuting the entries of variable $A$ to test its coefficient ($\beta_A$) in a model, should an interaction term that includes it, such as $A \cdot B$, also be recomputed from the permuted values?

Secondary question:

And what about testing the coefficient of the $A \cdot B$ interaction term ($\beta_{AB}$)? Is the interaction column permuted on its own, independently of the variables $A$ and $B$?

A bit of context:

I want to perform a test on the coefficients of a model (it's a canonical correlation analysis, but the question is applicable to any linear model including interactions).

I'm trying my hand at permutation tests. While it's fairly straightforward to test the canonical correlation itself, how to do the same with the variable scores, or coefficients, is a bit unclear to me when an interaction term is included.

I've read *How to test an interaction effect with a non-parametric test (e.g. a permutation test)?*, but my question is much more practical.
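To make the question concrete, here is a minimal sketch of the two alternatives I have in mind, written for an ordinary regression with numpy; the data and names are made up purely for illustration:

```python
import numpy as np

# Toy data for Y ~ A + B + A:B (purely illustrative; beta_A = 0 under the null)
rng = np.random.default_rng(0)
n = 100
A = rng.normal(size=n)
B = rng.normal(size=n)
Y = 1.0 + 0.5 * B + 0.3 * A * B + rng.normal(size=n)

def beta_A_hat(A_col, B_col, AB_col, Y):
    """Least-squares fit of Y on [1, A, B, A*B]; returns the coefficient on A."""
    X = np.column_stack([np.ones(len(Y)), A_col, B_col, AB_col])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef[1]

A_perm = rng.permutation(A)

# Alternative 1: recompute the interaction column from the permuted A
b_recomputed = beta_A_hat(A_perm, B, A_perm * B, Y)

# Alternative 2: keep the original interaction column fixed
b_fixed = beta_A_hat(A_perm, B, A * B, Y)
```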

  • I'm not sure I follow this; what statistic would be exchangeable under the null here? – Glen_b Jul 20 '17 at 20:03
  • @Glen_b Sorry for being unclear, I'm learning the ropes on this one. Tentatively: I'm testing the magnitude of the model coefficients, so under the null that would be exchangeable, I suppose? (Or am I not getting what exchangeability means here?) – Firebug Jul 20 '17 at 20:06
  • Sorry, I'm up many hours past my bedtime, I didn't express that correctly (I misused "statistic"; I should have referred to something related to the individual observations). When you do a permutation test, you need to permute something. What is being permuted? – Glen_b Jul 20 '17 at 20:10
  • 1
    @Glen_b No problems! I'm permuting the instances of said variable. So, the null hypothesis is that the effect is zero, permuting the entries wouldn't significantly change the magnitude of the effect. It's still a bit early here haha – Firebug Jul 20 '17 at 20:17
  • Then I compare the magnitude of the coefficient without permutations to the magnitude of the resampled coefficients, obtaining a p-value from the proportion that attained higher magnitude than the first, non-permuted, one. – Firebug Jul 20 '17 at 20:30
  • can you clarify what you mean by "permuting the entries"? Imagine you were to permute something between the $i$th and $j$th observations. Are you swapping the $y_i$ and $y_j$ values but keeping all the elements of $x_i$ together? Are you interchanging some elements of the x-variables as well? Are you interchanging *residuals*? – Glen_b Jul 21 '17 at 01:29
  • @Glen_b To test the coefficient associated with the IV $x$ I would swap $x_i$ and $x_j$; the rest would remain the same. – Firebug Jul 21 '17 at 11:42
  • I'm not sure that works quite as you might hope. What's the null and alternative? – Glen_b Jul 22 '17 at 04:30
  • @Glen_b $H_0: \beta_{A \times B} = 0$, $H_1: \beta_{A \times B} \neq 0$. The procedure I'm describing is closely related to _Kończak, G. (2012). **On Testing the Significance of the Coefficients in the Multiple Regression Analysis**. Acta Universitatis Lodziensis. Folia Oeconomica._ – Firebug Jul 22 '17 at 20:27
  • 2
    I wouldn't pay much attention to that paper, which as far as I can see doesn't even provide an adequate description of the permutation method employed. Here's a better one: http://avesbiodiv.mncn.csic.es/estadistica/permut2.pdf – Jacob Socolar Jul 23 '17 at 00:14
  • @user43849 As I said (at the beginning of the lengthy conversation haha) I'm learning the ropes this time. Thanks for the reference! – Firebug Jul 23 '17 at 00:38
  • One quick clarification: are you mainly interested in testing the significance of the interaction term, or the significance of the main effect? If the latter, before you go too far down this permutation rabbit-hole, you might ask whether you really care about the significance of the main effect in the presence of the interaction. I realize that might not be helpful advice at this point, so take it FWIW. – Jacob Socolar Jul 23 '17 at 00:47
  • @user43849 I'm mainly interested on the significance of a single main effect. The other effects are mostly known nuisance to my application. – Firebug Jul 23 '17 at 00:51
  • Why did you give a null and alternative about the interaction if you're mainly interested in the main effect? – Glen_b Jul 23 '17 at 15:54
  • @Glen_b I want to test it as well, it's also part of the question. – Firebug Jul 23 '17 at 15:57

1 Answer


As I'm just starting with permutation tests, I thought asking a question was a good idea. Indeed, thanks to the comments by @Glen_b and @user43849, I realized there were several misunderstandings and inconsistencies in my grasp of the theory. For one, I was thinking about testing the magnitude of the coefficient rather than the effect itself, which is what is actually of interest.

So, since I'm still learning, posting an actual answer open to criticism sounded just as good.


To answer this question and settle on a permutation strategy that meets my requirements, I turned to Anderson MJ, Legendre P. "An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model." Journal of Statistical Computation and Simulation 62.3 (1999): 271-303.

There, the authors make empirical comparisons of four permutation strategies, in addition to the normal-theory $t$-test:

  1. Permutation of Raw Data (Manly, 1991, 1997)
  2. Permutation of Residuals under Reduced Model (Freedman & Lane, 1983)
  3. Permutation of Residuals under Reduced Model (Kennedy, 1995)
  4. Permutation of Residuals under Full Model (ter Braak, 1990, 1992)

Here I'll quote the description given for the strategy put forward by Manly; a minimal code sketch follows the list. Given the model $Y=\mu+\beta_{1\cdot 2}X+\beta_{2\cdot 1}Z+\epsilon$:

  1. The variable Y is regressed on X and Z together (using least squares) to obtain an estimate $b_{2\cdot 1}$ of $\beta_{2\cdot 1}$ and a value of the usual $t$-statistic, $t_\text{ref}$, for testing $\beta_{2\cdot 1}=0$ for the real data. We hereafter refer to this as the reference value of $t$.
  2. The Y values are permuted randomly to obtain permuted values Y*.
  3. The Y* values are regressed on X and Z (unpermuted) together to obtain an estimate $b_{2\cdot 1}^*$ of $\beta_{2\cdot 1}$ and a value of $t^*$ for the permuted data.
  4. Steps 2-3 are repeated a large number of times, yielding a distribution of values of $t^*$ under permutation.
  5. The absolute value of the reference value $t_\text{ref}$ is placed in the distribution of absolute values of $t^*$ obtained under permutation (for a two-tailed $t$-test). The probability is calculated as the proportion of values in this distribution greater than or equal, in absolute value, to the absolute value of $t_\text{ref}$ (Hope, 1968).
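A minimal sketch of this raw-data permutation scheme as I read it, in Python/numpy; the toy data and the number of permutations are arbitrary choices of mine, not from the paper:

```python
import numpy as np

# Toy data (arbitrary, not from the paper): Y ~ 1 + X + Z
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=n)
Z = rng.normal(size=n)
Y = 2.0 + 1.0 * X + 0.2 * Z + rng.normal(size=n)

def t_for_Z(Y, X, Z):
    """Usual OLS t-statistic for the coefficient of Z in Y ~ 1 + X + Z."""
    D = np.column_stack([np.ones(len(Y)), X, Z])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ coef
    sigma2 = resid @ resid / (len(Y) - D.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[2, 2])
    return coef[2] / se

t_ref = t_for_Z(Y, X, Z)  # step 1: reference value for the real data

# steps 2-4: permute Y only; X and Z stay untouched
t_star = np.array([t_for_Z(rng.permutation(Y), X, Z) for _ in range(999)])

# step 5: place |t_ref| in the permutation distribution of |t*| (two-tailed)
p_value = (np.sum(np.abs(t_star) >= np.abs(t_ref)) + 1) / (len(t_star) + 1)
```

Note that in this scheme only Y is shuffled: every column of the design matrix, including any interaction column, is left exactly as it is.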

So this strategy preserves the covariance between the independent variables X and Z. The other methods focus on testing partial coefficients in isolation, and these are discussed in the text. Possible drawbacks of permuting the raw data are also given, both in the text and in the literature.
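For comparison, here is my reading of the Freedman & Lane scheme (permutation of residuals under the reduced model) for the same test of $\beta_{2\cdot 1}$; it continues the sketch above and reuses its `Y`, `X`, `Z`, `rng`, `t_for_Z` and `t_ref`:

```python
# Continues the previous sketch (same Y, X, Z, rng, t_for_Z, t_ref).
# Reduced model: regress Y on the intercept and X only, leaving Z out.
D_red = np.column_stack([np.ones(len(Y)), X])
coef_red, *_ = np.linalg.lstsq(D_red, Y, rcond=None)
fitted_red = D_red @ coef_red
resid_red = Y - fitted_red

# Permute the reduced-model residuals, rebuild pseudo-responses Y*,
# then refit the full model and recompute t* each time.
t_star_fl = np.array([t_for_Z(fitted_red + rng.permutation(resid_red), X, Z)
                      for _ in range(999)])

p_value_fl = (np.sum(np.abs(t_star_fl) >= np.abs(t_ref)) + 1) / (len(t_star_fl) + 1)
```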
