
Consider investigating an interaction using a linear regression with the specification:

$$ Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M) + e $$

with the following estimates: $$ \begin{array}{lcc} & \text{Coefficient} & \text{Sig.} \\ \hline \text{Intercept} & -2.00 & p < .05 \text{ (significant)} \\ X & -0.001 & p > .05 \text{ (not significant)} \\ M & 0.001 & p > .05 \text{ (not significant)} \\ X \times M & 1.00 & p < .05 \text{ (significant)} \end{array} $$
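Differentiating the specification shows what β1 and β2 measure once the interaction is in the model: the slope of Y on X depends on M, so β1 is the slope of X only at M = 0, and likewise β2 is the slope of M only at X = 0. With the estimates above:

$$ \frac{\partial Y}{\partial X} = \beta_1 + \beta_3 M = -0.001 + 1.00\,M, \qquad \frac{\partial Y}{\partial M} = \beta_2 + \beta_3 X = 0.001 + 1.00\,X $$

So the non-significant tests on X and M concern these conditional slopes at zero, which is a natural reference point only if 0 is a meaningful value of the other variable.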

Now suppose you want to graph the two-way interaction, perhaps using the widely used interaction spreadsheet from Jeremy Dawson's website (sample spreadsheet).
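A plotting tool of this kind essentially evaluates the fitted equation at a few chosen values of X and M. The sketch below does that by hand, assuming the common convention of plotting at one standard deviation below and above each mean; the means and SDs are placeholders, not values from the question, and `interaction_points` is a hypothetical helper, not part of any spreadsheet or library.

```python
# Coefficients from the regression table above.
B0, B1, B2, B3 = -2.00, -0.001, 0.001, 1.00


def predicted_y(x, m, b0=B0, b1=B1, b2=B2, b3=B3):
    """Fitted value of Y = b0 + b1*X + b2*M + b3*X*M."""
    return b0 + b1 * x + b2 * m + b3 * x * m


def interaction_points(mean_x, sd_x, mean_m, sd_m, **coefs):
    """Predicted Y with X and M set one SD below and above their means
    (an assumed plotting convention)."""
    points = {}
    for m_label, m in (("low M", mean_m - sd_m), ("high M", mean_m + sd_m)):
        for x_label, x in (("low X", mean_x - sd_x), ("high X", mean_x + sd_x)):
            points[(m_label, x_label)] = predicted_y(x, m, **coefs)
    return points


# Placeholder means and SDs; substitute the values from your own data.
pts = interaction_points(mean_x=0.0, sd_x=1.0, mean_m=0.0, sd_m=1.0)
for (m_label, x_label), y in pts.items():
    print(f"{m_label}, {x_label}: {y:.3f}")
```

Each pair of points that shares an M level forms one line of the plot. Option (b) below corresponds to calling `interaction_points(..., b1=0.0, b2=0.0)`.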

Concerning the effects of X and M, is it more "correct" to:

a. enter the values returned by the regression, even though the regression does not support the existence of either effect

or

b. enter 0.00 for those coefficients, because the regression supports neither a relationship between X and Y nor one between M and Y once their interaction is accounted for?

Either choice would leave the plot essentially unchanged in this case, but I think it could make a visible difference if the coefficients for X and M were larger. That said, I consider this more of a "philosophy of science" question than a question about the visualization itself (if there were a "philosophy-of-science" tag, I would have used it here). If a statistical test finds no statistically significant relationship between two variables, should we ever use numbers that imply such a relationship exists? On the other hand, it also seems strange to throw out those coefficients merely because the test cannot establish that the relationship exists.
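The claim that the choice matters only for larger coefficients can be checked directly: at any plotted point, options (a) and (b) differ by exactly |β1·X + β2·M|. A minimal sketch, assuming both variables are plotted at -1 and +1 (placeholder values); the "larger" coefficients of ±0.5 are hypothetical and not from the question.

```python
import itertools


def predicted_y(x, m, b0, b1, b2, b3):
    """Fitted value of Y = b0 + b1*X + b2*M + b3*X*M."""
    return b0 + b1 * x + b2 * m + b3 * x * m


def max_plot_gap(b0, b1, b2, b3, x_vals, m_vals):
    """Largest difference between option (a), which keeps b1 and b2, and
    option (b), which sets them to 0, over the plotted X and M values.
    At each point the gap equals |b1*x + b2*m|."""
    return max(
        abs(predicted_y(x, m, b0, b1, b2, b3) - predicted_y(x, m, b0, 0.0, 0.0, b3))
        for x, m in itertools.product(x_vals, m_vals)
    )


# Plot both variables at -1 and +1 (placeholder values).
grid = (-1.0, 1.0)
gap_table = max_plot_gap(-2.00, -0.001, 0.001, 1.00, grid, grid)  # estimates from the table
gap_larger = max_plot_gap(-2.00, -0.5, 0.5, 1.00, grid, grid)     # hypothetical larger main effects
print(gap_table, gap_larger)
```

Because the gap is |β1·X + β2·M|, it also tends to grow when X and M are plotted at values far from zero (for example on an uncentered scale), so even small main-effect coefficients can matter there.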

If anyone can point me to good papers on this topic, I would be extremely interested. Thanks in advance!

  • The regression is not saying the effect is zero; it is saying that it cannot pin down the effect. A high p-value is not evidence in favor of the null hypothesis. Nonetheless, the calculated coefficient is the one that appears in the best-fit equation for the observed points. – Dave May 25 '21 at 18:10
  • I believe the discussion at https://stats.stackexchange.com/questions/11009 answers your questions. The point is that in most cases the terms `X`, `M`, and `X*M` form a *group* in which the inclusion of `X*M` requires the inclusion of both `X` and `M`. You haven't given us any indication your situation might be an exception. – whuber May 25 '21 at 18:35
  • @Dave, thank you for your reply. Good point. A high p-value does not support there being no relationship; it simply fails to support there being one. Also, since I never hypothesized any relationship to begin with, it's questionable whether that particular p-value is a proper test of anything. I realize my example is far from perfect. – Harrison B. Pugh May 25 '21 at 19:17
  • @whuber, thank you too. However, that discussion seems to be about including the individual regressors when modeling their interaction, so it's not really on target. What I'm asking is: if we can't support there being a relationship between two variables (high p-value), should we use or discard the information about that unsupported (and possibly unhypothesized) relationship? – Harrison B. Pugh May 25 '21 at 19:18
  • For the question about what to do about p-values, please consider consulting some of our [higher-voted threads on the topic](https://stats.stackexchange.com/search?tab=votes&q=p-value%20interpret). – whuber May 25 '21 at 19:45
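One reason for treating X, M, and X*M as a group, sketched here as plain algebra rather than as a summary of the linked thread: the main-effect coefficients depend on where zero sits on the X and M scales. Rewriting the same fitted surface in terms of X' = X - c_x and M' = M - c_m leaves every prediction unchanged but changes β1 and β2, so setting the non-significant main effects to 0 (option b) gives different plots depending on an arbitrary choice of origin. The shift c_x = 5 and the evaluation point are hypothetical values chosen for illustration, and `recenter` is a hypothetical helper.

```python
def predicted_y(x, m, b0, b1, b2, b3):
    """Fitted value of Y = b0 + b1*X + b2*M + b3*X*M."""
    return b0 + b1 * x + b2 * m + b3 * x * m


def recenter(b0, b1, b2, b3, c_x=0.0, c_m=0.0):
    """Coefficients of the same fitted surface written in terms of
    X' = X - c_x and M' = M - c_m."""
    return (
        b0 + b1 * c_x + b2 * c_m + b3 * c_x * c_m,
        b1 + b3 * c_m,
        b2 + b3 * c_x,
        b3,
    )


original = (-2.00, -0.001, 0.001, 1.00)
# Hypothetical shift: measure X relative to 5 (e.g. if 5 were its sample mean).
shifted = recenter(*original, c_x=5.0)
print(shifted)  # the main effect of M is now about 5.001

# Full model: both codings give the same fitted value at X = 7, M = 2.
x, m = 7.0, 2.0
full_orig = predicted_y(x, m, *original)
full_shift = predicted_y(x - 5.0, m, *shifted)

# Option (b): zero the main effects in each coding; the fitted values now disagree.
b0, _, _, b3 = original
b0s, _, _, b3s = shifted
zero_orig = predicted_y(x, m, b0, 0.0, 0.0, b3)
zero_shift = predicted_y(x - 5.0, m, b0s, 0.0, 0.0, b3s)
print(full_orig, full_shift, zero_orig, zero_shift)
```

Only the interaction coefficient β3 is unaffected by the shift, which is why the full group is usually kept and interpreted together.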
