Covariate and Interaction Not Significant When Both in the Model

Question

I am looking at a survival model based on dosing variables, all of which are continuous. I have noticed when categorized, based on Kaplan Meier Curves, that low, medium, high doses behave differently, but the survival probabilities from least to greatest are high dose, low dose, medium dose. Thus, I created a variable DOSECAT with 3 levels (low, medium, high), and I expected an interaction with DOSECAT to be significant with DOSE. When DOSE, DOSECAT, and their interaction are all in a model together, DOSE and the interaction are both nonsignificant. However, if I remove one of them, the other is significant. What model do I use and/or how should this be interpreted?

proc phreg data=data;
    CLASS DOSECAT / PARAM=GLM;
    MODEL SURVIVALTIME*EVENT(1) = DOSE DOSECAT DOSE*DOSECAT;
    HAZARDRATIO DOSECAT / DIFF=distinct;
    HAZARDRATIO DOSE / AT (DOSECAT=ALL);
run;

This occurs all the time in many regression models. See https://stats.stackexchange.com/search?q=significant+regression+not. — whuber, Oct 08 '21 at 15:00
Are your doses on some continuous scale, or are there just 3 doses? It's not clear to me just what you were trying to accomplish with the `dose:dosecat` interaction term. There might be a better way to accomplish your goal, for example flexible modeling of `dose` levels to deal with the apparently non-monotonic association with outcome. Please edit your question to provide those details about your study, as comments are easy to overlook and can get deleted. — EdM, Oct 08 '21 at 15:06
RE: EdM, I edited the question, but here is the relevant change: "I am looking at a survival model based on dosing variables, all of which are continuous. I have noticed when categorized, based on Kaplan Meier Curves, that low, medium, high doses behave differently, but the survival probabilities from least to greatest are high dose, low dose, medium dose. " — jrheintz91, Oct 08 '21 at 15:10
RE: whuber, it's not the change in significance that gets me. It is specifically the fact that it is an interaction term that doesn't really make sense if the main effect isn't in the model. — jrheintz91, Oct 08 '21 at 15:11
We have plenty of posts about that, too: search the collection of keywords that appear in your comment. — whuber, Oct 08 '21 at 20:00

score 1 · Accepted Answer · answered Oct 08 '21 at 17:22

To address your scientific analysis question:

Although binning continuous data at an early stage of data exploration can make sense, binning generally shouldn't be used for statistical analysis. There are well-established ways to model a continuous predictor flexibly without binning in regression, including Cox regression. My favorite is restricted cubic splines as implemented by rcs() in the R rms package, and a vignette for the standard survival package shows how to do the same with that package's own tools, with an example of a non-monotonic association with survival like the one you seem to have. I'm pretty sure SAS also provides that functionality, although I don't use it myself.
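
To make the idea concrete, here is a minimal Python sketch of the truncated-power form of a restricted cubic spline basis. It is not the rms implementation (software may rescale these terms), and the function name `rcs_basis` and the knot locations are made up for illustration. The point is that `dose` enters the model as a few smooth columns instead of bins, and the fitted curve is forced to be linear beyond the outer knots.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)], an unscaled restricted cubic
    spline basis for a single value x and k >= 3 knots."""
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("need at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    last, prev = t[-1], t[-2]
    gap = last - prev
    basis = [float(x)]  # the linear term is always included
    for tj in t[:-2]:
        # The two correction terms cancel the cubic and quadratic parts
        # beyond the last knot, so each column is linear in the upper tail.
        term = (cube_plus(x - tj)
                - cube_plus(x - prev) * (last - tj) / gap
                + cube_plus(x - last) * (prev - tj) / gap)
        basis.append(float(term))
    return basis


# Hypothetical knots at doses 1, 2 and 4: one linear and one nonlinear column.
for dose in (0.5, 3, 5, 6, 7):
    print(dose, rcs_basis(dose, [1, 2, 4]))
```

With k knots this costs k - 1 coefficients for `dose`, and a non-monotonic shape such as the high/low/medium ordering seen in the Kaplan-Meier curves can be represented without choosing cutoffs.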

You can thus show continuous plots of association of dose with outcome. Depending on your audience you might want to display some categorical breakdown, but don't do that for analysis. As the categorical breakdown for Kaplan-Meier plots won't accommodate other predictors you might be modeling, I prefer to show example model predictions based on realistic combinations of predictors, with confidence limits.

To address your question about interactions:

As @whuber commented, loss of "statistical significance" when you add an interaction term is common. With small data sets, this could be something as simple as losing a degree of freedom to fitting the interaction term.

In your analysis of the full interaction model (leftmost table), there's probably an additional issue from the collinearity you have introduced. First, you introduced some collinearity by including both the dose and the dose-category even in the model without interaction (rightmost): dose-category is specified exactly by dose, so there's an inherent interdependence of those predictors. Then you compound that problem by multiplying the dose-determined dose-category by dose itself in the interaction term in the leftmost model.*
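
A small Python sketch of that interdependence, using made-up doses 1 to 20 and hypothetical cutoffs (not your data): the indicator for the high category is determined by `dose`, and the product `dose * indicator` is nearly a rescaled copy of the indicator, so these design columns are strongly correlated.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data: 20 evenly spaced doses, with "high" meaning dose > 15.
dose = [float(d) for d in range(1, 21)]
is_high = [1.0 if d > 15 else 0.0 for d in dose]         # category indicator
dose_x_high = [d * h for d, h in zip(dose, is_high)]     # interaction column

r_indicator = pearson(is_high, dose_x_high)   # indicator vs interaction
r_dose = pearson(dose, dose_x_high)           # dose vs interaction

print(round(r_indicator, 3), round(r_dose, 3))
```

The same check on the real design matrix (or the coefficient correlation matrix that the software reports) shows how much of this is present in the fitted model.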

I suspect that the process has led to substantial collinearity in the model predictors, leading to high negative correlations among the coefficient estimates. Examine the coefficient covariance matrix. In that case neither dose nor the interaction might appear significant, but a combined Wald test on dose that includes both terms and thus takes the coefficient covariances into account might show that dose is "significant" overall.
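
Here is a minimal Python sketch of why the combined test can disagree with the individual ones. All numbers are invented for illustration: two coefficients with the same standard error and a correlation of -0.95 between their estimates. It is simplified to two coefficients; with three dose categories the real joint test would involve the `dose` coefficient plus two interaction coefficients.

```python
import math


def wald_single(b, var):
    """Two-sided p-value for one coefficient from its z statistic."""
    z = b / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def wald_joint_2df(b, V):
    """Joint Wald statistic b' V^-1 b for two coefficients, with its
    chi-square p-value on 2 degrees of freedom (survival = exp(-W/2))."""
    b1, b2 = b
    (v11, v12), (v21, v22) = V
    det = v11 * v22 - v12 * v21
    W = (b1 * b1 * v22 - b1 * b2 * (v12 + v21) + b2 * b2 * v11) / det
    return W, math.exp(-W / 2)


# Hypothetical estimates: both 1.0, standard error 0.7, correlation -0.95.
b = (1.0, 1.0)
se, rho = 0.7, -0.95
V = ((se ** 2, rho * se ** 2),
     (rho * se ** 2, se ** 2))

p_first = wald_single(b[0], V[0][0])     # each coefficient alone
p_second = wald_single(b[1], V[1][1])
W, p_joint = wald_joint_2df(b, V)        # both together, using the covariance

print(p_first, p_second, W, p_joint)
```

Neither coefficient reaches p < 0.05 alone, while the joint test is overwhelmingly "significant": the strong negative correlation means the data pin down the combination of the two coefficients far better than either one separately.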

Finally, your middle model has an interaction term without a term for dose itself. That's seldom a good idea. I'd recommend reading the extensive discussion on that page carefully, to understand why and how that might show up in apparent "significance." My guess is that your omitting a term for dose in that model minimized the collinearity and thus the high negative correlations among the coefficient estimates that otherwise made dose appear to lose "significance."

*I'm still not clear just what you were trying to model with this particular interaction term, but the principles I describe are generally applicable.

Thank you, this is very helpful. I will look at the restricted cubic splines, and it's no problem going from SAS to R. — jrheintz91, Oct 08 '21 at 17:34
Here is my rationale for the interaction. The survival time seems to depend on if the dose is high, medium, or low, so I am trying to model that dependence. I also wasn't suggesting that the model including the interaction but not the fixed effect was a good model. I was simply showing there is evidence to suggest the interaction is significant. — jrheintz91, Oct 08 '21 at 17:36
@jrheintz91 the interaction would model a different slope of the outcome-`dose` relationship depending on whether the `dose` was in the low, medium, or high group. I guess that could be a start toward modeling the shape of the relationship, but at best it would depend a lot on where you drew those cutoff boundaries. Look at the coefficient covariance matrices; I'm pretty sure that the reason for the change in "significance" is within them. If so, maybe edit your question to include them. See if a Wald test combining `dose` and its interaction with `dose-category` is "significant." — EdM, Oct 08 '21 at 18:07
Really appreciate all of your help. All tests of significance of the entire model are highly significant, which makes sense especially since DOSECAT is significant. You are right that multicollinearity is a significant issue; I posted the covariance and correlation matrices. And the cutoffs for the categories were arbitrarily made at Q1 and Q3, but yes that change in slope is exactly what I am trying to model. I am now looking for ways to do so without the categorization. — jrheintz91, Oct 08 '21 at 18:23
@jrheintz91 continuous modeling of `dose` will be your best bet in general. Be careful that the `dose` levels aren't serving as a proxy for something else associated with survival if this isn't a randomized trial. For example, if someone came in sicker, might a higher `dose` have been prescribed? — EdM, Oct 09 '21 at 14:14
