Does the order in which the variables are written matter in a regression model when adjusting for covariates?

Question

library(survival)
fit <- coxph(Surv(t1, t2, status) ~ hemoglobin + sex + age + cluster(id), data = data)
fit

In my analysis, I want to adjust hemoglobin for gender and age. So, in the model I wrote, should my first variable be hemoglobin?

My second question:

Is it important to include cluster(id) in a time-dependent analysis? Is it mandatory?

It would take you less time to find the answer yourself (change the order of variables and rerun the code) than it took to post the question! — whuber, Jan 17 '21 at 16:58

score 0 · Answer 1 · answered Jan 18 '21 at 20:16

As the comment from @whuber notes, you can readily test whether the order of variable entry matters in terms of the model itself. (It won't: the coefficient estimates and standard errors are the same whichever order you write the covariates in.) There might, however, be an issue if you try to use some forms of anova() analysis on the model, because sequential tests evaluate each term after the terms listed before it. I'd recommend using an anova() function that doesn't depend on the order of variable entry, like the one in the R rms package.
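A minimal sketch of that check, assuming the `lung` data set shipped with the survival package as a stand-in for your own data (the variable names here are not yours): fit the same Cox model with the covariates in two different orders, then compare the coefficients and the sequential anova() tables.

```r
library(survival)

# Assumption: the built-in `lung` data stands in for the asker's data set.
d <- na.omit(lung[, c("time", "status", "age", "sex")])

# Same model, covariates written in a different order
fit_a <- coxph(Surv(time, status) ~ age + sex, data = d)
fit_b <- coxph(Surv(time, status) ~ sex + age, data = d)

# The estimates agree once they are lined up by name
print(coef(fit_a))
print(coef(fit_b)[names(coef(fit_a))])

# survival's anova() is sequential: each term is tested after the
# terms listed before it, so these two tables need not match
print(anova(fit_a))
print(anova(fit_b))
```

If the two coefficient vectors match, the position of hemoglobin in your formula has no effect on its adjusted estimate.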

In terms of cluster() terms for time-dependent covariates, it depends on whether you have multiple events per individual. If there's only 1 event per individual, then it's not necessary. As the R time-dependent survival vignette puts it:

One common question with this data setup is whether we need to worry about correlated data, since a given subject has multiple observations. The answer is no, we do not. The reason is that this representation is simply a programming trick. The likelihood equations at any time point use only one copy of any subject, the program picks out the correct row of data at each time.

The practically important exception (if data are coded correctly) is:

When subjects have multiple events, then the rows for the events are correlated within subject and a cluster variance is needed.
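To see what the cluster term changes in the multiple-event case, here is a rough sketch that assumes the `cgd` data set from the survival package (recurrent infections, in start/stop format) as a stand-in for your data. It fits the same model with and without cluster(id) and compares the results.

```r
library(survival)

# Assumption: `cgd` (several infection events per subject, counting-process
# format) stands in for the asker's data; `treat` is just an example covariate.
fit_naive  <- coxph(Surv(tstart, tstop, status) ~ treat, data = cgd)
fit_robust <- coxph(Surv(tstart, tstop, status) ~ treat + cluster(id), data = cgd)

# The point estimates are the same in both fits
print(coef(fit_naive))
print(coef(fit_robust))

# Only the variance changes: with cluster(id), `var` holds the robust
# (sandwich) estimate and `naive.var` keeps the model-based one
print(sqrt(diag(fit_naive$var)))
print(sqrt(diag(fit_robust$var)))
```

The cluster term leaves the hazard ratio alone and only adjusts the standard error for within-subject correlation, which is why it matters when subjects can have more than one event.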
