How to include independent variables with zero variance in a Cox proportional hazard model?

Question

Suppose you want to investigate the effect of several independent variables on an event taking place using the Cox PH model. Some of the independent variables (inflation and covid_lockdown) change with each new time step but take the same value for every individual at a given time step. In this case, the event is defaulting on a loan.

id   time   inflation   covid_lockdown   salary   debt   event
01      1          2%               no      30k   100k       0
02      1          2%               no      70k    50k       0
03      1          2%               no    2000k     0k       0
01      2          8%              yes       0k   110k       0 
02      2          8%              yes      75k    45k       0
03      2          8%              yes    1500k     0k       0 
01      3          6%              yes      40k   100k       1
02      3          6%              yes      80k    43k       0
03      3          6%              yes    1200k   100k       0

Since inflation and covid_lockdown have zero variance across all individuals per time step, you cannot include them in the CPH model. However, we expect inflation to affect individuals differently. For example, an individual with a low salary is likely to suffer from high inflation, whereas someone with a high salary is unlikely to be affected. How can we include independent variables with zero variance in the model?

Specifically, I want to investigate the effects of inflation and covid_lockdown on defaulting on loans for different groups (e.g., low salary vs. high salary).

Is this not a duplicate of your previous question? https://stats.stackexchange.com/questions/560227/do-you-have-to-remove-perfectly-collinear-independent-variables-prior-to-cox-reg — AdamO, Jan 26 '22 at 18:58
@AdamO, No, that question asked if independent variables with no variance should be removed prior to cox regression. This question asks how we can include them because they contain some information. I also provide a better example here. — user572780, Jan 26 '22 at 19:01
In my answer to your other question with respect to such a predictor, $X_j$, I say "If $X_j$ is involved in interaction terms with other covariates, however, then estimates of those interaction coefficients could be possible." So including interactions between those "zero-variance" predictors and others should be OK. — EdM, Jan 26 '22 at 19:56
@EdM I thought about what you said, but there is a problem. For example, if I adjust `salary` for inflation, I get the same results. Is this because survival analysis is only interested in ratios of the independent variables between groups? — user572780, Jan 26 '22 at 20:09
An interaction term is a _product_. Provided that individuals differ in `salary` values at any given time, they also will differ in an `inflation:salary` interaction value. As shown in my [other answer](https://stats.stackexchange.com/a/560234/28500), for each predictor (including interaction values) the Cox model at an event time evaluates the _difference_ of the predictor value for the case having the event from the risk-adjusted mean of all those still at risk. That's **not a ratio**. Unless all those at risk have exactly the same value for the interaction too, there's no problem. — EdM, Jan 26 '22 at 21:41
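The difference described in the comment above can be checked by hand on the question's time-3 rows. The following is only a minimal sketch: the helper name `score_contribution` is hypothetical, salaries are taken in thousands, inflation is written as a fraction, and the coefficients are assumed to be zero so that the risk-weighted mean reduces to a plain mean.

```python
import math

# Risk set at time 3 from the question's table: (id, inflation, salary in k)
risk_set = [
    ("01", 0.06, 40.0),
    ("02", 0.06, 80.0),
    ("03", 0.06, 1200.0),
]
event_id = "01"  # individual 01 defaults at time 3


def score_contribution(values, event_index, linear_predictors):
    """Event case's value minus the risk-weighted mean over the risk set."""
    weights = [math.exp(lp) for lp in linear_predictors]
    weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return values[event_index] - weighted_mean


event_index = [row[0] for row in risk_set].index(event_id)
# Assumption: evaluate at beta = 0, so every linear predictor is 0
linear_predictors = [0.0] * len(risk_set)

inflation = [row[1] for row in risk_set]
interaction = [row[1] * row[2] for row in risk_set]  # inflation * salary

inflation_diff = score_contribution(inflation, event_index, linear_predictors)
interaction_diff = score_contribution(interaction, event_index, linear_predictors)
print(inflation_diff, interaction_diff)
```

Under these assumptions the difference for `inflation` is zero up to floating-point rounding, because every member of the risk set shares the same value, while the difference for the product term is clearly nonzero because salaries differ.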
@EdM Okay, I have to think about it some more. There is still another problem. This `inflation:salary` interaction term won't affect individuals differently. For example, low earners are likely to suffer from inflation, whereas high earners are not. Would it be correct to fit two Cox regression models, one for each class, and then compare their hazards? The same for `covid_lockdown`: one model before lockdown and one model for lockdown. — user572780, Jan 27 '22 at 09:43
An interaction (product) term like `inflation:salary` specifically allows for differences among individuals. For example, at 0 inflation that interaction would be 0 for all individuals, but there would still be differences in `salary` to evaluate. At 10% inflation both the `salary` and the interaction term would differ among individuals. As you pool data over time, a single model gives separate estimates of the influence of `salary` at 0 inflation and the _extra_ influence of `inflation` at any `salary` level. That's much better than separate models. — EdM, Jan 27 '22 at 16:01
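The single pooled model described in the comment above can be written out as a sketch. The notation is an assumption introduced here, not taken from the thread: $s_i(t)$ is the salary of individual $i$, $z(t)$ is inflation (the same for everyone at time $t$), $R(t)$ is the risk set, and $\gamma$, $\beta_s$, $\beta_{zs}$ are hypothetical coefficient names.

$$
h_i(t) = h_0(t)\,\exp\bigl(\gamma\, z(t) + \beta_s\, s_i(t) + \beta_{zs}\, z(t)\, s_i(t)\bigr)
$$

At an event time $t$ with case $i$ having the event, the partial-likelihood contribution is

$$
\frac{\exp\bigl(\gamma z(t) + \beta_s s_i(t) + \beta_{zs} z(t) s_i(t)\bigr)}{\sum_{j \in R(t)} \exp\bigl(\gamma z(t) + \beta_s s_j(t) + \beta_{zs} z(t) s_j(t)\bigr)}
= \frac{\exp\bigl(\beta_s s_i(t) + \beta_{zs} z(t) s_i(t)\bigr)}{\sum_{j \in R(t)} \exp\bigl(\beta_s s_j(t) + \beta_{zs} z(t) s_j(t)\bigr)}
$$

The common factor $\exp(\gamma z(t))$ cancels, so $\gamma$ cannot be estimated and is in effect absorbed into the baseline hazard, whereas $\beta_{zs}$ remains estimable as long as salaries differ within the risk set. In this sketch the log hazard ratio per unit of salary at inflation level $z$ is $\beta_s + \beta_{zs} z$, which is how the model lets inflation act differently on low and high earners.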
