How to decide the best form of BMI to use in Cox regression, categorical or continuous?

Question

BMI is usually analysed as a categorical variable in medical research. In my Cox regression model, I kept BMI in its original form as a continuous variable. A reviewer asked how I can be sure that continuous BMI fits better than categorical BMI, so I need to test the linearity of BMI. I divided BMI into four categories, but the HRs for the categories were no longer significant, whereas continuous BMI was statistically significant. Am I heading in the wrong direction? Are there better methods to test the linearity of BMI?

If you divide a predictor into subsets, it is necessarily less variable. If a predictor is almost constant, to that extent it can't explain variability in the response. — Nick Cox, Jun 30 '21 at 14:03

score 10 · Accepted Answer · answered Jun 30 '21 at 13:29

BMI might be associated continuously with outcome but not necessarily linearly. The best way to test that is to fit BMI as a continuous predictor flexibly, for example with restricted cubic splines as in the rms package in R. If you use the tools in that package, then you can use its anova() function to test the significance of the continuous fit overall and of the non-linear terms in particular.
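To make "restricted cubic spline" concrete, here is a minimal sketch of the basis such a fit is built on, written in plain Python rather than R. It is not the rms code: the function name `rcs_basis`, the knot locations and the scaling are illustrative assumptions. The first column is BMI itself (the linear part) and the remaining columns carry the non-linearity, which is what a test of the non-linear terms examines.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated power form) for one value x.

    Returns [x, s_1, ..., s_(k-2)] for k knots. The s_j terms are zero below
    the first knot and linear in x above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    gap = t[-1] - t[-2]
    # Assumption: divide by the squared knot range so the non-linear
    # terms stay on roughly the same scale as x.
    scale = (t[-1] - t[0]) ** 2

    terms = [float(x)]
    for j in range(k - 2):
        s = (pos3(x - t[j])
             - pos3(x - t[-2]) * (t[-1] - t[j]) / gap
             + pos3(x - t[-1]) * (t[-2] - t[j]) / gap)
        terms.append(s / scale)
    return terms


# Hypothetical knot locations on the BMI scale, chosen only for illustration.
example_knots = [20, 25, 30, 35]
for bmi in (18, 27, 40):
    print(bmi, rcs_basis(bmi, example_knots))
```

With 4 knots the basis has 3 columns, so the spline uses the same number of coefficients as a 4-category BMI variable but without discarding within-category information.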

There is almost never anything to be gained by categorizing a continuous variable. If someone insists that you do it anyway, compare the Akaike Information Criterion (AIC) values of the models fit with BMI as a continuous variable and with BMI categorized. I suspect that the fit will be better with a flexibly fit continuous variable.
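As a rough sketch of that comparison: AIC is 2k - 2 log L, where k is the number of estimated coefficients and, for a Cox model, log L is the maximized partial log-likelihood. The helper names and the numbers below are made up for illustration; in practice the log-likelihoods would come from your own fitted models.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * loglik


def rank_by_aic(fits):
    """Rank fits given as {name: (log-likelihood, number of coefficients)}.

    Returns (name, AIC, difference from the best AIC) tuples, best first.
    """
    scored = sorted((aic(ll, k), name) for name, (ll, k) in fits.items())
    best = scored[0][0]
    return [(name, value, value - best) for value, name in scored]


# Made-up numbers for illustration only; they come from no real data set.
hypothetical_fits = {
    "BMI as 4-knot spline": (-1520.4, 3),
    "BMI in 4 categories": (-1523.1, 3),
}

for name, value, delta in rank_by_aic(hypothetical_fits):
    print(f"{name}: AIC = {value:.1f}, delta = {delta:.1f}")
```

Because the spline and categorical models are not nested, an information criterion is a more natural comparison here than a likelihood-ratio test.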

One question to consider is whether BMI, itself a derived variable, is useful. It's quite possible that fitting both its components, height and weight, would work better.
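One way to see why the components can do at least as well, assuming for illustration that the terms enter on the log scale (the symbols $W$ for weight and $H$ for height are my notation, not part of the original answer):

$$
\log(\mathrm{BMI}) = \log\!\left(\frac{W}{H^{2}}\right) = \log W - 2\log H
$$

$$
h(t) = h_0(t)\exp\{\beta_W \log W + \beta_H \log H\}
$$

A model in $\log(\mathrm{BMI})$ alone is the special case $\beta_H = -2\beta_W$ of the second expression. More generally, an index of the form $W/H^{p}$ corresponds to $\beta_H = -p\,\beta_W$, so fitting both components lets the data choose the exponent instead of fixing it at 2.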

EdM

I'd say there is even an ethical dimension to this question, to use the information available and not degrade it. We don't expect jumps in behaviour at specific levels of BMI. – Nick Cox Jun 30 '21 at 13:58
On BMI as a derived variable, you would also need to include the interaction of height and weight. – John Jun 30 '21 at 14:06
I couldn't agree with you more! In fact, my paper, which is currently under review, aims to provide more evidence that weight is a better predictor than BMI for breast cancer risk. In that paper, we also pointed out that weight or BMI should be fitted as a continuous variable rather than a categorical one (we didn't prove that point in the paper; we just fit BMI as a continuous variable and it was statistically significant, so I think it is reasonable that the reviewer asked us to prove that point). – Zhoufeng Jun 30 '21 at 20:08
For the model fit comparison, would that be enough to prove linearity or continuity? I fitted BMI by category, but the coefficients are not statistically significant anymore. Would this be evidence against a linear or continuous relationship between BMI and breast cancer risk? – Zhoufeng Jun 30 '21 at 20:33
+1 I like the idea of using something like a spline when the variable is treated as continuous. A common complaint about suggestions not to categorize is that the response might increase, then decrease, then increase again as the predictor increases, and that the categorized variable will handle that. The spline handles that issue without the information loss that comes with categorizing. – Dave Jun 30 '21 at 20:39
BMI is a horrible model anyway. There is no physical reason to divide mass by the square of the height. From a simple math-modeling point of view it should be mass/height^3. I read a paper some years ago that argued the exponent should be 2.8 or something. In any case, BMI is biased against people whose heights are not near 172 cm. – Ron Jensen Jun 30 '21 at 22:41
I've fitted two models, one with categorical and one with continuous BMI, but the difference in AIC or BIC between them is less than 2. So there is not enough evidence that the model with continuous BMI fits better than the model with categorical BMI, which is exactly what I didn't want to see. Is there any other method that directly tests the hypothesis that continuous BMI fits better than categorical BMI? – Zhoufeng Jul 01 '21 at 05:27
@Zhoufeng it's hard to say without seeing the actual results. Comparing measures of model performance (validation and calibration e.g. with the `validate` and `calibrate` functions in the R `rms` package) might shed some light. If BMI is a relatively minor contributor to risk and the data set isn't large, you just might not be able to demonstrate a difference in the modeling approaches. Then you can't say that you "proved" that the continuous modeling is better, but you can say things like "categorizing BMI did not improve model performance." The burden of proof should be on the categorizers. – EdM Jul 01 '21 at 12:01
@EdM Thanks so much for your reply! – Zhoufeng Jul 02 '21 at 01:41
