My model looks like this: ROI_size ~ diagnosis + medication_dose + sex + age. Specifically, I want to estimate the effect of disease (diagnosis coded 1 or 0) on brain region size, adjusted for current medication dose (in mg) and controlling for age and sex. Age and sex are available for the entire dataset. Medication dose, however, is defined only for the patients (diagnosis = 1).
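To make the missing-dose problem concrete, here is a minimal NumPy sketch on simulated data (all values are hypothetical and chosen only for illustration). If the full model is fitted by simply dropping rows with a missing dose, only the patients remain, so diagnosis becomes a constant column that is indistinguishable from the intercept and its effect cannot be estimated.

```python
import numpy as np

# Hypothetical simulated data: 40 controls followed by 40 patients.
rng = np.random.default_rng(0)
n_ctrl, n_pat = 40, 40
diagnosis = np.r_[np.zeros(n_ctrl), np.ones(n_pat)]
sex = rng.integers(0, 2, n_ctrl + n_pat).astype(float)
age = rng.uniform(20, 60, n_ctrl + n_pat)
# Dose is undefined (NaN) for controls, as in the question.
dose = np.r_[np.full(n_ctrl, np.nan), rng.uniform(5, 50, n_pat)]

# A complete-case fit of ROI_size ~ diagnosis + medication_dose + sex + age
# keeps only the rows where every predictor is observed.
complete = ~np.isnan(dose)
X = np.column_stack([np.ones(complete.sum()), diagnosis[complete],
                     dose[complete], sex[complete], age[complete]])

print("rows kept:", complete.sum())  # only the patients survive
print("unique diagnosis values:", np.unique(X[:, 1]))
# The diagnosis column duplicates the intercept, so the design is rank-deficient.
print("design rank:", np.linalg.matrix_rank(X), "of", X.shape[1], "columns")
```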
I have thought of two solutions. The first would be to assign medication_dose = 0 to all healthy controls. However, this (1) creates collinearity with diagnosis, (2) produces zero-inflated data, and (3) is, I think, not correct per se, since the difference between 0 mg and 1 mg is not the same as the difference between 1 mg and 2 mg.
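A minimal NumPy sketch of the dose = 0 coding, again on hypothetical simulated data with effect sizes picked only for illustration. The correlation between diagnosis and the zero-filled dose shows the collinearity mentioned in point 1. Note also that under this coding the diagnosis coefficient is the modeled difference between controls and a patient at 0 mg, which may lie outside the range of doses actually observed in the patients.

```python
import numpy as np

# Hypothetical simulated data; the true effects below are illustrative only.
rng = np.random.default_rng(1)
n = 80
diagnosis = np.r_[np.zeros(n // 2), np.ones(n // 2)]
sex = rng.integers(0, 2, n).astype(float)
age = rng.uniform(20, 60, n)
dose = np.where(diagnosis == 1, rng.uniform(5, 50, n), 0.0)  # controls coded as 0 mg
roi = 1000 - 30 * diagnosis - 0.5 * dose + 5 * sex - age + rng.normal(0, 5, n)

# Collinearity check: correlation between diagnosis and the zero-filled dose.
r = np.corrcoef(diagnosis, dose)[0, 1]

# OLS fit of ROI_size ~ diagnosis + medication_dose + sex + age.
X = np.column_stack([np.ones(n), diagnosis, dose, sex, age])
beta, *_ = np.linalg.lstsq(X, roi, rcond=None)

print(f"corr(diagnosis, dose) = {r:.2f}")
# beta[1] is the modeled patient-vs-control difference at dose = 0 mg.
print("intercept, diagnosis, dose, sex, age:", np.round(beta, 2))
```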
The other solution would be to adjust ROI_size for medication_dose within the patient cohort, i.e., to fit the model ROI_size ~ medication_dose in the patients only, estimate its beta, and then subtract beta*medication_dose from ROI_size for every patient, so that each patient's value is projected to a dose of 0 mg.
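The two-step adjustment can be sketched in NumPy as follows, on hypothetical simulated data. The adjustment subtracts beta*dose, which moves each patient's ROI size to the value the fitted line predicts at 0 mg; controls are left unchanged. The final group comparison with sex and age as covariates is an assumption added for illustration, since that step is not specified above.

```python
import numpy as np

# Hypothetical simulated data; dose is undefined (NaN) for controls.
rng = np.random.default_rng(2)
n = 80
diagnosis = np.r_[np.zeros(n // 2), np.ones(n // 2)]
sex = rng.integers(0, 2, n).astype(float)
age = rng.uniform(20, 60, n)
dose = np.where(diagnosis == 1, rng.uniform(5, 50, n), np.nan)
roi = 1000 - 30 * diagnosis - 0.5 * np.nan_to_num(dose) + 5 * sex - age + rng.normal(0, 5, n)

pat = diagnosis == 1

# Step 1: within patients only, fit ROI_size ~ medication_dose.
X_pat = np.column_stack([np.ones(pat.sum()), dose[pat]])
(b0, beta_dose), *_ = np.linalg.lstsq(X_pat, roi[pat], rcond=None)

# Step 2: remove the estimated dose effect from patients (project to 0 mg).
roi_adj = roi.copy()
roi_adj[pat] = roi[pat] - beta_dose * dose[pat]

# Step 3 (assumption): compare groups on the adjusted ROI, controlling for sex and age.
X_all = np.column_stack([np.ones(n), diagnosis, sex, age])
coef, *_ = np.linalg.lstsq(X_all, roi_adj, rcond=None)
print(f"beta_dose = {beta_dose:.3f}, diagnosis effect = {coef[1]:.2f}")
```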
My question in this case is whether I should include age and sex in this within-patient model, since including them should yield a more accurate beta for medication_dose. If so, do I also need to correct for age and sex separately in the healthy controls, and then estimate the effect of disease simply with ROI_size ~ diagnosis?
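A sketch of the variant with age and sex in the within-patient model, on hypothetical simulated data, comparing the dose slope with and without those covariates. Only the dose term is removed from the patients' ROI sizes. Keeping sex and age as covariates in the final full-sample model, rather than correcting the controls separately, is one possible choice and an assumption of this sketch, not a settled answer.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical simulated data; dose is undefined (NaN) for controls.
rng = np.random.default_rng(3)
n = 80
diagnosis = np.r_[np.zeros(n // 2), np.ones(n // 2)]
sex = rng.integers(0, 2, n).astype(float)
age = rng.uniform(20, 60, n)
dose = np.where(diagnosis == 1, rng.uniform(5, 50, n), np.nan)
roi = 1000 - 30 * diagnosis - 0.5 * np.nan_to_num(dose) + 5 * sex - age + rng.normal(0, 5, n)
pat = diagnosis == 1
ones_p = np.ones(pat.sum())

# Dose slope from the within-patient model without and with sex and age.
beta_dose_simple = ols(np.column_stack([ones_p, dose[pat]]), roi[pat])[1]
beta_dose_adj = ols(np.column_stack([ones_p, dose[pat], sex[pat], age[pat]]), roi[pat])[1]

# Subtract only the dose term; age and sex effects remain in the data.
roi_adj = roi.copy()
roi_adj[pat] -= beta_dose_adj * dose[pat]

# Final model over all subjects keeps sex and age as covariates (one possible choice).
coef = ols(np.column_stack([np.ones(n), diagnosis, sex, age]), roi_adj)
print(f"dose slope: {beta_dose_simple:.3f} (dose only) vs {beta_dose_adj:.3f} (+ sex, age)")
print(f"diagnosis effect on dose-adjusted ROI: {coef[1]:.2f}")
```

Because the final model uses a single age slope and a single sex effect for both groups, it assumes those effects are the same in patients and controls.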