I have the following model:
Reduction_in_clinical_score ~ Baseline_clinical_score + Site_of_data_collection + Treatment_Type + Age + Sex + ERP
Site of data collection is made up of four levels, treatment type has two levels, and sex has two levels. All other variables are continuous.
I have 88 observations in total.
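As a rough check on whether sample size alone could explain the warning, the number of coefficients can be worked out by hand. A minimal sketch, assuming the factors are reference (dummy) coded, which I believe is fitlm's default: an intercept, three continuous slopes and 3 + 1 + 1 indicator columns give 9 coefficients for 88 observations. `n_coefficients` is a hypothetical helper written only for this illustration.

```python
def n_coefficients(n_continuous, factor_levels, intercept=True):
    """Hypothetical helper: number of columns in a reference-coded design matrix."""
    # An intercept adds one column, each continuous predictor adds one,
    # and each k-level factor adds k - 1 indicator columns.
    return int(intercept) + n_continuous + sum(k - 1 for k in factor_levels)

# Baseline_clinical_score, Age and ERP are continuous;
# Site_of_data_collection (4 levels), Treatment_Type (2) and Sex (2) are factors.
p = n_coefficients(n_continuous=3, factor_levels=[4, 2, 2])
n_obs = 88
print(p, n_obs - p)  # prints: 9 79
```

On this count the model leaves 79 residual degrees of freedom, so the raw ratio of observations to coefficients does not look extreme.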
In MATLAB (using fitlm), I get the following warning:
Warning: Regression design matrix is rank deficient to within machine precision.
From what I have read online, this may be caused by having too few observations relative to the number of predictors in the model.
My question, then, is: what should the next step be in this case?
Would it be to remove a predictor (ideally based on theory/literature)?
I ran the same linear regression in SPSS, and it gave no warning (the output all looks reasonable).
I should note that I checked the rank of my predictor variables, and it is full (i.e. 6). I also checked the VIF values in SPSS, and the highest is ~4.6. However, SPSS also shows Site and Treatment_Group as highly correlated (r = -0.861, p < 0.001). Could this be an issue of multicollinearity between two categorical variables? When I remove either one, the warning goes away.
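For comparison with the rank of 6 above: assuming Site, Treatment and Sex are treated as categorical, the matrix whose rank matters for the warning is the expanded design matrix (intercept plus indicator columns), which here has 9 columns rather than 6. A minimal sketch of that check in Python/NumPy, using randomly simulated stand-in data (not the real data) and a hypothetical `dummies` helper for reference coding:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 88
# Simulated stand-ins, only to illustrate column count and rank.
site = rng.choice(["A", "B", "C", "D"], size=n)
treatment = rng.choice(["T1", "T2"], size=n)
sex = rng.choice(["F", "M"], size=n)
continuous = rng.normal(size=(n, 3))  # stand-ins for Baseline, Age, ERP

def dummies(x):
    """Hypothetical helper: one 0/1 column per non-reference (non-first) level."""
    levels = np.unique(x)
    return np.column_stack([(x == lev).astype(float) for lev in levels[1:]])

# Intercept + 3 continuous + 3 site + 1 treatment + 1 sex indicator columns.
X = np.column_stack([np.ones(n), continuous, dummies(site), dummies(treatment), dummies(sex)])
print(X.shape[1], np.linalg.matrix_rank(X))
```

Replacing the simulated arrays with the real columns and comparing `X.shape[1]` with `np.linalg.matrix_rank(X)` would show directly whether the design is rank deficient: a rank below the column count means at least one column is an exact linear combination of the others.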
I should also note that there may be a design issue, which I suspect is the source of the problem: data for Treatment 1 were collected at sites A, B, C and D, whereas data for Treatment 2 were collected only at site A.
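One way to test this design hypothesis in isolation is to build only the intercept, Site and Treatment columns for a given set of (site, treatment) cells and compare the rank with the column count. A minimal sketch in Python/NumPy; `design_rank` is a hypothetical helper, the cell lists are schematic rather than the real data, and each cell is repeated 10 times so that the number of rows never caps the rank:

```python
import numpy as np

def design_rank(cells):
    """Hypothetical helper: (column count, rank) of intercept + Site dummies
    + Treatment dummy for a list of (site, treatment) observations."""
    sites = sorted({s for s, _ in cells})
    treatments = sorted({t for _, t in cells})
    rows = []
    for s, t in cells:
        row = [1.0]                                          # intercept
        row += [float(s == lev) for lev in sites[1:]]        # site indicators
        row += [float(t == lev) for lev in treatments[1:]]   # treatment indicator
        rows.append(row)
    X = np.array(rows)
    return X.shape[1], int(np.linalg.matrix_rank(X))

# Layout as described: Treatment 1 at sites A-D, Treatment 2 only at site A.
described = [("A", "T1"), ("B", "T1"), ("C", "T1"), ("D", "T1"), ("A", "T2")]
# Hypothetical contrast: Treatment 2 is the only treatment observed at site A.
confounded = [("A", "T2"), ("B", "T1"), ("C", "T1"), ("D", "T1")]

print(design_rank(described * 10))   # prints: (5, 5)
print(design_rank(confounded * 10))  # prints: (5, 4)
```

In this sketch the layout as described (both treatments observed at site A) gives full column rank, while the hypothetical contrast, in which Treatment is completely determined by Site, is rank deficient because the Treatment 2 indicator equals the intercept column minus the B, C and D indicators. Applying the same check to the actual Site × Treatment combinations in the data would show which situation applies.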