If you used the predict function on an object returned by model.avg, I don't see a reason for the confidence intervals to be technically untrustworthy. (There can be substantial issues arising from how the models were chosen for averaging in the first place, but I assume that you are aware of those.)
Provided that the proportional hazards assumption holds, and that there are no substantial interactions among your predictors with respect to survival, it shouldn't matter that your categorical variables have multiple levels. If those assumptions are met, the residual error terms used to calculate confidence intervals should be the same regardless of which reference levels you choose, even though the relative hazards change with the choice of reference. Whether a particular prediction case is "significantly" different from the reference case, based on those confidence intervals, does of course depend on the reference used.
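To make the point about reference levels concrete, here is a small worked sketch. It assumes a single categorical predictor with three hypothetical levels A, B and C and no other terms in the model; the symbols $\beta$ and $\gamma$ are my own notation, not anything returned by your software. It shows that changing the reference level only reparametrizes the same fit, so any given pairwise contrast keeps its estimate and standard error, while the comparison labelled "versus reference" becomes a different contrast.

```latex
% Reference level A: log relative hazard for a case with level x
\eta_A(x) = \beta_B \,\mathbf{1}[x = B] + \beta_C \,\mathbf{1}[x = C]

% Reference level B: same model, different coefficients
\eta_B(x) = \gamma_A \,\mathbf{1}[x = A] + \gamma_C \,\mathbf{1}[x = C],
\qquad \gamma_A = -\beta_B, \quad \gamma_C = \beta_C - \beta_B

% The C-versus-B contrast is identical under both parametrizations
\hat\gamma_C = \hat\beta_C - \hat\beta_B,
\qquad
\operatorname{Var}(\hat\gamma_C)
  = \operatorname{Var}(\hat\beta_C) + \operatorname{Var}(\hat\beta_B)
    - 2\,\operatorname{Cov}(\hat\beta_B, \hat\beta_C)

% But "C versus the reference" means different things:
% with reference A it is \hat\beta_C, with variance \operatorname{Var}(\hat\beta_C);
% with reference B it is \hat\gamma_C, with the variance given above.
```

So the interval for a fixed comparison (say C versus B) does not depend on which level was coded as the reference, but the interval you read off for "level C" in the coefficient table does, because it answers a different question in each coding.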
You might want to consult the very useful page "Prediction in Cox regression", listed as Related on this page, for other approaches to the model selection/prediction problem, issues of relative and absolute hazards, etc.