I am working with the Cox proportional hazards model, where the covariates include two categorical variables. Assume each variable has 3 levels, so I encode them with dummy variables.
Levels 1, 2, and 3 of category A correspond to dummy variables $A_1$, $A_2$, and $A_3$, respectively.
Similarly, category B has dummy variables $B_1$, $B_2$, and $B_3$.
To avoid linearly dependent covariates, I drop $A_1$ and $B_1$ as reference levels, so the model is specified as $\text{Intercept} + A_2 + A_3 + B_2 + B_3$.
However, the intercept is absorbed into the baseline hazard, so the final Cox regression model is just $A_2 + A_3 + B_2 + B_3$.
Now, if I want to predict the survival curve of a subject with $A_1$ and $B_1$, would this correspond to the baseline survival curve?
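To make my reasoning explicit: under the Cox model, $S(t \mid x) = S_0(t)^{\exp(x'\beta)}$, and a subject in the reference cell ($A_1$, $B_1$) has all four dummies equal to zero, so the linear predictor is zero and the predicted curve should coincide with the baseline survival. A minimal sketch of this (with made-up coefficients and baseline survival values, purely for illustration):

```python
import math

# Hypothetical coefficients for the four dummy covariates.
beta = {"A2": 0.4, "A3": -0.2, "B2": 0.1, "B3": 0.7}

# Hypothetical baseline survival values S0(t) at a few time points.
S0 = {1: 0.95, 2: 0.88, 3: 0.80}

def predicted_survival(covariates, t):
    """Cox PH prediction: S(t | x) = S0(t) ** exp(x' beta)."""
    lp = sum(beta[k] * v for k, v in covariates.items())  # linear predictor
    return S0[t] ** math.exp(lp)

# A subject in the reference cell (A1, B1): every dummy is zero,
# so exp(0) = 1 and the prediction equals the baseline survival.
ref = {"A2": 0, "A3": 0, "B2": 0, "B3": 0}
for t in S0:
    assert predicted_survival(ref, t) == S0[t]
```

So, as I understand it, the answer should be "yes" in theory, which is why the behavior below puzzles me.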
Working in R, everything looks fine and most predicted survival curves agree well with the Kaplan-Meier estimates, except for the "$A_1$, $B_1$" cohorts (and apparently also "$A_2$"). And the fit is not merely poor, it is clearly suboptimal: the curve is shifted upwards or downwards, so increasing or decreasing the corresponding coefficients would clearly yield a better fit.