I have a longitudinal dataset of physicians with time-independent covariates (age group, physician type, etc.) and time-dependent covariates (number of patients, hours worked, etc.). Each physician has several entries, one per monthly interval, and the time-dependent covariates change from month to month. The data is in counting process format.
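To make the data layout concrete, here is a minimal sketch of how one physician's monthly records map to counting process rows (start, stop, event). The field names (`month`, `n_patients`, `hours`) and the helper `to_counting_process` are hypothetical, chosen only for illustration; they are not my real columns.

```python
from typing import Dict, List, Optional


def to_counting_process(monthly: List[Dict], leave_month: Optional[int]) -> List[Dict]:
    """Turn one physician's monthly records into (start, stop, event) rows.

    monthly: dicts with a 'month' index (0, 1, 2, ...) plus covariates (hypothetical names).
    leave_month: index of the month in which the physician left, or None if censored.
    """
    rows = []
    for rec in sorted(monthly, key=lambda r: r["month"]):
        start = rec["month"]
        if leave_month is not None and start > leave_month:
            break  # no rows after the physician has left
        row = {k: v for k, v in rec.items() if k != "month"}
        row["start"] = start
        row["stop"] = start + 1
        # The event flag is 1 only on the interval in which the physician left.
        row["event"] = int(leave_month is not None and start == leave_month)
        rows.append(row)
    return rows


# Illustrative data only: three months for one physician who leaves in month 2.
monthly = [
    {"month": 0, "n_patients": 120, "hours": 160},
    {"month": 1, "n_patients": 110, "hours": 150},
    {"month": 2, "n_patients": 80, "hours": 100},
]
left_rows = to_counting_process(monthly, leave_month=2)
censored_rows = to_counting_process(monthly, leave_month=None)
```

A physician who is still present at the end of the study has `event` equal to 0 on every row.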
I want to predict the risk/survival of a physician leaving. I am unsure how to do this because:
A Cox proportional hazards model (coxph) can handle the time-dependent covariates and models the effect of the variables on the hazard, but it can't predict survival beyond the observation window.
Classical machine learning can classify or regress a physician leaving, but it may not model the time aspect very well: it treats each row as an independent sample, with no relation to the other rows for the same physician.
I am wondering whether the following approaches are correct:
Would it be possible to extract the hazard from the coxph model (or predict the partial hazard) and keep this hazard constant until the survival probability reaches 0?
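As a rough sketch of what I mean by holding the hazard constant: if the survival probability at the end of the observed window is S_end and the monthly hazard is assumed constant at h afterwards, then k months later the survival would be S_end * exp(-h * k). The values of `s_end` and `hazard` below are made up, and the function names are hypothetical. Note that under a constant hazard the curve decays exponentially and only approaches 0, so in practice I would probably have to threshold at some level such as 0.5.

```python
import math
from typing import List


def extrapolate_survival(s_end: float, hazard: float, months_ahead: int) -> List[float]:
    """Extend a survival curve past the observed window with a constant monthly hazard.

    s_end: survival probability at the end of the observed window (assumed known).
    hazard: assumed constant hazard per month for this physician (an assumption, not a Cox output).
    """
    return [s_end * math.exp(-hazard * k) for k in range(1, months_ahead + 1)]


def months_until(s_end: float, hazard: float, target: float) -> float:
    """Months after the window until survival drops to `target` (requires 0 < target <= s_end)."""
    return math.log(s_end / target) / hazard


# Made-up numbers for illustration only.
curve = extrapolate_survival(s_end=0.80, hazard=0.05, months_ahead=12)
t_half = months_until(s_end=0.80, hazard=0.05, target=0.50)
```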
Could I train a classical model such as xgboost or random forest to predict whether a physician is going to leave before x months, and include rolling averages of the time-dependent covariates to relate the different observations within a physician's group of observations? EDIT: The input to this classical ML model would be all my features (time-independent + time-dependent) and the output would be whether the physician did not leave or left ([0,1]). This binary output is generated from term_date-study_start_date < num_days, where num_days could be 6 months. That is, if a person leaves at some point, we label that person's last 6 months as 1 (left), and this is what we are trying to predict. I am proposing to compute rolling averages for each of the time-varying covariates and to add these averages as new features in the dataset.
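Here is a minimal sketch of the feature and label construction I have in mind, using plain Python so that it is self-contained. The column names (`physician_id`, `month`, `hours`, `left_within_horizon`), the helper names, and the window and horizon values are all hypothetical; the label follows the "last months before leaving are 1" reading described above.

```python
from collections import defaultdict
from typing import Dict, List


def add_rolling_mean(rows: List[Dict], key: str, window: int = 3) -> List[Dict]:
    """Add '<key>_roll<window>': the trailing mean of `key` within each physician, by month."""
    by_phys = defaultdict(list)
    for r in rows:
        by_phys[r["physician_id"]].append(r)
    for recs in by_phys.values():
        recs.sort(key=lambda r: r["month"])
        for i, r in enumerate(recs):
            # Use only the current and earlier months, so no future information leaks in.
            vals = [x[key] for x in recs[max(0, i - window + 1): i + 1]]
            r[f"{key}_roll{window}"] = sum(vals) / len(vals)
    return rows


def add_label(rows: List[Dict], leave_month: Dict[str, int], horizon: int = 6) -> List[Dict]:
    """Label a row 1 if the physician leaves fewer than `horizon` months after it, else 0."""
    for r in rows:
        lm = leave_month.get(r["physician_id"])  # None means the physician never left
        r["left_within_horizon"] = int(lm is not None and 0 <= lm - r["month"] < horizon)
    return rows


# Illustrative data only: physician A leaves in month 3, physician B never leaves.
rows = [{"physician_id": "A", "month": m, "hours": h} for m, h in enumerate([160, 150, 140, 100])]
rows += [{"physician_id": "B", "month": m, "hours": 160} for m in range(4)]
rows = add_rolling_mean(rows, key="hours", window=3)
rows = add_label(rows, leave_month={"A": 3}, horizon=2)
```

One thing I am unsure about with this labelling is that physicians who are still present at the end of the study get label 0 on their last rows, even though I cannot know whether they leave shortly afterwards.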
Another approach is to do the same as in the previous approach but regress the term date instead. I can convert term_date into days since the beginning of the study and use this as the regression target. I can then use this model to predict when they will leave and threshold on the prediction. I think this would be a great feat, but it may be difficult to do.
Essentially, I would like to be able to predict either when a physician will leave (a survival curve or a regressed value is fine here) or whether they will leave before a certain time interval, for physicians with time-varying covariates. The above are some approaches I have thought about, but I am not 100% sure they are the correct way. How can I accomplish this?