I am currently performing an analysis in which we are hoping to develop a risk score for a survival outcome using machine learning techniques. Currently, our process is as follows:
Split randomly into training and test data by ID
Use imputation to replace missing values in the training data. Save this imputation scheme and reuse it to replace missing values in the testing data.
Fit models on the training data
Create predictions on the testing data and compute Harrell's c-index (a rough sketch of this pipeline is below)
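For concreteness, here is a rough sketch of that pipeline in R (the data frame `dat`, the column names, and the simple mean imputation are just placeholders for our actual setup):

```r
library(glmnet)
library(survival)

set.seed(1)

## dat: placeholder data frame with columns id, time, status, and numeric predictors
ids       <- unique(dat$id)
train_ids <- sample(ids, size = floor(0.7 * length(ids)))
train <- dat[dat$id %in% train_ids, ]
test  <- dat[!dat$id %in% train_ids, ]

## stand-in for the real imputation: learn it on the training data, reuse it on the test data
pred_cols <- setdiff(names(dat), c("id", "time", "status"))
means <- sapply(train[pred_cols], mean, na.rm = TRUE)
for (v in pred_cols) {
  train[[v]][is.na(train[[v]])] <- means[v]
  test[[v]][is.na(test[[v]])]   <- means[v]
}

## fit a penalized Cox model on the training data
x_train <- as.matrix(train[pred_cols])
y_train <- Surv(train$time, train$status)  # recent glmnet versions accept a Surv response
fit <- cv.glmnet(x_train, y_train, family = "cox", alpha = 0.5)

## predict on the test data and compute Harrell's c-index
lp_test <- as.numeric(predict(fit, newx = as.matrix(test[pred_cols]), s = "lambda.min"))
c_test  <- concordance(Surv(test$time, test$status) ~ lp_test, reverse = TRUE)$concordance
```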
Now, it seems that the easiest way to get a 95% CI for the c-index is to use bootstrapping. However, because of the various steps of our analysis, I am not certain when in the process to perform the resampling. My thought is that this could happen either at the very beginning, before the data is even split into training and testing (sketched below), or directly after the imputation step. Is there a general rule that makes it clear when to perform the resampling in this case?
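To make the first option concrete, resampling before the split would mean treating the whole pipeline (split, imputation, fitting, c-index) as one function and rerunning it on every bootstrap sample. A rough sketch, assuming the steps above are wrapped in a hypothetical `run_pipeline()` that returns a c-index:

```r
## run_pipeline() is a hypothetical wrapper around split -> impute -> fit -> c-index
B <- 1000
c_boot <- replicate(B, {
  ids      <- unique(dat$id)
  boot_ids <- sample(ids, length(ids), replace = TRUE)
  ## resample whole individuals (by ID, with replacement) before any other step
  boot_dat <- do.call(rbind, lapply(boot_ids, function(i) dat[dat$id == i, ]))
  run_pipeline(boot_dat)
})
quantile(c_boot, c(0.025, 0.975))  # percentile 95% CI
```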
EDIT: A third option I've seen is resampling ONLY the test data, which avoids the computational expense of having to refit 1000 models (sketched below). This does seem like the only feasible option given my computational capacity, but I'm sure there are disadvantages to this approach...
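As I understand it, this would mean fitting the model (and the imputation) once and only resampling test individuals when recomputing the c-index, so it captures test-set sampling variability but not the variability from refitting. A rough sketch, reusing `test` and `lp_test` from above:

```r
## model fit once; only the test rows are resampled
B <- 1000
c_boot_test <- replicate(B, {
  idx <- sample(nrow(test), replace = TRUE)
  concordance(Surv(test$time[idx], test$status[idx]) ~ lp_test[idx],
              reverse = TRUE)$concordance
})
quantile(c_boot_test, c(0.025, 0.975))  # percentile 95% CI for the test-set c-index
```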
EDIT 2: There are fewer than 1000 events in the dataset and around 4000 individuals, so censoring is quite high. I've used glmnet to fit the models, and to my understanding it can use Harrell's c-index as the performance metric when cross-validating. So I'm not sure whether it would be a problem to define model performance with a different metric when the package already seems to support the c-index. I'm also not sure how best to compare something like the elastic net and a survival random forest in this context using something likelihood-based, which was part of the appeal of the c-index.
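If it matters, my understanding is that cv.glmnet only uses Harrell's C for Cox models when it is requested explicitly (the default cross-validation measure is partial-likelihood deviance). A minimal sketch, reusing `x_train` and `y_train` from above:

```r
## request Harrell's C as the cross-validation measure for the Cox family;
## without type.measure = "C", cv.glmnet defaults to partial-likelihood deviance
fit_c <- cv.glmnet(x_train, y_train, family = "cox",
                   alpha = 0.5, type.measure = "C")
plot(fit_c)  # CV curve of Harrell's C across lambda
```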