One problem with your overall plan is that the error estimates in your later Cox models won't account for the fact that you used this same data sample, via random forest, to select the predictors entering those Cox models. There's also a danger that the process will yield a model that works well on your particular data set but generalizes poorly to new samples from the same underlying population.
Before you go any further, look at Harrell's course notes on regression modeling strategies and the other resources linked from the associated web site. Pay particular attention to Chapter 4 of the course notes, on Multivariable Modeling Strategies. Instead of jumping blindly into automated predictor selection, it's usually best to apply your knowledge of the subject matter and of your data to develop (without looking at the outcomes) a set of candidate predictors whose size is appropriate to the scale of your data. For example, in a Cox model you should generally limit yourself to about 1 candidate predictor per 15 events in your data set, as in the sketch below.
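As a rough illustration of that budget (assuming a data frame df with an event indicator status coded 1 for an event, which are just placeholder names):

```r
## Hypothetical data frame 'df' with event indicator 'status' (1 = event, 0 = censored)
n_events <- sum(df$status == 1)          # count events, not cases
max_candidates <- floor(n_events / 15)   # ~1 candidate predictor per 15 events
max_candidates
```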
Also, unless you have thousands of cases, you shouldn't be splitting the data into separate train/test sets (as you imply to be your approach in a comment), as that costs you precision in the model and power in testing. Evaluating the model-building process via bootstrapping is a much more efficient use of your data. See Chapter 5 of Harrell's course notes; a sketch of bootstrap validation follows.
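As a minimal sketch of what that looks like with Harrell's rms package (the variable names time, status, age, sex, and biomarker are placeholders for whatever you end up using):

```r
library(rms)

dd <- datadist(df); options(datadist = "dd")

## Keep the design matrix and response (x = TRUE, y = TRUE) so that
## validate() can refit the model on bootstrap resamples.
fit <- cph(Surv(time, status) ~ age + sex + biomarker,
           data = df, x = TRUE, y = TRUE, surv = TRUE)

## Refit on bootstrap resamples and report optimism-corrected indices
## (Dxy, calibration slope, etc.)
validate(fit, method = "boot", B = 200)
```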
Then, if you like trees, why not use a gradient-boosted model directly? The gbm package in R can handle Cox models, estimate the baseline hazard (smoothed, if you wish), and return predictions of log-hazards for new cases; those log-hazard values are the same type of prediction you get from a standard Cox model. (Starting from a gbm Cox model might require some extra calculations on your part, though.) A sketch is below.
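For illustration, here is a minimal sketch with gbm, again using placeholder variable names; the tuning values (n.trees, shrinkage, interaction.depth, cv.folds) are assumptions you would need to choose for your own data:

```r
library(gbm)
library(survival)

## Boosted Cox model; distribution = "coxph" uses the Cox partial likelihood
fit <- gbm(Surv(time, status) ~ age + sex + biomarker,
           data = df, distribution = "coxph",
           n.trees = 3000, interaction.depth = 2,
           shrinkage = 0.01, cv.folds = 5)

best <- gbm.perf(fit, method = "cv")    # number of trees chosen by CV

## Predicted log-hazards (relative to baseline) for new cases
lp_new <- predict(fit, newdata = newdf, n.trees = best)

## Smoothed baseline cumulative hazard at the observed times; survival
## curves can then be built by hand from lp_new and H0
lp_train <- predict(fit, newdata = df, n.trees = best)
H0 <- basehaz.gbm(t = df$time, delta = df$status, f.x = lp_train,
                  t.eval = sort(unique(df$time)), smooth = TRUE)
```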
If you have thousands of predictors and need to do large-scale predictor selection, use a principled method like LASSO. The R glmnet package can handle Cox models and provide predicted survival curves, based on the model, for specified covariate values; see the sketch below.
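Here is a minimal sketch, assuming a numeric predictor matrix and a reasonably recent glmnet version (one that accepts a Surv response directly and supplies a survfit method for Cox fits); the column names are placeholders:

```r
library(glmnet)
library(survival)

x <- as.matrix(df[, c("age", "sex", "biomarker")])   # placeholder numeric predictors
y <- Surv(df$time, df$status)

## LASSO Cox model; penalty chosen by cross-validated partial-likelihood deviance
cvfit <- cv.glmnet(x, y, family = "cox", type.measure = "deviance")

newx <- as.matrix(newdf[, c("age", "sex", "biomarker")])

## Predicted linear predictors (log relative hazards) at the CV-chosen penalty
lp_new <- predict(cvfit, newx = newx, s = "lambda.min")

## Predicted survival curves for the new covariate values; the training
## x and y must be supplied so the baseline hazard can be estimated
sf <- survfit(cvfit, s = "lambda.min", x = x, y = y, newx = newx)
plot(sf)
```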
To answer your original question: Cox survival models are fit by maximizing the partial likelihood, so measures based on partial-likelihood deviance are generally best. That's what gbm uses for its gradient evaluation and what glmnet uses by default for cross-validation. The latter package offers concordance (the fraction of pairs of observations in which the predicted and actual event order agree) as an alternative, but Harrell (who introduced the concordance C-index to survival analysis) recommends against concordance for model comparison; see this page, for example.

Once you have developed a model, there are many measures of discrimination and calibration available for evaluating it; see Chapters 20 and 21 of Harrell's course notes for those aspects of Cox models, and the brief sketch below. If your models can't properly be compared on deviance, you could evaluate their performance with those measures instead, for example by building the competing models on multiple bootstrap samples of the data and evaluating the measures on their application to the full data set.
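As one concrete example on the calibration side, here is a sketch with rms, assuming the same placeholder variables as above and an evaluation time point of 365 (also a placeholder you should replace with something clinically relevant):

```r
library(rms)
library(survival)

dd <- datadist(df); options(datadist = "dd")

## calibrate() needs the stored design matrix and response (x, y), stored
## survival estimates (surv), and the evaluation time point (time.inc)
fit <- cph(Surv(time, status) ~ age + sex + biomarker, data = df,
           x = TRUE, y = TRUE, surv = TRUE, time.inc = 365)

## Bootstrap overfitting-corrected calibration curve for predicted
## survival probability at t = 365
cal <- calibrate(fit, method = "boot", B = 200, u = 365)
plot(cal)
```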