
I've trained a binary classification model that outputs a "probability" in (0, 1).

During testing and validation, I use the ROC curve to measure the performance of the model. I also use the ROC curve to determine the threshold at which to cut off negative vs. positive predictions (e.g., I target an FPR under 15%), as in the sketch below.
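A minimal sketch of this kind of threshold selection, assuming a fitted scikit-learn classifier `model` and a held-out validation set `(X_val, y_val)`; these names are illustrative, not from the original post:

```python
from sklearn.metrics import roc_curve

# Predicted probabilities for the positive class on the validation set.
val_scores = model.predict_proba(X_val)[:, 1]

# ROC curve over the validation set.
fpr, tpr, thresholds = roc_curve(y_val, val_scores)

# Choose the lowest threshold whose FPR still stays under the 15% target
# (roc_curve returns thresholds in decreasing order, FPR non-decreasing).
target_fpr = 0.15
mask = fpr <= target_fpr
threshold = thresholds[mask][-1]
print(f"chosen threshold: {threshold:.3f} "
      f"(FPR={fpr[mask][-1]:.3f}, TPR={tpr[mask][-1]:.3f})")
```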

When creating a model for production, I thought it would be ideal to train on the entire available dataset (i.e., no test or validation split). But without a test or validation split, I don't have a ROC curve for the final model, so I have no threshold with which to interpret the model's output.

Is it valid to use the ROC curve obtained during testing? Should I calculate a new ROC curve for the final model over instances observed during training?

Is there something fundamentally wrong in my approach?

BaldML

1 Answer


Yes, you can train on all of the data and report the test (or cross-validation) ROC as an estimate of generalization performance, but that estimate will probably be optimistic.

It would be better to estimate the generalization error of the model trained on the full data by doing cross-validation. See this answer.
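A rough sketch of that cross-validated estimate, assuming scikit-learn, a feature matrix `X` with labels `y`, and a placeholder estimator standing in for the actual model (all of these are assumptions for illustration):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve, roc_auc_score

model = LogisticRegression(max_iter=1000)  # placeholder for the real model

# Out-of-fold probabilities: each instance is scored by a model that never
# saw it during training, so the resulting ROC estimates generalization.
oof_scores = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
fpr, tpr, thresholds = roc_curve(y, oof_scores)
print("cross-validated AUC:", roc_auc_score(y, oof_scores))

# Pick the operating threshold from this cross-validated ROC, then refit
# the production model on the full dataset.
model.fit(X, y)
```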

arinarmo