I'm new to Machine Learning and Statistics, so pardon me if I say anything ridiculous.
By "test set" I mean the set that we evaluate the final hypothesis and then report the final result (e.g. test error) that is an unbiased estimate of the corresponding out-of-sample result (e.g. out-of-sample error).
By "validation set" I mean the set that we use to do model selection or parameter tuning to choose out the final hypothesis. The best result found on the validation set is biased by definition (if you evaluate only one hypothesis on the validation set, then the validation set is a test set).
I am sorry for the two lengthy paragraphs above, but I want to be sure that we are talking about the same thing. Now comes the main question:
Why do we want to calculate the ROC curve on the test set?
In many of the resources I have read, the ROC curve is calculated on either the training set or the test set without a clear definition of "test set", so pardon me if I read them wrong. However, I'm still curious: if the test set is defined as above, what is the point of calculating the ROC curve on it? Isn't the threshold choice made on the training set (which is perhaps heavily optimistically biased) or on the validation set (which might be less optimistically biased)? Doesn't the test set become a validation set if we make the threshold choice on it?
The procedure that sounds reasonable to me is to calculate the ROC curve on the validation set and use it for model selection / parameter tuning and threshold selection, then report the final result on the untouched test set.
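To make that procedure concrete, here is a minimal sketch of what I have in mind with scikit-learn (the synthetic dataset, the logistic regression model, and the "maximize TPR - FPR" threshold rule are just assumptions for illustration, not part of any resource I'm quoting):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, accuracy_score

# Synthetic data, just for illustration.
X, y = make_classification(n_samples=2000, random_state=0)

# Split into train / validation / test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# ROC curve on the VALIDATION set: this is where the threshold is chosen.
fpr, tpr, thresholds = roc_curve(y_val, model.predict_proba(X_val)[:, 1])
best_threshold = thresholds[np.argmax(tpr - fpr)]  # e.g. maximize Youden's J = TPR - FPR

# The TEST set is used only once, to report the final estimate at that fixed threshold.
y_pred_test = (model.predict_proba(X_test)[:, 1] >= best_threshold).astype(int)
print("test accuracy at chosen threshold:", accuracy_score(y_test, y_pred_test))
```

In this sketch the test set never influences the threshold, which is why I don't see what an ROC curve computed on it would be used for.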