I have a question about dataset splitting.
Say I have a dataset and split it into three parts: a training set, a validation set, and a test set. I use the training and validation sets to try out different algorithms and choose the best-performing one (based on validation set accuracy).
Now I am convinced that a particular algorithm (model) with certain parameters does well, since I have validated it on my validation set.
Finally, I take the selected model and evaluate it on the test set. To make the setup concrete, here is a minimal sketch of the workflow I mean, assuming scikit-learn; the dataset, the candidate models, and the 60/20/20 split ratios are just illustrative placeholders:
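```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for my real dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# Split once into train (60%), validation (20%), and test (20%).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Fit several candidate models; pick the one with the best validation accuracy.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(random_state=0),
}
best_name, best_model, best_val_acc = None, None, -1.0
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    if val_acc > best_val_acc:
        best_name, best_model, best_val_acc = name, model, val_acc

# The test set is touched exactly once, after model selection is finished.
test_acc = accuracy_score(y_test, best_model.predict(X_test))
print(f"selected {best_name}: validation acc {best_val_acc:.3f}, test acc {test_acc:.3f}")
```

Here are my questions: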
- Is this test set accuracy the one I should report?
- What if the model performs really badly on the test set? What do I do next?
- If I rework the whole process, wouldn't that amount to using the test set for model selection?
- Ideally, once the test set has been used, I shouldn't go back to the drawing board for new model selection or tuning, right?
I appreciate your time.