Normally in machine learning we split our data into train, valid and test. The valid data is used to tune hyperparameters, and the test data is then used to check the performance of our best tuned model (watching out for notably different results on the valid and test data).
H2O's AutoML (similar to auto-sklearn, i.e. designed to automate both finding the best algorithm and tuning it) offers me a `leaderboard_frame`, and it appears to play the same role as the test data: it is not used in either training or model tuning, but is merely used to measure model performance (to rank models on the leaderboard).
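For concreteness, here is roughly how I am calling it (a minimal Python sketch; the file path, column names, split ratios and `max_models` are placeholders, and I am going by the `leaderboard_frame` argument of `train()` as described in the H2O docs):

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Placeholder data set and target column.
df = h2o.import_file("my_data.csv")
train, valid, test = df.split_frame(ratios=[0.7, 0.15], seed=42)  # ~70/15/15

y = "target"
x = [c for c in df.columns if c != y]

aml = H2OAutoML(max_models=20, seed=42)
aml.train(x=x, y=y,
          training_frame=train,
          validation_frame=valid,
          leaderboard_frame=test)  # <-- this is the part I am unsure about

print(aml.leaderboard)
```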
So, should I give my test split as `leaderboard_frame`? Or should I start splitting my data 4 ways? Or should I not use `leaderboard_frame` at all, and instead take AutoML's best model and evaluate it myself on `test` (as sketched below)? If I do start passing `test` as `leaderboard_frame`, are there any extra precautions I should take? Any best practices?
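For reference, these are the two alternatives I have in mind (a sketch only, reusing `df`, `aml` and `test` from the snippet above; the split ratios are arbitrary):

```python
# Alternative 1: 4-way split, so the leaderboard gets its own frame
# and `test` stays completely untouched until the very end.
train, valid, lb, test = df.split_frame(ratios=[0.6, 0.15, 0.1], seed=42)

# Alternative 2: skip leaderboard_frame entirely and score the
# leader model on the held-out test set myself.
leader = aml.leader
perf = leader.model_performance(test_data=test)
print(perf)
```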