I have read a lot about cross-validation and GridSearchCV, but I noticed a difference in how the train and test sets are used. Generally, a given dataset is split into train and test sets; for example, with 100 samples: 70 train and 30 test. We then define a model such as a DecisionTree, train it on the training set, and evaluate its performance on the test set. However, k-fold cross-validation is normally applied so that all the data in the dataset is seen during evaluation. In that case we can use something like:
    cross_val_score(my_model, X, y, cv=5)
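To make this concrete, here is a minimal runnable sketch of that first approach (the dataset and hyperparameter defaults are just placeholders for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Example dataset, standing in for "the database"
X, y = load_iris(return_X_y=True)

model = DecisionTreeClassifier(random_state=0)

# 5-fold CV: every sample ends up in a test fold exactly once
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())  # the usual reported score: mean over the 5 test folds
```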
This is what is normally done, with the final score being the mean over the 5 test folds. But as we know, a DecisionTree (or whatever model you like) has hyperparameters, and the smart approach to finding them seems to be GridSearchCV. Now, as many guides indicate, we should split the dataset into train and test sets, run GridSearchCV only on the training set, and then try the model with the best parameters found on the test set. But in that case we evaluate the model's performance on only a portion of the dataset (the test set), because cross-validation is not used for the final evaluation.

So my question is: how can we incorporate GridSearchCV and cross-validation together? Someone suggested running GridSearchCV on the whole dataset, and then cross_val_score again on the whole dataset. In the end, I want to understand how to use k-fold cross-validation so that the model is tested on all the data in the dataset, while also finding the best hyperparameters through GridSearchCV. I hope I was clear.
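To make the question concrete, here is a sketch of what I think the combination might look like: passing a GridSearchCV estimator to cross_val_score (what I believe is called nested cross-validation). The parameter grid values are just placeholders; is this the right pattern?

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Inner loop: GridSearchCV tunes the hyperparameters on each training split
param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5]}
inner = GridSearchCV(DecisionTreeClassifier(random_state=0),
                     param_grid, cv=5)

# Outer loop: cross_val_score refits the whole search on each training fold
# and scores it on the held-out fold, so every sample is tested exactly once
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```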