I have been using GridSearchCV to tune the hyperparameters of three different models. After tuning, the AUCs I get are 0.65 (Model A), 0.74 (Model B), and 0.77 (Model C).
However, when I look at the "best_score_" attribute of each grid search, I get 0.72 (Model A), 0.68 (Model B), and 0.71 (Model C).
I am confused about why these scores are noticeably different; for example, Model A has the weakest AUC but the strongest "best_score_". Is this OK? Does it mean I likely need to do more tuning?
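To make the setup concrete, here is a minimal, self-contained sketch of the kind of workflow I mean. The estimator, parameter grid, and data below are just placeholders, and I'm assuming scoring="roc_auc" in the grid search, with the other AUC computed on a held-out test set:

    # Hypothetical, simplified version of what I do for each model.
    # Assumptions: scoring="roc_auc" in GridSearchCV, and the "tuned" AUC
    # is measured on a separate held-out test set.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    param_grid = {"n_estimators": [100, 300], "max_depth": [3, 5, None]}
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, scoring="roc_auc", cv=5)
    search.fit(X_train, y_train)

    # Mean cross-validated AUC of the best parameter combination (training folds only)
    cv_auc = search.best_score_

    # AUC of the refit best model on the held-out test set
    test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

    print(cv_auc, test_auc)

In that sketch, cv_auc and test_auc would be the two sets of numbers I quoted above.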
Thanks!