I'm confused about how to train a model and then cross-validate it.

Many tutorials seem to show that the process is as follows:

  1. Define the model, e.g. model = LogisticRegression()

  2. Split the data into features and labels: X, y

  3. Cross-validate: CV(model, X, y)

  4. Evaluate how well the model generalizes.
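In scikit-learn terms, I think those four steps look roughly like the sketch below. This is only my reading of the tutorials: the synthetic data from make_classification is a stand-in for a real dataset, and I'm assuming CV(model, X, y) corresponds to cross_val_score with 5 folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 1. Define the model.
model = LogisticRegression()

# 2. Split the data into features X and labels y.
#    (Synthetic stand-in data; an assumption for illustration only.)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 3. Cross-validate on the whole dataset: the model is refit on each
#    training fold and scored on the corresponding held-out fold.
scores = cross_val_score(model, X, y, cv=5)

# 4. Summarize the per-fold accuracies.
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Note that every row of X ends up in a held-out fold exactly once here, which is the part that confuses me.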

What confuses me is that the whole dataset has been used for cross-validation, so how do we then tune the hyperparameters? As I understand it, we are not supposed to use CV to measure model performance; we only use it to evaluate how well the model generalizes.

My guess is that the tutorials are not showing the whole process and that the actual procedure is the following:

  1. Split the data into a training set and two test sets.

  2. Train the model on the training set.

  3. Tune the parameters on the first test set.

  4. Cross-validate on the second test set.
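To make my guess concrete, here is a rough sketch of what I have in mind. It is only an illustration of the procedure I'm asking about, not something I know to be right: the 60/20/20 split, the synthetic data, and the candidate values for C are all assumptions I made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# 1. Split into a training set and two test sets (60% / 20% / 20%).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
X_test1, X_test2, y_test1, y_test2 = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest
)

# 2 and 3. Train on the training set for each candidate value of C and
#          keep the value that scores best on the first test set.
best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:  # hypothetical candidate values
    candidate = LogisticRegression(C=C).fit(X_train, y_train)
    score = candidate.score(X_test1, y_test1)
    if score > best_score:
        best_C, best_score = C, score

# 4. Cross-validate a model with the chosen C on the second test set.
final_scores = cross_val_score(
    LogisticRegression(C=best_C), X_test2, y_test2, cv=5
)
print("chosen C:", best_C)
print("mean CV accuracy on second test set:", final_scores.mean())
```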

Is this correct?

If not, can you explain how the data should be split, and which parts of the process should be trained and tested on which parts of the data? Thanks.

ryan132442
