I understand the purpose of CV (i.e. to make sure the data is spread well across the folds, so that any skewness is somewhat averaged out).

Take this example:

from sklearn import model_selection
from sklearn.tree import DecisionTreeClassifier

cv_cart_scores = []
for k in myList:
    cart = DecisionTreeClassifier(random_state=777, min_samples_leaf=k)
    cart_scores = model_selection.cross_val_score(cart, x_train, y_train, cv=10, scoring='accuracy')
    cv_cart_scores.append(cart_scores.mean())

Here, cart_scores stores the accuracy score of each fold in the CV, and we take its mean in the next line. What I don't understand:

  1. What is the purpose of cv_cart_scores?
  2. What is the purpose of looping over myList?
  3. Should we take cv_cart_scores.mean() at the end to get the final score of the model?

RB (on Python 3.6)

Ferdi
ranit.b
  • Since you say that you are familiar with cross-validation, your question seems to boil down to ["how to choose between hold out vs k-fold cross-validation"](https://stats.stackexchange.com/questions/104713/hold-out-validation-vs-cross-validation), am I correct? – Tim Jan 25 '19 at 13:30

1 Answer

The code snippet you posted appears to be a procedure using 10-fold CV to select a value for the min_samples_leaf hyperparameter.

  1. cv_cart_scores stores the mean CV accuracy for each setting of min_samples_leaf, one entry per value in myList.

  2. myList appears to be a list that stores different possible values for min_samples_leaf that this code would like to test. The goal is to find the value that gives the best CV performance.

  3. No. Averaging cv_cart_scores would mix the scores of different hyperparameter settings. Instead, find the index of the highest accuracy score in cv_cart_scores; the same index in myList holds the best value for the min_samples_leaf hyperparameter.
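
As a rough sketch of the three points above, the selection step could look like the following. The dataset (`load_breast_cancer`), the train/test split, and the candidate values in `myList` are assumptions made only so the example runs; the question does not show them. The names `best_index`, `best_k`, and `final_model` are likewise illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: stand-in data and candidate values, since the question omits them.
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=777, stratify=y
)
myList = [1, 5, 10, 20, 50]  # hypothetical candidates for min_samples_leaf

# One mean 10-fold CV accuracy per candidate value.
cv_cart_scores = []
for k in myList:
    cart = DecisionTreeClassifier(random_state=777, min_samples_leaf=k)
    fold_scores = cross_val_score(cart, x_train, y_train, cv=10, scoring="accuracy")
    cv_cart_scores.append(fold_scores.mean())

# The position of the best mean score picks the matching entry of myList.
best_index = int(np.argmax(cv_cart_scores))
best_k = myList[best_index]
best_cv_score = cv_cart_scores[best_index]

# Refit on the full training set with the chosen value, then check held-out data.
final_model = DecisionTreeClassifier(random_state=777, min_samples_leaf=best_k)
final_model.fit(x_train, y_train)
test_accuracy = final_model.score(x_test, y_test)

print("best min_samples_leaf:", best_k)
print("mean CV accuracy at best value:", best_cv_score)
print("held-out test accuracy:", test_accuracy)
```

Note that `np.argmax` returns the first maximum when several candidates tie, so ties resolve to the earliest value in `myList`.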

BazookaDave
  • Oh yes! Do you mean the code snippet above is a step towards hyperparameter tuning? I liked your explanation though. Thanks. – ranit.b Jan 25 '19 at 13:43
  • Thank you! Yes. It is using 10-fold CV to do hyperparameter tuning for the `min_samples_leaf` hyperparameter. – BazookaDave Jan 25 '19 at 13:44