
I'm practicing testing multiple models on the iris dataset with Python, and I have the following code:

from sklearn.model_selection import KFold, cross_val_score

results = []
names = []
for name, model in models:
    # 10-fold CV; the fixed random_state makes the splits reproducible
    kfold = KFold(n_splits=10, random_state=7, shuffle=True)
    # accuracy on each of the 10 folds
    score = cross_val_score(model, X_test, y_test, cv=kfold, scoring='accuracy')
    # collect the per-fold scores
    results.append(score)
    names.append(name)
    msg = "%s: %f" % (name, score.mean())
    print(msg)

Every time I run this I get slightly different results for score.mean(). Can anyone explain why this happens and how to reduce the variability of the results?

I'm new to machine learning so any help would be great. Thanks.
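
One way to reduce the variability, as the comments below suggest, is to repeat the cross-validation with different splits and average over all repeats. A minimal sketch using scikit-learn's RepeatedKFold — the LogisticRegression model and the use of the full iris data here are illustrative stand-ins, not the original setup:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 10-fold CV repeated 10 times gives 100 fold scores; averaging over the
# repeats smooths out the luck of any single split
rkf = RepeatedKFold(n_splits=10, n_repeats=10, random_state=7)
scores = cross_val_score(model, X, y, cv=rkf, scoring='accuracy')
print("mean: %f, std: %f" % (scores.mean(), scores.std()))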

  • Isn't your cross-validation *supposed* to be random and doesn't randomness imply there will be variation from one instance to another? – whuber Apr 22 '21 at 17:00
  • Yes, so is there a way to reduce the variability? Or do I need to take the result multiple times and calculate an average? – Hubert Rzeminski Apr 22 '21 at 17:05
  • 1
    See https://stats.stackexchange.com/questions/86522. I believe https://stats.stackexchange.com/questions/444544 may answer the question you posted. https://stats.stackexchange.com/questions/365938 might also be relevant. – whuber Apr 22 '21 at 17:08
  • You could just set your random seed. In R, that would be `set.seed(7*11*13)`, and there must be some way in Python (see the sketch below these comments). – kjetil b halvorsen Apr 23 '21 at 13:49
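
A rough Python equivalent of that last suggestion. Note that scikit-learn estimators take their randomness from a random_state argument (falling back on NumPy's global RNG if none is given), so pinning both is the safest bet; the RandomForestClassifier here is just an assumed example of a model with internal randomness:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Fix the global NumPy seed (the analogue of R's set.seed); this covers
# anything that draws from NumPy's global RNG
np.random.seed(7 * 11 * 13)

# More reliable: pass random_state explicitly to each stochastic estimator,
# so its internal randomness is identical on every run
model = RandomForestClassifier(n_estimators=100, random_state=7)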

0 Answers