
I am looking at the AUCs of three RandomForestClassifier models. Before this, I split my data into train and test sets using a random_state.

When I change the random_state for the data split, the AUCs change. Is this supposed to happen, or does it mean my model most likely needs parameter tuning?

Here are the AUCs using a random_state of 1 and then a random_state of 2.

  • Model A: 0.76 -> 0.71
  • Model B: 0.73 -> 0.69
  • Model C: 0.57 -> 0.58

Thanks.

Joe

2 Answers


Without knowing specifically how random states 1 and 2 differ, what's likely happening is that different objects (records, subjects, animals, cars, etc.) are being selected for training the RF classifier. Once training is done, the out-of-bag (OOB) objects that are not in each bootstrap sample (used for training each tree) are "dropped" down the trained tree to determine class purity after the objects make their way through the nodes.

If the objects used for training and testing change, the AUC can be drastically different. In addition, there are a variety of CV methods, which I posted here, that will definitely impact AUC.
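To get an AUC estimate that is less sensitive to any single split, cross-validation averages performance over several folds. A minimal sketch using scikit-learn (synthetic data via make_classification stands in for the original dataset, which is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real data (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold stratified CV: each fold serves as the test set once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y, cv=cv, scoring="roc_auc",
)

# The spread across folds shows how much AUC varies with the split alone.
print(scores.mean(), scores.std())
```

The fold-to-fold standard deviation gives a rough sense of how much AUC movement to expect from resampling alone, before attributing differences to the model itself.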


YES

When you have one random state, you select certain data for training, develop a model, and then test the model on a holdout data set.

When you change the random state, you select different data for training, develop a different model, and then test this other model on a different holdout data set.

I would expect different models that are tested on different data sets to have different performance.
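You can see this directly by holding everything fixed except the split seed. A minimal sketch (synthetic data via make_classification is an assumption standing in for the asker's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

aucs = {}
for split_seed in (1, 2):
    # Only the split's random_state changes; the model's seed is held fixed.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=split_seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    aucs[split_seed] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# The two AUCs generally differ, because both the training data and the
# holdout data differ between the two seeds.
print(aucs)
```

Any gap between the two printed AUCs comes purely from resampling, not from any change to the model or its hyperparameters.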

Dave