I'm trying to compare a random forest model trained on two different sets of features, to see which set yields better predictive performance. I only have about 120 samples. The first set contains about 15 features and the second about 8.
I'm using a Monte Carlo cross-validation procedure: I randomly divide the dataset into training and test sets 100 times.
I use the same 100 splits for both feature sets, so the test-set scores are paired by split. I then average the test-set performance over the 100 splits for each feature set and compare the per-split scores with a paired t-test to see whether the two feature sets differ significantly.
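For concreteness, here is a minimal sketch of what I'm doing in scikit-learn. The random data, the 70/30 split ratio, and AUC as the metric are placeholders for my actual setup:

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholders: X_a (~120 x 15) and X_b (~120 x 8) are the two feature
# matrices, y the shared labels.
rng = np.random.RandomState(0)
X_a = rng.rand(120, 15)
X_b = rng.rand(120, 8)
y = rng.randint(0, 2, 120)

n_splits = 100
scores_a, scores_b = [], []
for i in range(n_splits):
    # Same random split for both feature sets, so the scores are paired.
    idx_train, idx_test = train_test_split(
        np.arange(len(y)), test_size=0.3, random_state=i, stratify=y
    )
    for X, scores in ((X_a, scores_a), (X_b, scores_b)):
        model = RandomForestClassifier(n_estimators=500, random_state=i)
        model.fit(X[idx_train], y[idx_train])
        proba = model.predict_proba(X[idx_test])[:, 1]
        scores.append(roc_auc_score(y[idx_test], proba))

print("mean AUC, feature set A:", np.mean(scores_a))
print("mean AUC, feature set B:", np.mean(scores_b))
# Paired t-test on the per-split scores.
print(ttest_rel(scores_a, scores_b))
```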
Should I perform hyperparameter optimization within each of the 100 splits, for each of the two feature sets, or is it okay to stick with the default parameters?
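If per-split tuning is the right approach, I assume it would replace the fit step in the loop above with something like this, where the inner 5-fold CV runs on the training fold only and the grid is just an illustrative example:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

# Hypothetical grid -- not a recommendation, just to show the mechanics.
param_grid = {"max_depth": [None, 3, 5], "min_samples_leaf": [1, 3, 5]}

# Tune on the training fold only, so the test fold never influences
# the chosen hyperparameters.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=500, random_state=i),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
search.fit(X[idx_train], y[idx_train])
proba = search.predict_proba(X[idx_test])[:, 1]
scores.append(roc_auc_score(y[idx_test], proba))
```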
Thank you