
I'm trying to compare a random forest model trained on two different sets of features. My goal is to compare the performance of the model when I use one set of features versus the other. I only have about 120 samples. The first set contains about 15 features and the second about 8.

I'm using a Monte Carlo cross-validation procedure where I randomly divide the dataset into training and test sets 100 times.

I then average the test-set performance over the 100 splits for each set of features and compare these averages (using a paired t-test) to see if they are significantly different from each other.
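To make this concrete, here is roughly the procedure in code (a minimal sketch using scikit-learn and SciPy; `X1`, `X2`, and `y` are placeholders for my two feature matrices and the labels, and accuracy is just an example metric):

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def monte_carlo_scores(X, y, n_splits=100, test_size=0.3, seed=0):
    """Test accuracy on a fresh random train/test split, repeated n_splits times."""
    scores = []
    for i in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed + i
        )
        model = RandomForestClassifier(random_state=seed + i)  # default hyperparameters
        model.fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, model.predict(X_te)))
    return np.array(scores)

# X1 (n x ~15), X2 (n x ~8) and y stand in for my data
scores_1 = monte_carlo_scores(X1, y)
scores_2 = monte_carlo_scores(X2, y)

# Paired t-test over the 100 splits; the scores are paired because the same
# random_state values (and hence the same splits) are used for both feature sets.
t_stat, p_value = stats.ttest_rel(scores_1, scores_2)
```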

Should I perform hyperparameter optimization in each of the 100 splits, for each of the two sets of features, or is it OK to stick with the default parameters?

Thank you

asere
  • How many observations do you have? The more you have, the more it makes sense to do hyperparameter optimisation. Also depending on the background you may have a look at how well it goes without optimisation and whether the achieved quality is good enough for you. Not sure what you want to achieve with the t-tests by the way. – Christian Hennig Nov 25 '20 at 15:28
  • I only have about 120 samples. Does it mean I do not need to do hyperparameter optimization? – asere Nov 25 '20 at 15:39
  • Regarding the t-test, my goal is to compare the performance of the model when I use one set of features vs the other. Imagine I get an average (over the 100 splits) accuracy of 75% using the first set of features and an accuracy of 78% using the other set of features. Although 78% is higher than 75%, is this difference significant? That's why I used a paired t-test (I've added this to the question to clarify) – asere Nov 25 '20 at 15:41
  • Comparing accuracy is suboptimal (see: [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/)). That said, if you are going to, you should not use a t-test; you want to use McNemar's test (see: [Compare classification performance of two heuristics](https://stats.stackexchange.com/q/185450/7290)). – gung - Reinstate Monica Nov 25 '20 at 15:56
  • @asere: I'd assume that you'd like to have an as good result as you can have, and surely I can't tell you what you "need" to do. I can only say that with 120 observations I probably wouldn't do it myself, unless there are good reasons to mistrust the defaults in the specific situation (in which case, also depending on the situation, I may just choose some other parameter values without optimising). Also, as I said before, it may depend on whether what you achieve with the defaults is satisfactory for you. – Christian Hennig Nov 25 '20 at 16:02
  • I don’t agree with the notion that a small number of records means you don’t have to optimize hyper-parameters. In fact, quite the opposite: a small number of records means optimizing hyper-parameters is less computationally expensive, so I’d be more inclined to do it, not less. – astel Nov 25 '20 at 16:05
  • @astel: The problem is that with few observations the assessment of the hyperparameters will not be very precise, and the probability of getting it wrong by random accident may be rather large. – Christian Hennig Nov 25 '20 at 16:07
  • So there is a probability you may get it wrong, while if you use the default you almost certainly get it wrong. – astel Nov 25 '20 at 16:11
  • @Lewian I'm certainly interested in obtaining the highest performance I can get, but I'm mostly interested in knowing if one set of features performs better than the other set of features. So I'm questioning whether hyperparameter optimization is needed in this situation, and how "fair" it would be to compare these models when I actually had to do hyperparameter optimization 100 times (for both sets of features) and obtained 100 different "optimal" sets of hyperparameters – asere Nov 25 '20 at 16:22
  • @astel: Using hyperparameter optimisation you may do better or worse than using the defaults (obviously depending on the quality of the defaults), and I have seen a number of problems with a small number of observations in which things got worse. Hardly ever with more observations. – Christian Hennig Nov 25 '20 at 17:02

1 Answer


Yes, you should perform hyper-parameter optimization even if your only concern is which set of features performs better. By sticking to the default values, you only learn that one set of features performs better than the other for one specific set of hyper-parameters. What if that set of hyper-parameters happens to be the worst-performing one possible? Then all you know is that feature set A beats feature set B on a very bad model. Maybe feature set B actually outperforms feature set A at the optimal hyper-parameters; wouldn't that be more useful to know?
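For illustration, the tuning step would sit inside each Monte Carlo split, fitted on the training portion only (a minimal sketch, not a drop-in implementation; `X`, `y`, and the parameter grid below are placeholders, not recommendations):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical grid; in practice pick values that make sense for your problem.
param_grid = {"max_depth": [2, 4, None], "min_samples_leaf": [1, 5, 10]}

scores = []
for i in range(100):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=i
    )
    # Inner cross-validation on the training portion only; the test portion
    # is never seen during tuning.
    search = GridSearchCV(
        RandomForestClassifier(random_state=i),
        param_grid,
        cv=5,
        scoring="accuracy",
    )
    search.fit(X_tr, y_tr)
    # GridSearchCV refits the best model on the full training portion,
    # so predicting here uses the tuned estimator.
    scores.append(accuracy_score(y_te, search.predict(X_te)))

mean_score = np.mean(scores)
```

Run the same loop for each feature set, reusing the same `random_state` values so the splits stay paired, and then compare the two vectors of test scores as before.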

astel
  • Thank you, I think that makes sense; doing hyperparameter optimization was my first instinct as well. However, as I mentioned in a previous comment, because I am doing Monte Carlo cross-validation (and therefore not just one split), I was concerned about the "fairness" of comparing 100 models, each with its own set of optimal hyperparameters – asere Nov 25 '20 at 16:26