I have a small dataset on which I'm training LASSO and random forest models. I use nested cross-validation to tune hyperparameters and get (approximately) unbiased performance estimates. The total number of candidate features exceeds the number of observations (p > n). Roughly, the setup looks like the sketch below.
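For concreteness, here is a minimal sketch of what I mean, using scikit-learn; `X`, `y`, the parameter grids, and the fold counts are placeholders, not my actual data or settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# placeholder data with p > n; in reality X, y come from my dataset
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))
y = rng.integers(0, 2, 60)

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# "LASSO": L1-penalised logistic regression, with C tuned in the inner loop
lasso = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(penalty="l1", solver="liblinear", max_iter=5000)),
])
lasso_search = GridSearchCV(lasso, {"clf__C": np.logspace(-3, 1, 9)},
                            scoring="roc_auc", cv=inner_cv)
lasso_auc = cross_val_score(lasso_search, X, y, scoring="roc_auc", cv=outer_cv)

# random forest, tuned the same way
rf_search = GridSearchCV(
    RandomForestClassifier(n_estimators=500, random_state=0),
    {"max_features": ["sqrt", 0.1, 0.3], "min_samples_leaf": [1, 3, 5]},
    scoring="roc_auc", cv=inner_cv,
)
rf_auc = cross_val_score(rf_search, X, y, scoring="roc_auc", cv=outer_cv)

print("LASSO outer AUC:", lasso_auc.mean(), " RF outer AUC:", rf_auc.mean())
```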
The resulting model performance (outer-loop nested-CV AUC) drops the more candidate features I allow into the pool. What is the likely reason for this? My guess is that, with more candidates to choose from, each outer fold ends up selecting an increasingly different subset of features, and these unstable selections don't generalize well. LASSO and forward selection seem much more sensitive to this than the random forest.
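One way to make that instability concrete (continuing the sketch above, so it reuses `X`, `y`, `outer_cv`, and `lasso_search`) would be to refit the tuned LASSO on each outer-training split and compare which coefficients come out non-zero:

```python
# continues the sketch above: reuses X, y, outer_cv and lasso_search
selected = []
for train_idx, _ in outer_cv.split(X, y):
    fit = lasso_search.fit(X[train_idx], y[train_idx])
    coefs = fit.best_estimator_.named_steps["clf"].coef_.ravel()
    selected.append(set(np.flatnonzero(coefs)))

print("features kept per outer fold:", [len(s) for s in selected])
print("kept in every fold:", len(set.intersection(*selected)))
```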
My models do great if I first restrict to the features that are univariately significant by Pearson correlation with the outcome, but I know I'm really not supposed to do that filtering outside of the validation (sketched below)... Any suggestions?
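To be explicit about the filtering step I mean (again reusing `X` and `y` from the sketch above): it is computed once on the full dataset, before any CV split, which is exactly the part I suspect is leaking:

```python
from scipy.stats import pearsonr

# univariate p-value of each feature against the outcome, on the FULL dataset
pvals = np.array([pearsonr(X[:, j], y)[1] for j in range(X.shape[1])])
X_filtered = X[:, pvals < 0.05]   # keep only the "significant" features
# ...and then run the same nested CV as above on X_filtered
```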