I would like to perform feature selection (using Boruta), but I have read that to do it correctly I need to do it with inner cross-validation (CV). Here is what I did:

First, I split the data into training and test sets, and then ran feature selection on the training set only. Is that the correct way to do it?

library(caret)  # provides createDataPartition()

set.seed(1234)
# Stratified 75/25 split on the outcome
splitIndex <- createDataPartition(data$outcome, p = .75, list = FALSE, times = 1)
train <- data[ splitIndex, ]
test  <- data[-splitIndex, ]

Boruta part:

library(Boruta)

set.seed(123)
boruta_output <- Boruta(outcome ~ ., data = train, doTrace = 2, maxRuns = 500, ntree = 1000)
# Get significant variables including tentatives
boruta_signif <- getSelectedAttributes(boruta_output, withTentative = TRUE)
# Do a tentative rough fix (resolves the remaining tentative attributes)
roughFixMod <- TentativeRoughFix(boruta_output)
# Overwrites the list above with the attributes confirmed after the rough fix
boruta_signif <- getSelectedAttributes(roughFixMod)
print(boruta_signif)
# Variable importance scores for the attributes that were not rejected
imps <- attStats(roughFixMod)
imps2 <- imps[imps$decision != 'Rejected', c('meanImp', 'decision')]
head(imps2[order(-imps2$meanImp), ])
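
As far as I understand it, doing the selection "with inner CV" would mean repeating Boruta inside each cross-validation fold of the training data, rather than running it once on the whole training set. Below is a minimal sketch of that idea, not a definitive implementation. The number of folds (k = 5), the reduced maxRuns, and the names selected_per_fold and selection_freq are my own assumptions for illustration; the iris data is only a stand-in so that the snippet runs on its own when train is not already defined.

```r
library(caret)
library(Boruta)

# Stand-in data (assumption): used only if `train` from the split above is absent
if (!exists("train")) {
  train <- iris
  names(train)[names(train) == "Species"] <- "outcome"
}

set.seed(123)
# k = 5 is an arbitrary choice; each element holds the row indices of one fold's training part
folds <- createFolds(train$outcome, k = 5, returnTrain = TRUE)

# Run Boruta on the training part of each fold only
selected_per_fold <- lapply(folds, function(idx) {
  fit <- Boruta(outcome ~ ., data = train[idx, ], maxRuns = 100)
  # TentativeRoughFix() warns (harmlessly) when no tentative attributes are left
  getSelectedAttributes(TentativeRoughFix(fit), withTentative = FALSE)
})

# How often each feature was selected across the folds
selection_freq <- sort(table(unlist(selected_per_fold)), decreasing = TRUE)
print(selection_freq)
```

In a full version, a model would presumably be fitted on the selected features within each fold and evaluated on that fold's held-out rows, so that the performance estimate includes the selection step; the sketch only covers the selection part.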

Thank you for all your suggestions and help!

Georg
  • Does this answer your question? [Feature selection and cross-validation](https://stats.stackexchange.com/questions/27750/feature-selection-and-cross-validation) – msuzen Jan 10 '22 at 16:05
