I would like to perform feature selection (using Boruta), but I have read that to do it correctly I need to run it inside an inner CV loop. Here is what I did:
First I split the data into training and test sets, and then I ran feature selection on the training set only. Is that the correct way to do it?
library(caret)  # provides createDataPartition()
set.seed(1234)
# Stratified 75/25 split on the outcome
splitIndex <- createDataPartition(data$outcome, p = .75, list = FALSE, times = 1)
train <- data[splitIndex, ]
test <- data[-splitIndex, ]
Boruta part:
library(Boruta)
set.seed(123)
# Run Boruta on the training set only
boruta_output <- Boruta(outcome ~ ., data = train, doTrace = 2, maxRuns = 500, ntree = 1000)
# Get significant variables including tentatives
boruta_signif <- getSelectedAttributes(boruta_output, withTentative = TRUE)
# Do a tentative rough fix
roughFixMod <- TentativeRoughFix(boruta_output)
boruta_signif <- getSelectedAttributes(roughFixMod)
print(boruta_signif)
# Variable Importance Scores
imps <- attStats(roughFixMod)
imps2 <- imps[imps$decision != 'Rejected', c('meanImp', 'decision')]
head(imps2[order(-imps2$meanImp), ])
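If I understand the "inner CV" advice correctly, it would mean repeating the selection inside each resampling fold of the training set rather than running it once on all of `train`. Below is a minimal sketch of what I think that looks like. It assumes caret's `createFolds()` for the folds, and `boruta_per_fold` is a name I made up for illustration, not a function from either package:

```r
library(caret)   # createFolds()
library(Boruta)  # Boruta(), TentativeRoughFix(), getSelectedAttributes()

# Hypothetical helper: run Boruta separately on the training part of each fold
# and return the selected feature names per fold.
boruta_per_fold <- function(df, outcome = "outcome", k = 5, ...) {
  # returnTrain = TRUE gives the row indices used for fitting in each fold
  folds <- createFolds(df[[outcome]], k = k, returnTrain = TRUE)
  lapply(folds, function(idx) {
    # Build `outcome ~ .` from the column name
    fit <- Boruta(reformulate(".", response = outcome), data = df[idx, ], ...)
    # Resolve tentative attributes, then keep only the confirmed ones
    getSelectedAttributes(TentativeRoughFix(fit), withTentative = FALSE)
  })
}

# Usage on my data (extra arguments are passed through to Boruta):
# selected <- boruta_per_fold(train, "outcome", k = 5, maxRuns = 500, ntree = 1000)
# sort(table(unlist(selected)), decreasing = TRUE)  # how often each feature was kept
```

The idea would be to fit and evaluate the model within each fold using only that fold's selected features, so the held-out part never influences the selection. Is this what is meant?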
Thank you for all your suggestions and help!