I am using the caret package in R with the 'C5.0' train method. I am trying to implement k-fold cross-validation, but building the model takes too long. How can I adjust my parameters so that it takes less time? My training data has 30,000 samples.

#My code
train_control <- trainControl(method="repeatedcv", number=10, repeats=3)

c50Grid <- expand.grid(.trials = c(1:9, (1:10)*10),
                       .model = c("tree", "rules"),
                       .winnow = c(TRUE, FALSE))

c5Fitvac <- train(y ~ .,
                  data = trainV,
                  method = "C5.0",
                  tuneGrid = c50Grid,
                  trControl = train_control,
                  metric = "Accuracy", 
                  importance=TRUE, 
                  preProc = c("center", "scale")) 
Firebug
Umesh Nathani
  • Welcome to CrossValidated. Notice I edited your post to highlight code formatting. When you say it's "taking too much time", how much is too much? Is it taking hours or days? Also, how many features do you have? How many are categorical, and how many categories do these have? – Firebug Sep 18 '16 at 22:35
  • Notice your grid has 76 parameter combinations, and you are doing 10-fold CV with 3 repeats. That's a total of 2280 evaluations. – Firebug Sep 18 '16 at 22:39
  • So my last attempt took a few hours. I am trying another run and it has been running for an hour. I have 16 features; 8 are categorical, each with 2-4 categories. How do you get 76 evaluations? Sorry, I know this is a silly question. – Umesh Nathani Sep 18 '16 at 22:54
  • sorry, I meant 76 combinations – Umesh Nathani Sep 18 '16 at 23:12
  • 19 entries in `trials`, 2 in `model`, 2 in `winnow`. – Firebug Sep 19 '16 at 02:55
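
The counts in the comments above can be checked with a few lines of base R. This is only a sketch of the arithmetic: it rebuilds the grid from the question and multiplies by the number of resamples; the variable names `n_combinations`, `n_resamples`, and `n_evaluations` are introduced here for illustration.

```r
# Rebuild the tuning grid from the question (base R only, caret not needed)
c50Grid <- expand.grid(.trials = c(1:9, (1:10) * 10),
                       .model  = c("tree", "rules"),
                       .winnow = c(TRUE, FALSE))

n_combinations <- nrow(c50Grid)  # 19 * 2 * 2 parameter combinations
n_resamples    <- 10 * 3         # number = 10 folds, repeats = 3
n_evaluations  <- n_combinations * n_resamples

n_combinations  # 76
n_evaluations   # 2280
```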

1 Answer

To complement @Firebug's comment, note that you are also using repeated cross-validation (3 repeats), which triples the number of resamples and slows down the whole process.

(See "In caret what is the real difference between cv and repeatedcv?" for more information about repeated CV.)

I would go for a coarser grid (you do not need 19 different values of trials) and use simple, non-repeated cross-validation.
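
As a rough sketch of that suggestion, assuming the same `trainV` data and `y` outcome as in the question: the four `trials` values below are illustrative picks, not a recommendation from the answer, and `coarseGrid`, `simple_cv`, and `c5FitCoarse` are hypothetical names. The fit itself only runs when caret, C50, and `trainV` are available.

```r
# Coarser grid: 4 values of trials instead of 19 (values chosen for illustration)
coarseGrid <- expand.grid(.trials = c(1, 10, 50, 100),
                          .model  = c("tree", "rules"),
                          .winnow = c(TRUE, FALSE))

n_folds <- 10
nrow(coarseGrid)            # 16 combinations
nrow(coarseGrid) * n_folds  # 160 evaluations, down from 2280

# Fit only if caret and C50 are installed and trainV (the asker's data) exists
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("C50", quietly = TRUE) &&
    exists("trainV")) {
  # Simple 10-fold CV: method = "cv" instead of "repeatedcv"
  simple_cv <- caret::trainControl(method = "cv", number = n_folds)
  c5FitCoarse <- caret::train(y ~ .,
                              data = trainV,
                              method = "C5.0",
                              tuneGrid = coarseGrid,
                              trControl = simple_cv,
                              metric = "Accuracy")
}
```

The remaining arguments from the question's `train()` call can be passed along unchanged; only the grid and the resampling scheme differ here.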

RUser4512
  • Thank you all, this really helped. I used far fewer combinations with simple cross-validation and got decent results. I am now stuck with neuralnet and am creating another post. Thanks much! – Umesh Nathani Sep 21 '16 at 00:29