
I am training several classifiers and looking into different ensemble methods on a large dataset (17000 examples and 300+ features).

I want to use grid search and cross-validation with as fine-grained a set of parameter options as possible; however, running all of this on my laptop is taking forever and severely limits what I can do.

What are good (and ideally inexpensive) options in terms of using clusters/networks of computers, or anything else that could help significantly speed up those processes?

EDIT: Apologies if this is the wrong place to ask; I figured I would ask the ML community about this.

jeremy radcliff
  • AWS is very affordable. Most options are CPU-based, but GPU instances may be of interest for particular types of models. But in general, grid search is incredibly inefficient, and doesn't make intelligent decisions of how to explore the search space. Answers here address how to do this kind of optimization with minimum computation. http://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate/193310#193310 – Sycorax Oct 14 '16 at 04:03
  • @Sycorax, thank you for the link, this is very helpful. – jeremy radcliff Oct 14 '16 at 04:40
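
To make the comment's point about grid search concrete, here is a minimal sketch in plain Python that compares how many model fits an exhaustive grid needs with a fixed-budget random sample from the same space. The search space, the parameter names, and `toy_score` are all hypothetical placeholders, not anything from the question; in practice `toy_score` would be replaced by a cross-validated score of a real model. scikit-learn offers the same idea as `RandomizedSearchCV`, which also accepts `n_jobs=-1` to spread the fits across all local CPU cores.

```python
import itertools
import random

# Hypothetical search space, for illustration only.
space = {
    "max_depth": [2, 4, 6, 8, 10],
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1, 0.3],
    "n_estimators": [50, 100, 200, 400, 800],
    "subsample": [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
}


def grid_configs(space):
    """Return every combination in the grid (what exhaustive grid search fits)."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def random_configs(space, n_iter, seed=0):
    """Draw n_iter configurations independently and uniformly from the same space."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_iter)]


def toy_score(cfg):
    """Placeholder for a cross-validated score; swap in real model fitting."""
    return -abs(cfg["max_depth"] - 6) - 10 * abs(cfg["learning_rate"] - 0.03)


full_grid = grid_configs(space)
sampled = random_configs(space, n_iter=40)
best = max(sampled, key=toy_score)

print(len(full_grid), "configurations in the full grid vs", len(sampled), "sampled")
print("best sampled configuration:", best)
```

Each configuration is fitted once per cross-validation fold, so cutting the number of configurations reduces the total cost proportionally, before any cluster or cloud instance is involved. The budget of 40 is arbitrary here and would need tuning for a real problem.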
