
I am using Random Forest and Stochastic Gradient Boosting to predict a categorical target variable with severe between-class imbalance. I use oversampling so that the models do not neglect the minority classes during training and achieve decent test-set accuracy for all classes.

As recommended in the literature (e.g., Sun et al. (2007)), I treat the oversampling proportions as hyperparameters to be optimized. Finding the best number of observations to sample from each class amounts to an optimization problem in a 6-dimensional (pseudo-)continuous space (I have 6 classes).

So far, I have used manual trial and error: I initially sample the same number of observations from each class, assess the error rate for each class using cross-validation, and then increase or decrease the number of observations drawn from each class until the error is roughly equal across classes.
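To make this concrete, here is a minimal sketch of the evaluation step I currently repeat by hand (in Python with scikit-learn; names such as `per_class_cv_error` and `n_per_class` are illustrative, not my actual code): given a target sample count per class, oversample each class in the training folds, fit a Random Forest, and return the cross-validated error rate of each class.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample

def per_class_cv_error(X, y, n_per_class, n_splits=5, random_state=0):
    """Cross-validated error rate per class for given per-class oversampling counts.

    X, y are assumed to be numpy arrays; n_per_class maps class label -> target count.
    """
    classes = np.unique(y)
    errors = {c: [] for c in classes}
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, test_idx in cv.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Oversample each class in the training fold up to its target count
        parts_X, parts_y = [], []
        for c in classes:
            mask = (y_tr == c)
            Xc, yc = resample(X_tr[mask], y_tr[mask], replace=True,
                              n_samples=int(n_per_class[c]), random_state=random_state)
            parts_X.append(Xc)
            parts_y.append(yc)
        X_res, y_res = np.vstack(parts_X), np.concatenate(parts_y)
        model = RandomForestClassifier(n_estimators=500, random_state=random_state)
        model.fit(X_res, y_res)
        y_pred = model.predict(X[test_idx])
        # Record the error rate on the held-out fold, separately for each class
        for c in classes:
            m = (y[test_idx] == c)
            if m.any():
                errors[c].append(np.mean(y_pred[m] != y[test_idx][m]))
    return {c: float(np.mean(v)) for c, v in errors.items()}
```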

But I would like to know whether this optimization procedure can be automated. Would gradient descent/ascent or a genetic algorithm be a suitable optimizer?

An exhaustive grid search would be far too compute-intensive, even with a coarse grid, and my understanding is that simulated annealing is usually applied only to discrete spaces.

Could someone familiar with similar optimization problems please point me towards a good solution? I am not very familiar with optimizers other than grid search, and I don't want to reinvent the wheel.
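For what it's worth, here is how I picture the automation, again only as a hedged sketch: scalarize the per-class errors returned by `per_class_cv_error` above and hand the resulting objective to a derivative-free optimizer. SciPy's Nelder-Mead is used purely as a placeholder here; whether that, a genetic algorithm, or something else entirely is the right tool is exactly what I am asking.

```python
# Reuses per_class_cv_error() from the sketch above.
import numpy as np
from scipy.optimize import minimize

def imbalance_objective(counts, X, y, classes):
    # Round the optimizer's continuous proposal to integer per-class sample counts
    n_per_class = {c: max(1, int(round(n))) for c, n in zip(classes, counts)}
    errs = np.array(list(per_class_cv_error(X, y, n_per_class).values()))
    # Penalize both the overall error level and the spread of error across classes
    return errs.mean() + errs.std()

# Example call (X, y assumed to be numpy arrays with 6 classes):
# res = minimize(imbalance_objective, x0=np.full(6, 500.0),
#                args=(X, y, np.unique(y)), method="Nelder-Mead",
#                options={"maxiter": 50})
```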

Antoine
  • Not entirely sure what you are asking. Are you looking for optimization algorithms for the training problems (e.g. QP for SVM), or are you looking for hyperparameter optimization methods (e.g. optimizing C and kernel parameters for SVM)? Grid search and gradient descent, for instance, are never both viable options for a certain problem. – Marc Claesen Aug 04 '15 at 15:47
  • Yes, my question deals with hyperparameter optimization algorithms – Antoine Aug 04 '15 at 20:32
  • edited question to make it clear – Antoine Aug 04 '15 at 20:41
  • I think this question is far too broad: it's basically asking to compare and contrast the relative merits of 5 (or more!) optimization methods. A typical graduate-level applied math curriculum would teach an entire course on this topic! – Sycorax Aug 04 '15 at 22:00
  • Oh, and it's not just comparing/contrasting five algorithms, but comparing and contrasting them for SVM, RF and any other ML classifier, out of concern that decent optimizers for one ML algorithm might be suboptimal for another. – Sycorax Aug 04 '15 at 22:07
  • @user777 Basically every question on CV could be the subject of an entire course. Some questions require the experience of a lifetime to answer. So your first point is not valid. A question that is too broad is one for which an answer of reasonable length cannot be given. In my case, a sentence or two (similar to what I wrote about grid search), maybe with some useful links, would suffice to highlight the main merits/drawbacks of each optimizer from a practitioner's standpoint. So my question clearly does not fall into the "too broad" category. – Antoine Aug 05 '15 at 09:04
  • @user777 Also, please note that I did not explicitly ask to compare the optimizers for SVM, RF, Boosting, and others. I cited these learning algorithms to give some context (i.e., make it clear that my question was about ML hyperparameter optimization). – Antoine Aug 05 '15 at 09:05
  • I edited my question. Could someone please tell me what I should do to have it re-opened? – Antoine Sep 06 '15 at 14:17
  • Have a look at http://docs.optunity.net. – Marc Claesen Dec 04 '15 at 08:46
  • There are separate optimizers for hyper parameters. Look for hyperopt or spearmint in Python for example. – spdrnl Dec 04 '15 at 10:38
  • The suggestions here seem relevant. https://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate/193310#193310 – Sycorax Jul 03 '18 at 02:03

0 Answers