I am using Random Forest and Stochastic Gradient Boosting to predict a categorical target variable with severe between-class imbalance. To keep the models from neglecting the minority classes during training, I oversample them, aiming for decent test-set accuracy on every class.
As recommended in the literature (e.g., Sun et al. (2007)), I treat the oversampling proportions as hyperparameters that need to be optimized. Finding the best number of observations to sample from each class amounts to an optimization problem in a 6-dimensional (pseudo-)continuous space, since I have 6 classes.
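To make the setup concrete, here is roughly what I mean by per-class oversampling counts. This is only a minimal sketch in Python/scikit-learn; the class labels, column name, and counts are placeholders, not my actual data:

```python
import pandas as pd
from sklearn.utils import resample

def oversample(df, target_col, counts, random_state=0):
    """Resample each class (with replacement) up to the number of rows given in `counts`."""
    parts = []
    for cls, n in counts.items():
        cls_rows = df[df[target_col] == cls]
        parts.append(resample(cls_rows, replace=True, n_samples=n,
                              random_state=random_state))
    # concatenate the resampled classes and shuffle the rows
    return pd.concat(parts).sample(frac=1, random_state=random_state)

# `counts` is the 6-dimensional "hyperparameter" I would like to optimize
counts = {"A": 500, "B": 500, "C": 500, "D": 500, "E": 500, "F": 500}
# train_df = oversample(raw_train_df, "label", counts)
```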
So far, I have used manual trial and error: I start by sampling the same number of observations from each class, assess the per-class error rates with cross-validation, and then increase or decrease the number of observations drawn from each class until the error is shared roughly equally among all classes.
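In code, one iteration of that manual loop looks roughly like the sketch below: I score a candidate vector of counts by oversampling only the training portion of each CV fold and recording the per-class error on the untouched validation fold. Again, this is just a sketch, reusing the hypothetical `oversample` helper from above and assuming `X`, `y` are numpy arrays:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix

def per_class_cv_error(X, y, counts, n_splits=5, random_state=0):
    """Mean per-class error rate over stratified CV folds,
    oversampling only the training part of each fold."""
    classes = np.unique(y)
    fold_errors = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, val_idx in skf.split(X, y):
        train_df = pd.DataFrame(X[train_idx])
        train_df["label"] = y[train_idx]
        boosted = oversample(train_df, "label", counts, random_state=random_state)
        clf = RandomForestClassifier(n_estimators=500, random_state=random_state)
        clf.fit(boosted.drop(columns="label"), boosted["label"])
        pred = clf.predict(X[val_idx])
        cm = confusion_matrix(y[val_idx], pred, labels=classes)
        # per-class error = 1 - recall for each class
        fold_errors.append(1.0 - np.diag(cm) / cm.sum(axis=1))
    return np.mean(fold_errors, axis=0)  # one error rate per class
```

I then nudge the counts up for the classes with above-average error and down for the others, and repeat until the errors are roughly equal.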
But I would like to know whether this optimization procedure can be automated. Should I use gradient descent/ascent or a genetic algorithm as the optimizer?
An exhaustive grid search would be far too computationally expensive, even on a coarse grid, and my understanding is that simulated annealing is usually applied to discrete spaces.
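For concreteness, this is the kind of automation I am imagining with, say, a genetic-algorithm-style optimizer: something like SciPy's `differential_evolution` searching over the six counts to minimize the spread of the per-class CV errors. I have not tried this; it is only meant to show the shape of the problem, reusing the hypothetical `per_class_cv_error` from above:

```python
from scipy.optimize import differential_evolution
import numpy as np

def objective(counts_vector, X, y, classes):
    # round the continuous candidate to integer per-class sample sizes
    counts = {c: int(round(n)) for c, n in zip(classes, counts_vector)}
    errs = per_class_cv_error(X, y, counts)
    # objective: make the per-class errors as equal as possible
    return errs.max() - errs.min()

# classes = np.unique(y)
# bounds = [(100, 5000)] * len(classes)   # plausible range for each class count
# result = differential_evolution(objective, bounds, args=(X, y, classes),
#                                 maxiter=20, popsize=10, seed=0)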
Could someone familiar with similar optimization problems please point me towards a good solution? I am not very familiar with optimizers other than grid search, and I don't want to reinvent the wheel.