When fitting a polynomial model with linear regression (for prediction) or logistic regression (for classification), I know how to find the best polynomial degree for the given data when the regularization coefficient is fixed. I also know how to find the best regularization coefficient when the degree of the polynomial is fixed.
What I want to know is how to find the best model when neither of these hyperparameters is known.
- Should I find the best degree without regularization first, and then the regularization parameter?
- Should I, for every degree, train with every candidate regularization parameter value (assuming it belongs to a discrete set of values), and then pick the degree/regularization combination that gives the best results on the validation set?
Or is there a better way to choose these hyperparameters?
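
To make the second option concrete, here is a minimal sketch of the exhaustive search I have in mind. It assumes NumPy, a one-dimensional input, closed-form ridge regression, and synthetic data; the helper names (`poly_features`, `fit_ridge`, `mse`, `grid_search`) and the candidate values are made up for illustration.

```python
import numpy as np

def poly_features(x, degree):
    """Columns x^0, x^1, ..., x^degree for a 1-D input array."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Closed-form ridge regression; the bias column (index 0) is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

def grid_search(x_train, y_train, x_val, y_val, degrees, lambdas):
    """Try every (degree, lambda) pair; keep the lowest validation error."""
    best = None
    for d in degrees:
        X_train = poly_features(x_train, d)
        X_val = poly_features(x_val, d)
        for lam in lambdas:
            w = fit_ridge(X_train, y_train, lam)
            err = mse(X_val, y_val, w)
            if best is None or err < best[0]:
                best = (err, d, lam)
    return best

# Synthetic data (illustrative only): a noisy cubic on [-1, 1].
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(0, 0.1, 200)
x_train, y_train = x[:120], y[:120]
x_val, y_val = x[120:], y[120:]

degrees = range(1, 9)                           # candidate polynomial degrees
lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]    # candidate regularization values

best_err, best_degree, best_lambda = grid_search(
    x_train, y_train, x_val, y_val, degrees, lambdas
)
print(best_degree, best_lambda, best_err)
```

The cost is one fit per (degree, regularization) pair, so 8 × 6 = 48 fits with the values above.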