
If multiple [cost, gamma] values yield identical or nearly identical cross-validated performance (regardless of the specific metric: accuracy, F1, MCC), what technique should be used to select which [cost, gamma] to train the final model with?

I have seen cases where equal (or approximately equal) performance is obtained from very different [cost, gamma] values found in different regions of the grid search, for example some very low and some very high.

Should I take the average of [cost, gamma] over the region where performance is identical? Or is it better to choose, for example, a low cost and a high gamma?

Please note, this question is not a duplicate (as suggested in the comments):

(1) This question concerns a balanced data set, so accuracy is a reasonable measure of performance: any value over 50% is better than random. Still, the question is not specific to accuracy; that is just one possible metric, see (2).

(2) This question is not about whether or not accuracy is a scientifically valid metric. It applies to any model performance evaluation metric. If you have several [cost, gamma] points with equal values of some measure x (be it accuracy, ROC AUC, PR AUC, F1, MCC, or any other metric), how can you select which [cost, gamma] is more appropriate?
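
For concreteness, here is a minimal sketch of the kind of grid search I mean (scikit-learn, with a toy balanced data set; the grid values and the tie tolerance are arbitrary and purely illustrative). It lists every [cost, gamma] pair whose mean cross-validated accuracy is essentially tied with the best:

```python
# Illustrative sketch: a grid search where several (C, gamma) pairs tie on CV accuracy.
# The toy data set, grid values, and tolerance are arbitrary choices for demonstration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)  # balanced toy data

param_grid = {
    "C": [0.01, 0.1, 1, 10, 100, 1000],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1, 1],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="accuracy", cv=5)
search.fit(X, y)

# Collect every (C, gamma) whose mean CV accuracy is within a small tolerance of the best.
scores = search.cv_results_["mean_test_score"]
best = scores.max()
tol = 0.005  # arbitrary tolerance for "essentially identical" performance
tied = [
    (p["C"], p["gamma"], s)
    for p, s in zip(search.cv_results_["params"], scores)
    if best - s <= tol
]
for C, gamma, s in sorted(tied):
    print(f"C={C:>7}, gamma={gamma:<7}  mean CV accuracy={s:.3f}")
```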

Aalawlx
    On the principle of parsimony, choose $C$ to be smaller. I assume $g$ is the RBF length-scale parameter; pick it to be larger (a sketch of this tie-break appears after these comments). But accuracy is incredibly deceptive, even if your data are balanced, since it gives no distinction between marginal and confident decisions. For more information, please search our archives and review https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models?noredirect=1&lq=1 – Sycorax Mar 26 '18 at 05:09
  • It has been shown that accuracy and F-score are correlated (at least when doing grid search), so that is a bit contradictory. Moreover, probabilities are only estimated for an SVM, and in general many instances can lie very near the threshold when the class distributions are close. For an SVM it doesn't appear to make much sense to alter the threshold (the point of the SVM is to set the threshold), especially since it may not generalise outside of the test set. – Aalawlx Mar 26 '18 at 05:15
  • Possible duplicate of [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models) – kjetil b halvorsen Mar 26 '18 at 09:27
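
The following is a minimal sketch of the parsimony-based tie-break suggested in the first comment above. It assumes the usual LIBSVM/scikit-learn parameterisation, in which a larger kernel length-scale corresponds to a smaller gamma; the tied pairs listed are made-up example values.

```python
# Parsimony tie-break sketch: among (C, gamma) pairs whose CV scores are essentially
# tied, prefer the smallest C and the smoothest kernel.
# Note: with gamma = 1 / (2 * lengthscale**2) (LIBSVM / scikit-learn convention),
# a LARGER length-scale (smoother, more parsimonious model) means a SMALLER gamma.
# The `tied` list below is made-up example data in the format (C, gamma, mean CV score).

tied = [
    (0.1, 0.01, 0.953),
    (10.0, 0.01, 0.953),
    (1000.0, 0.001, 0.950),
]

# Rank primarily by C (ascending), then by gamma (ascending, i.e. largest length-scale).
C_best, gamma_best, score_best = min(tied, key=lambda t: (t[0], t[1]))
print(f"Selected C={C_best}, gamma={gamma_best} (mean CV score {score_best:.3f})")
```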

0 Answers