This question is related to, but not the same as, link. I have read a lot of posts here as well as a post from Frank Harrell.
It is very clear to me that accuracy is not a great metric, that a probability is usually where a statistician/data scientist should stop, and that a probability cutoff is not a hyperparameter, because it is tied to the actual decision rather than to the model itself (you don't have to retrain the model to change it).
Suppose we have a probabilistic prediction model and that we also know the decision maker's utility function. Should we grid-search (or use any other search) over the hyperparameters and choose the probability cutoff that maximizes the utility function (on the training set, of course, to avoid introducing additional bias)? This would mean selecting the hyperparameters and their corresponding probability cutoff at the same time.
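To make this first option concrete, here is roughly what I have in mind. This is a minimal sketch, assuming a binary problem, a made-up utility matrix, and scikit-learn's LogisticRegression standing in for any probabilistic model; the cutoff is evaluated on out-of-fold predictions within the training data, which is one way to avoid tuning it on in-sample fits.

```python
# Joint search over hyperparameters and cutoff, maximizing mean utility.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)

# Hypothetical utilities U[decision, true class]: acting on a true positive
# pays 5, acting on a false positive costs 1, doing nothing pays 0.
U = np.array([[0.0, 0.0],    # decision = 0 (don't act)
              [-1.0, 5.0]])  # decision = 1 (act)

def mean_utility(y_true, p, cutoff):
    d = (p >= cutoff).astype(int)
    return U[d, y_true].mean()

cutoffs = np.linspace(0.05, 0.95, 19)
best = (-np.inf, None, None)
for C in [0.01, 0.1, 1.0, 10.0]:              # hyperparameter grid
    cv = StratifiedKFold(5, shuffle=True, random_state=0)
    p_oof = np.zeros(len(y))                  # out-of-fold probabilities
    for tr, va in cv.split(X, y):
        m = LogisticRegression(C=C, max_iter=1000).fit(X[tr], y[tr])
        p_oof[va] = m.predict_proba(X[va])[:, 1]
    for t in cutoffs:                         # cutoff tuned jointly with C
        u = mean_utility(y, p_oof, t)
        if u > best[0]:
            best = (u, C, t)

print("best mean utility %.3f at C=%s, cutoff=%.2f" % best)
```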
Or should we rather aim to produce a model (with specified hyperparameters) based on some other metric (proper scoring rules, AUROC, ...) and optimize the probability cutoff afterwards? Such a system would be more robust to changes in the utility function (yes, that actually happens), but it may lead to worse results under a given utility function, right?
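For contrast, a sketch of this second option under the same assumptions as above: select hyperparameters with a proper scoring rule (log loss here, but that is just one choice), and only afterwards derive the cutoff from the current utility matrix. If the probabilities are reasonably calibrated, the expected-utility-maximizing cutoff has a closed form, which is what makes this approach robust to a changing utility function.

```python
# Decoupled approach: fit/select by a proper scoring rule, then derive the
# cutoff from the utility matrix without touching the model again.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)

# Stage 1: choose hyperparameters by log loss (a proper scoring rule).
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.01, 0.1, 1.0, 10.0]},
                      scoring="neg_log_loss", cv=5).fit(X, y)
model = search.best_estimator_  # the model we would carry forward

# Stage 2: cutoff implied by the utility matrix U[decision, true class].
# Act iff p*U[1,1] + (1-p)*U[1,0] >= p*U[0,1] + (1-p)*U[0,0], i.e.
# p >= (U[0,0]-U[1,0]) / ((U[0,0]-U[1,0]) + (U[1,1]-U[0,1])).
U = np.array([[0.0, 0.0], [-1.0, 5.0]])
cutoff = (U[0, 0] - U[1, 0]) / ((U[0, 0] - U[1, 0]) + (U[1, 1] - U[0, 1]))

print("chosen hyperparameters:", search.best_params_,
      "utility-implied cutoff: %.3f" % cutoff)
```

If the utility function changes later, only Stage 2 needs to be redone, which is the robustness I was referring to.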
The former, however, would let us compare classifiers and probabilistic models directly (on the validation set, when we are picking the final model) without converting the classifiers into some sort of probabilistic model (e.g. Platt scaling for an SVM, the fraction of trees voting for each class in a RF, ...).
If one were to use the latter approach (or if the utility function isn't known in advance), is it considered best practice to come up with a metric to optimize (during validation) that combines general metrics such as AUROC or the Brier score with the assumed utility function (based on limited domain knowledge)?