
Assume I have a dataset of covariates $x_i$ and binary outcomes $y_i \in \{0,1\}$. I want to predict the outcome $y_k$ for a new observation, given its covariates $x_k$.

It is quite common to do this with logistic regression, modelling $P(y_i = 1 \mid x_i) = \frac{1}{1+e^{-\beta' x_i}}$. The $\beta$ vector is usually found via maximum likelihood; that is well described all over the Internet. You can then predict outcomes by defining some cutoff $c$ and deciding that the predicted outcome is $\hat y = 1$ if $P(y = 1 \mid x) > c$, and $\hat y = 0$ otherwise. It is customary to choose $c = 0.5$.
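For concreteness, here is a minimal sketch of that workflow, assuming scikit-learn; the dataset `X`, `y` and the coefficients below are made up for illustration:

```python
# Minimal sketch of the standard workflow: fit beta by maximum
# likelihood, then classify with a cutoff c. X, y and the true
# coefficients are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # covariates x_i
y = (X @ np.array([1.0, -2.0, 0.5])              # true linear effect
     + rng.logistic(size=200) > 0).astype(int)   # binary outcomes y_i

model = LogisticRegression().fit(X, y)   # beta via maximum likelihood
p = model.predict_proba(X)[:, 1]         # P(y_i = 1 | x_i)

c = 0.5                                  # the customary cutoff
y_hat = (p > c).astype(int)              # predicted outcomes
```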

To evaluate the model, and to compare it with other models, one does not use the likelihood. Instead, one quite often uses the F1 score. One can adjust the cutoff $c$ to trade off precision against recall in the predictor.
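A small sketch of that tradeoff, continuing from the fit above (tuning $c$ on the training data is only for illustration; in practice one would use a held-out set):

```python
# Sweep the cutoff c and report precision, recall, and F1 at each
# value, using the fitted probabilities p and labels y from above.
from sklearn.metrics import f1_score, precision_score, recall_score

for c in (0.3, 0.5, 0.7):
    y_hat = (p > c).astype(int)
    print(f"c={c}: precision={precision_score(y, y_hat):.2f} "
          f"recall={recall_score(y, y_hat):.2f} "
          f"F1={f1_score(y, y_hat):.2f}")
```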

It seems to me that one should instead use a maximum-F1 estimator, fitting both $\beta$ and $c$ jointly. I have not found that approach anywhere online. Should I try that method for fitting a binary classifier?
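To make the proposal concrete, here is a rough sketch of what I mean, using `scipy.optimize.minimize` on the made-up `X`, `y` from above. Since F1 is piecewise constant in $(\beta, c)$, a gradient-free method like Nelder-Mead is the obvious, if fragile, choice; this only illustrates the idea, not a reliable fitting procedure:

```python
# Rough sketch of a "maximum F1" estimator: treat (beta, c) as free
# parameters and maximize F1 directly with a gradient-free optimizer.
from scipy.optimize import minimize
from sklearn.metrics import f1_score

def neg_f1(params):
    beta, c = params[:-1], params[-1]
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # P(y = 1 | x)
    # zero_division=0 handles degenerate all-zero predictions
    return -f1_score(y, (p > c).astype(int), zero_division=0)

start = np.full(X.shape[1] + 1, 0.1)         # crude initial guess
res = minimize(neg_f1, start, method="Nelder-Mead")
beta_hat, c_hat = res.x[:-1], res.x[-1]
```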

LudvigH
  • (1) It is *much* harder to optimize the F1 score than the likelihood, and (2) predicted probabilities give you much richer information than hard classifications. Check https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models/312787#312787 and https://stats.stackexchange.com/questions/381643/scale-dummy-variables-in-logistic-regression and questions tagged https://stats.stackexchange.com/questions/tagged/scoring-rules – Tim Dec 12 '18 at 16:41
  • I think you may find this Kaggle kernel very useful: https://www.kaggle.com/rejpalcz/best-loss-function-for-f1-score-metric – xboard Dec 12 '18 at 16:45

0 Answers