
Is there a name for a logistic regression model that has been fit using the Brier score (or equivalently the mean-squared error) rather than the cross-entropy?

I realise this isn't maximum likelihood estimation, but the model would still asymptotically estimate the conditional mean of the targets (which, for binary targets, is the posterior probability of class membership), and similar things were done with neural networks back in the day. I haven't been able to find anything on this, but perhaps that is because I am using the wrong search terms?
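For concreteness, a minimal sketch of what I mean (synthetic data and the use of `scipy.optimize` with BFGS are just illustrative choices, not a reference implementation): the logistic link is kept, but the objective minimised is the mean squared error of the predicted probabilities rather than the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid

# Synthetic binary data from a true logistic model (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = (rng.random(200) < expit(X @ true_w)).astype(float)

def brier_loss(w):
    """Brier score / MSE of the predicted probabilities."""
    p = expit(X @ w)
    return np.mean((p - y) ** 2)

# Fit by minimising the Brier score instead of the cross-entropy
res = minimize(brier_loss, x0=np.zeros(2), method="BFGS")
print(res.x)  # coefficient estimates under the squared-error criterion
```

The only change relative to ordinary logistic regression is the objective; the model class and link function are identical.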

kjetil b halvorsen
Dikran Marsupial
  • I'd call this logistic regression with a different/non-standard loss function. Like how lasso/ridge regression are just linear regression under a different loss function – jcken Aug 17 '21 at 08:51
  • @jcken I'm trying to find references for it, do you know of any? – Dikran Marsupial Aug 17 '21 at 08:53
  • https://www.jclinepi.com/article/S0895-4356(09)00363-1/fulltext – jcken Aug 17 '21 at 08:57
  • @jcken I am more after fitting models using the Brier score (or MSE) rather than assessing fitted models. – Dikran Marsupial Aug 17 '21 at 09:01
  • @jcken, I do not think the comparison to penalized regression is relevant. This is more like LAD regression instead of OLS regression. – Richard Hardy Aug 17 '21 at 10:19
  • The name I would give it is _silly_. There are only 3 efficient optimality criteria that yield asymptotically unbiased parameter estimates: log likelihood, penalized log likelihood, and Bayesian posterior. Maximum likelihood estimation is not broken; don't fix it. – Frank Harrell Aug 17 '21 at 12:20
  • @FrankHarrell my subjective prior on the utility of the idea is low, but I'd still be interested to hear of previous uses outside neural networks, where I have already seen it. – Dikran Marsupial Aug 17 '21 at 12:57
  • Some ideas are better left unstudied :-) – Frank Harrell Aug 17 '21 at 14:57
  • 4
    sorry, this isn't particularly helpful. One of the problems with crossentropy is that it can be very sensitive to outliers or label noise (as it *very* harshly penalises confident mistakes). It is possible that minimising the Brier score might be more robust - it is also a proper scoring rule, so I don't think it is "silly". It is worth keeping an open mind about these things, and that perhaps other people know things I don't. – Dikran Marsupial Aug 17 '21 at 15:09
  • With binary Y all observations are outliers and it is the sensitivity to outliers that makes maximum likelihood estimation efficient and leads to accurate probability estimates. – Frank Harrell Aug 17 '21 at 21:30
  • Possible duplicate: https://stats.stackexchange.com/questions/326350/what-is-happening-here-when-i-use-squared-loss-in-logistic-regression-setting – kjetil b halvorsen Aug 18 '21 at 00:43
  • @FrankHarrell I think we have different definitions of an "outlier". Mine would be an observation that cannot be plausibly explained by a model that otherwise provides a good fit to the data. Binary Y observations are not outliers if a model that gives accurate probability estimates explains them with reasonable likelihood. – Dikran Marsupial Aug 18 '21 at 06:55
  • @kjetilbhalvorsen thank you for the link - it isn't quite the same question as I am really looking for references/uses outside of neural nets, but the first answer provides a *much* better reason why it probably hasn't been used outside NNs! – Dikran Marsupial Aug 18 '21 at 07:04
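The sensitivity of cross-entropy to confident mistakes discussed in the comments can be seen with a single number (the specific prediction value below is just an illustrative choice): for a confidently wrong prediction, the cross-entropy penalty grows without bound as the predicted probability approaches 1, while the Brier penalty is capped at 1.

```python
import numpy as np

# One confidently wrong prediction: true label y = 0, predicted p = 0.999
p, y = 0.999, 0.0
cross_entropy = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # ≈ 6.91
brier = (p - y) ** 2                                        # ≈ 0.998
print(cross_entropy, brier)
```

As p → 1 the cross-entropy penalty → ∞, whereas the Brier penalty → 1, which is the intuition behind the robustness argument above.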

0 Answers