12

I'm trying to track down the original reference for the logarithmic loss (logarithmic scoring rule, cross-entropy), usually defined as:

$$L_{\log} = -\left(y_{\text{true}} \log(p) + (1-y_{\text{true}}) \log(1-p)\right)$$

For the Brier score, for example, there is the Brier (1950) article. Is there such a reference for the log-loss?
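
A minimal sketch of how this quantity is computed in practice, using scikit-learn's `log_loss` (the metric linked in the comments below); the labels and probabilities are toy values chosen only for illustration:

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy binary labels and predicted probabilities of the positive class
# (values chosen only for illustration).
y_true = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])

# scikit-learn's log_loss averages the expression above over the sample.
print(log_loss(y_true, p))

# The same quantity computed directly from the definition.
print(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))
```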

Gabriel
  • shouldn't cross-entropy have two different distributions, not the same ($p$)? – develarist Oct 27 '20 at 19:36
  • The metric is defined in full in the link: https://scikit-learn.org/stable/modules/model_evaluation.html#log-loss – Gabriel Oct 27 '20 at 19:38
  • Closely related: https://stats.stackexchange.com/questions/31985/definition-and-origin-of-cross-entropy – Sycorax Oct 27 '20 at 21:18
  • @Sycorax: Does it not qualify as a plain duplicate? I feel like I might be missing the difference if there is any. – user541686 Oct 28 '20 at 07:27
  • @user541686 My answer writes about a slim distinction that I perceive between the two. I can see a case made for either closing as a duplicate or letting this more specific variation on the theme stand on its own. – Sycorax Oct 28 '20 at 14:14
  • @Sycorax: Oh I see, thanks! Up to you I guess, I'm not sure. – user541686 Oct 28 '20 at 14:35
  • @user541686 The other aspect is that since I have an answer here, it might appear self-interested in some sense to close this thread (other SO communities have had disputes and hurt feelings about this exact scenario, and I wish to avoid that). – Sycorax Oct 28 '20 at 14:39

2 Answers

12

The earliest I have been able to find is

Good, I. J. “Rational Decisions.” Journal of the Royal Statistical Society. Series B (Methodological), vol. 14, no. 1, 1952, pp. 107–114. JSTOR, www.jstor.org/stable/2984087

Look at section 8, "Fair Fees":

By itself $\log p_1$ (or $\log(1 - p_1)$) is a measure of the merit of a probability estimate
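
In the notation of the question, Good's score for a single forecast is $\log p$ if the event occurs ($y_{\text{true}} = 1$) and $\log(1 - p)$ if it does not ($y_{\text{true}} = 0$), i.e. exactly the per-observation term of $L_{\log}$ up to sign and averaging.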

I found this reference in Gneiting & Raftery (2007, JASA), who write that "This scoring rule dates back at least to Good (1952)", suggesting that they already did a similar search for original sources.

Stephan Kolassa
  • Selecting this answer as it adheres more closely to the modern definition of log-loss. Thank you! – Gabriel Oct 27 '20 at 20:41
10

If we view minimizing cross-entropy as equivalent to maximizing the log-likelihood of the same model, then I believe we can go as far back as R. A. Fisher. This places the date between 1912 and 1922, depending on how well-developed you wish the theory to be; see the discussion in John Aldrich, "R. A. Fisher and the Making of Maximum Likelihood 1912–1922", Statistical Science, 1997, Vol. 12, No. 3, pp. 162–176.
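
To spell out that equivalence in the binary case from the question, assuming $n$ independent Bernoulli observations $y_i$ with predicted probabilities $p_i$:

$$-\frac{1}{n}\log \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$$

so maximizing the Bernoulli log-likelihood and minimizing the average log-loss (cross-entropy) are the same optimization problem.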

We also have some related threads, which use the term "cross entropy" in the broad sense of a family of probabilistic losses, rather than as jargon for the specific binary-data loss used in this post.

Sycorax