12

I'm trying to track down the original reference for the logarithmic loss (logarithmic scoring rule, cross-entropy), usually defined as:

$$L_{\log} = -\left(y_{\text{true}} \log(p) + (1-y_{\text{true}}) \log(1-p)\right)$$

For the Brier score, for example, there is the Brier (1950) article. Is there such a reference for the log-loss?
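
A minimal sketch of how this quantity is computed in practice, using scikit-learn's `log_loss` (the metric linked in the comments below); the labels and probabilities are toy values chosen only for illustration:

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy binary labels and predicted probabilities of the positive class
# (values chosen only for illustration).
y_true = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])

# scikit-learn's log_loss averages the expression above over the sample.
print(log_loss(y_true, p))

# The same quantity computed directly from the definition.
print(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))
```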

Gabriel
  • shouldn't cross-entropy have two different distributions, not the same ($p$)? – develarist Oct 27 '20 at 19:36
  • The metric is defined in full in the link: https://scikit-learn.org/stable/modules/model_evaluation.html#log-loss – Gabriel Oct 27 '20 at 19:38
  • Closely related: https://stats.stackexchange.com/questions/31985/definition-and-origin-of-cross-entropy – Sycorax Oct 27 '20 at 21:18
  • @Sycorax: Does it not qualify as a plain duplicate? I feel like I might be missing the difference if there is any. – user541686 Oct 28 '20 at 07:27
  • @user541686 My answer writes about a slim distinction that I perceive between the two. I can see a case made for either closing as a duplicate or letting this more specific variation on the theme stand on its own. – Sycorax Oct 28 '20 at 14:14
  • @Sycorax: Oh I see, thanks! Up to you I guess, I'm not sure. – user541686 Oct 28 '20 at 14:35
  • @user541686 The other aspect is that since I have an answer here, it might appear self-interested in some sense to close this thread (other SO communities have had disputes and hurt feelings about this exact scenario, and I wish to avoid that). – Sycorax Oct 28 '20 at 14:39

2 Answers

12

The earliest I have been able to find is

Good, I. J. “Rational Decisions.” Journal of the Royal Statistical Society. Series B (Methodological), vol. 14, no. 1, 1952, pp. 107–114. JSTOR, www.jstor.org/stable/2984087

Look at section 8, "Fair Fees":

By itself $\log p_1$ (or $\log(1 - p_1)$) is a measure of the merit of a probability estimate
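
In the notation of the question, Good's score for a single forecast is $\log p$ if the event occurs ($y_{\text{true}} = 1$) and $\log(1 - p)$ if it does not ($y_{\text{true}} = 0$), i.e. exactly the per-observation term of $L_{\log}$ up to sign and averaging.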

I found this reference in Gneiting & Raftery (2007, JASA), who write that "This scoring rule dates back at least to Good (1952)", suggesting that they already did a similar search for original sources.

Stephan Kolassa
  • Selecting this answer as it adheres more closely to the modern definition of log-loss. Thank you! – Gabriel Oct 27 '20 at 20:41
10

If we view minimizing cross-entropy as equivalent to maximizing the log-likelihood of the same model, then I believe we can go as far back as R. A. Fisher. This places the date between 1912 and 1922, depending on how well-developed you wish the theory to be; see the discussion in John Aldrich, "R. A. Fisher and the Making of Maximum Likelihood 1912–1922", Statistical Science, 1997, Vol. 12, No. 3, pp. 162–176.
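
To spell out that equivalence in the binary case from the question, assuming $n$ independent Bernoulli observations $y_i$ with predicted probabilities $p_i$:

$$-\frac{1}{n}\log \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$$

so maximizing the Bernoulli log-likelihood and minimizing the average log-loss (cross-entropy) are the same optimization problem.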

We also have some related threads, which use the term "cross entropy" in the broad sense of a family of probabilistic losses, rather than as jargon for the specific binary-data loss used in this post.

Sycorax