Questions tagged [log-loss]
54 questions
29
votes
4 answers
What's considered a good log loss?
I'm trying to better understand log loss and how it works, but one thing I can't seem to find is how to put the log loss number into some sort of context. If my model has a log loss of 0.5, is that good? What's considered a good and bad score? How do…

user1923975
- 455
- 1
- 5
- 9
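One way to put a raw log-loss number in context for the question above (a minimal sketch of my own, not taken from the question): binary log loss is the average negative log of the probability assigned to the true class, so exp(-logloss) is the geometric-mean probability the model gave the correct outcome.

    import numpy as np

    # exp(-logloss) = geometric-mean probability assigned to the true class,
    # which gives the raw number some intuition (hypothetical loss values):
    for ll in [0.693, 0.5, 0.3, 0.1]:
        print(f"log loss {ll:>5}: geometric-mean p(true class) ~ {np.exp(-ll):.2f}")
    # 0.693 ≈ ln 2 is what "always predict 50/50" scores on balanced binary data;
    # whether 0.5 is "good" also depends on the base rate (see the "dumb"
    # log-loss baseline question further down this page).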
19
votes
2 answers
optimizing auc vs logloss in binary classification problems
I am performing a binary classification task where the outcome probability is fairly low (around 3%). I am trying to decide whether to optimize for AUC or log-loss. As far as I have understood, AUC maximizes the model's ability to discriminate between…

Giorgio Spedicato
- 3,444
- 4
- 29
- 39
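To see concretely how the two metrics can disagree, here is a minimal sketch (hypothetical data, not the poster's): two prediction vectors that rank the samples identically get the same AUC but very different log loss once the probabilities are miscalibrated.

    import numpy as np
    from sklearn.metrics import roc_auc_score, log_loss

    rng = np.random.default_rng(1)
    n = 20_000
    y = rng.binomial(1, 0.03, size=n)               # ~3% positives, as in the question
    score = rng.normal(loc=y, scale=1.0)            # noisy score, higher for positives
    p_calibrated = 1 / (1 + np.exp(-(score - 3)))   # small, roughly sensible probabilities
    p_overconfident = 1 / (1 + np.exp(-3 * score))  # same ranking, badly calibrated

    for name, p in [("calibrated-ish", p_calibrated), ("overconfident", p_overconfident)]:
        print(f"{name:15s} AUC={roc_auc_score(y, p):.3f}  log loss={log_loss(y, p):.3f}")
    # AUC depends only on the ranking, so it is identical for both; log loss
    # additionally rewards probabilities that match the true frequencies.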
17
votes
1 answer
logloss vs gini/auc
I've trained two models (binary classifiers using h2o AutoML) and I want to select one to use. I have the following results:
model_id auc logloss logloss_train logloss_valid gini_train gini_valid
DL_grid_1 0.542694 0.287469…

Dan
- 1,288
- 2
- 12
- 30
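A side note that may help read the table above (my own addition): the gini reported by h2o is normally just a rescaled AUC, gini = 2 * AUC - 1, so the auc/gini columns measure ranking quality while logloss also measures calibration. A tiny sketch using the one AUC value visible in the excerpt:

    def gini_from_auc(auc: float) -> float:
        """Normalized Gini as commonly reported by h2o / credit-scoring tools."""
        return 2 * auc - 1

    print(gini_from_auc(0.542694))  # ~0.085: barely better than random ranking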
12
votes
2 answers
Reference for log-loss (cross-entropy)?
I'm trying to track down the original reference for the logarithmic loss (logarithmic scoring rule, cross-entropy), usually defined as:
$$L_{\log} = -\left[y_{\text{true}} \log(p) + (1-y_{\text{true}}) \log(1-p)\right]$$
For the Brier score, for example, there is the Brier (1950)…

Gabriel
- 3,072
- 1
- 22
- 49
11
votes
4 answers
Why is binary cross entropy (or log loss) used in autoencoders for non-binary data
I am working on an autoencoder for non-binary data ranging in [0,1] and while I was exploring existing solutions I noticed that many people (e.g., the keras tutorial on autoencoders, this guy) use binary cross-entropy as the loss function in this…

Flek
- 163
- 1
- 9
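A minimal numerical sketch of why BCE still "works" as a reconstruction loss for targets in [0, 1] (my own illustration, assuming nothing about the poster's model): the element-wise loss is minimized when the output equals the target, even though the minimum is no longer zero.

    import numpy as np

    def bce(target, output, eps=1e-7):
        """Element-wise binary cross-entropy, valid for any target in [0, 1]."""
        output = np.clip(output, eps, 1 - eps)
        return -(target * np.log(output) + (1 - target) * np.log(1 - output))

    t = 0.3                                # a non-binary target value
    outputs = np.linspace(0.01, 0.99, 99)
    best = outputs[np.argmin(bce(t, outputs))]
    print(best)                            # ~0.30: minimized at output == target,
                                           # but the minimum is the entropy of 0.3, not 0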
6
votes
2 answers
Calculate binomial deviance (binomial log-likelihood) in the test dataset
I'm predicting probabilities $\mathbb{P}(Y=1)$ using a probability forest (ranger in R). I want to evaluate my predictions $\hat p_i$ in a test dataset by calculating average binomial deviance (log-likelihood). I believe the formula…

user116514
- 81
- 1
- 4
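The question is in R (ranger), but as a language-agnostic sketch of the quantity being described: average binomial deviance on a test set is usually written as $-\frac{2}{n}\sum_i\left[y_i\log\hat p_i+(1-y_i)\log(1-\hat p_i)\right]$, and dropping the factor 2 gives the average negative log-likelihood (ordinary log loss). A hypothetical Python version:

    import numpy as np

    def mean_binomial_deviance(y, p_hat, eps=1e-15):
        """Average binomial deviance; some references omit the factor 2,
        which turns this into the ordinary average log loss."""
        p_hat = np.clip(p_hat, eps, 1 - eps)
        loglik = y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat)
        return -2 * np.mean(loglik)

    # Hypothetical test labels and predicted probabilities:
    y_test = np.array([0, 1, 1, 0, 1])
    p_test = np.array([0.2, 0.7, 0.9, 0.1, 0.6])
    print(mean_binomial_deviance(y_test, p_test))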
5
votes
2 answers
"Dumb" log-loss for a binary classifier
I am trying to understand how I can best compare a classifier that I have trained and tuned against a "dumb" classifier, particularly in the context of binary classification with imbalanced classes.
Here's a summary of my experiment: suppose I have…

wissam124
- 53
- 6
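For the constant "dumb" classifier that always predicts the observed prevalence, the expected log loss has a closed form: the binary entropy of the base rate, $-[\pi\log\pi+(1-\pi)\log(1-\pi)]$. A small sketch (hypothetical prevalences) showing why this baseline looks deceptively small under class imbalance:

    import numpy as np

    def dumb_log_loss(prevalence: float) -> float:
        """Log loss of a classifier that always predicts the class prevalence."""
        pi = prevalence
        return -(pi * np.log(pi) + (1 - pi) * np.log(1 - pi))

    for pi in [0.5, 0.1, 0.03, 0.01]:
        print(f"prevalence {pi:4}: dumb log loss = {dumb_log_loss(pi):.3f}")
    # 0.693, 0.325, 0.135, 0.056 -- a tuned model should beat this number,
    # not some absolute threshold.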
5
votes
1 answer
PyTorch Cross Entropy Loss implementation counterintuitive
There is something I don't understand in the PyTorch implementation of Cross Entropy Loss.
As far as I understand, the theoretical Cross Entropy Loss takes log-softmax probabilities and outputs a real number that should be closer to zero as the output is…
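A minimal check of the behavior being asked about (standard PyTorch API, hypothetical tensors): torch.nn.CrossEntropyLoss takes raw logits plus class indices, applies log_softmax internally, and so equals NLLLoss applied to log_softmax(logits).

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, -1.0, 0.5],
                           [0.1,  3.0, 0.2]])   # raw scores, NOT probabilities
    targets = torch.tensor([0, 1])              # class indices

    ce = torch.nn.CrossEntropyLoss()(logits, targets)
    nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
    print(ce.item(), nll.item())                # identical: CE = NLL(log_softmax(logits))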
5
votes
1 answer
Log Loss function in scikit-learn returns different values
I have been trying to wrap my head around the log loss function for model evaluation. I understand how the value is calculated after doing the math by hand.
In the python module sklearn.metrics the log_loss function returns two different values…

GeneticsGuy
- 63
- 1
- 1
- 5
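Without seeing the code it is hard to say which applies here, but two documented ways sklearn.metrics.log_loss can return different numbers for the same predictions are the normalize flag (mean vs. sum) and the assumed column order when a full probability matrix is passed. A hypothetical sketch:

    import numpy as np
    from sklearn.metrics import log_loss

    y_true = [0, 1, 1, 0]
    p_pos = np.array([0.1, 0.8, 0.7, 0.3])           # P(class = 1)
    proba = np.column_stack([1 - p_pos, p_pos])      # columns must follow sorted label order

    print(log_loss(y_true, p_pos))                   # mean loss from positive-class probabilities
    print(log_loss(y_true, proba))                   # same value from the full matrix
    print(log_loss(y_true, proba, normalize=False))  # sum over samples instead of the mean
    print(log_loss(y_true, proba[:, ::-1]))          # swapped columns -> a different (wrong) value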
5
votes
1 answer
Smoothing/shrinking the predicted probability of a classifier to reduce live logloss
Let us assume we work on a 2-class classification problem. In my setting the sample is balanced. To be precise, it is a financial markets setting where up and down have approximately a 50:50 chance.
The classifier produces results $$p_i = P[class =…

Richi W
- 3,216
- 3
- 30
- 53
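One simple way to implement the shrinking being described (a sketch under the question's balanced 50:50 setting, not a recommendation): pull each prediction toward the 0.5 prior and pick the shrinkage factor that minimizes log loss on held-out data.

    import numpy as np
    from sklearn.metrics import log_loss

    def shrink(p, lam):
        """Shrink probabilities toward 0.5 (balanced prior); lam = 1 means no shrinkage."""
        return 0.5 + lam * (p - 0.5)

    # Hypothetical validation labels and predictions:
    y_val = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    p_val = np.array([0.9, 0.4, 0.6, 0.8, 0.3, 0.55, 0.7, 0.2])

    lams = np.linspace(0.1, 1.0, 10)
    losses = [log_loss(y_val, shrink(p_val, lam)) for lam in lams]
    print(lams[int(np.argmin(losses))], min(losses))
    # An alternative with the same intent is to recalibrate on the log-odds scale
    # (e.g. Platt-style logistic recalibration fit on a validation set).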
4
votes
2 answers
Logarithmic loss vs Brier score vs AUC score
I have a dataset with two classes of elements. I also have two methods which assign to each element in the dataset (complementary) probabilities of belonging to either class.
Given that I work with probabilities (instead of hard 0,1 classification…

Gabriel
- 3,072
- 1
- 22
- 49
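Since all three can be computed directly from the same probability vectors, a minimal comparison sketch (hypothetical labels and probabilities) may help separate what each one measures:

    import numpy as np
    from sklearn.metrics import log_loss, brier_score_loss, roc_auc_score

    y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    p = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.05])   # P(class = 1)

    print("log loss   :", log_loss(y, p))          # unbounded; punishes confident mistakes hardest
    print("Brier score:", brier_score_loss(y, p))  # mean squared error of the probabilities, in [0, 1]
    print("AUC        :", roc_auc_score(y, p))     # uses only the ranking, ignores calibration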
4
votes
1 answer
Is there a cross-entropy-like loss function for multiple classes where misclassification costs are not identical?
For this conversation I'll use the below definition of cross-entropy, where there are N samples, M different classes, $ y_{ij} $ is 1 if sample i is of class j and 0 otherwise, and $p_{ij}$ is the probability that sample i is of class j as assigned…

the_martian
- 43
- 7
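There is no single canonical answer, but one common variant is to weight each sample's cross-entropy term by a cost attached to its true class; a full cost matrix instead leads to an expected-cost objective. A sketch in the question's $N \times M$ one-hot notation (all numbers hypothetical):

    import numpy as np

    def class_weighted_cross_entropy(Y, P, class_weights, eps=1e-12):
        """Cross-entropy where each sample is weighted by the cost of its true class.
        Y, P are N x M (one-hot labels, predicted probabilities).  This raises the
        price of missing 'expensive' classes but ignores WHICH wrong class won."""
        P = np.clip(P, eps, 1.0)
        w = Y @ class_weights                       # cost of each sample's true class
        return -np.mean(w * np.sum(Y * np.log(P), axis=1))

    Y = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])
    P = np.array([[0.7, 0.2, 0.1], [0.2, 0.3, 0.5], [0.1, 0.8, 0.1]])
    print(class_weighted_cross_entropy(Y, P, class_weights=np.array([1.0, 1.0, 5.0])))
    # With a full cost matrix C[i, j] (cost of predicting j when the truth is i),
    # a common alternative is to minimize the expected cost  sum_j C[true, j] * p_j.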
4
votes
2 answers
How does L2 regularization penalize high-value weights
I am reading about regularization in machine learning models. I want to understand how, mathematically, the L2 term penalizes high-value weights to avoid overfitting. Any explanation?

BetterEnglish
- 523
- 1
- 6
- 16
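A small numeric sketch of the mechanism (my own illustration): the penalty $\lambda\sum_j w_j^2$ grows quadratically, so one large weight costs more than the same total magnitude spread over several small weights, and its gradient contribution $2\lambda w_j$ pulls each weight toward zero in proportion to its current size.

    import numpy as np

    lam = 0.1
    w_concentrated = np.array([4.0, 0.0, 0.0, 0.0])
    w_spread = np.array([1.0, 1.0, 1.0, 1.0])

    print(lam * np.sum(w_concentrated ** 2))  # 1.6
    print(lam * np.sum(w_spread ** 2))        # 0.4 -- same total magnitude, 4x smaller penalty
    print(2 * lam * w_concentrated)           # gradient term [0.8, 0, 0, 0]: big weights shrink hardest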
3
votes
0 answers
Derivative of cross entropy yields log-odds, does that make sense?
I am looking for a proof of how to derive logistic regression from the cross-entropy loss, i.e. derive the form of a sigmoid from cross entropy. My thoughts are these:
$\ell = y_i \ln{p_i} + (1-y_i)\ln{(1-p_i)}$
$\frac{\partial \ell}{\partial y_i} =…

pikachu
- 731
- 2
- 10
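Carrying that differentiation one step further (a sketch of the standard argument, not the poster's full derivation):
$$\frac{\partial \ell}{\partial y_i} = \ln p_i - \ln(1-p_i) = \ln\frac{p_i}{1-p_i},$$
which is the log-odds; inverting $z=\ln\frac{p}{1-p}$ gives $p=\frac{1}{1+e^{-z}}$, the sigmoid, which is the usual sense in which the logistic link "falls out" of the Bernoulli log-likelihood.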
2
votes
2 answers
Understanding cross entropy loss
The formula for cross entropy loss is this:
$$-\sum_iy_i \ln\left(\hat{y}_i\right).$$
My question is, what are the minimum and maximum values for cross entropy loss, given that there is a negative sign in front of the sum?
For example: let's say…

Bharathi A
- 23
- 3
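For one-hot targets the sum collapses to the single true-class term, so a quick worked example in the excerpt's notation:
$$-\sum_i y_i \ln(\hat y_i) = -\ln(\hat y_{\text{true}}), \qquad -\ln(1)=0, \quad -\ln(0.5)\approx 0.69, \quad -\ln(0.01)\approx 4.61,$$
so the minimum is 0 (reached when the true class gets probability 1) and there is no finite maximum: the loss grows without bound as $\hat y_{\text{true}} \to 0$; implementations that clip probabilities at some $\epsilon$ cap it at $-\ln\epsilon$.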