Questions tagged [log-loss]

54 questions
29 votes · 4 answers

What's considered a good log loss?

I'm trying to better understand log loss and how it works, but one thing I can't seem to find is how to put the log-loss number into some sort of context. If my model has a log loss of 0.5, is that good? What's considered a good and bad score? How do…
user1923975 · 455 · 1 · 5 · 9
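
A useful way to put a log-loss number in context is to compare it with the log loss of a no-skill model that always predicts the observed base rate. The snippet below is a minimal sketch of that comparison; the labels and probabilities are made up for illustration.

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up labels and predicted probabilities of the positive class (illustration only).
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
p_model = np.array([0.1, 0.2, 0.7, 0.3, 0.8, 0.6, 0.2, 0.4, 0.9, 0.1])

# No-skill baseline: always predict the base rate of the positive class.
p_baseline = np.full_like(p_model, y_true.mean())

print("model log loss:   ", log_loss(y_true, p_model))
print("baseline log loss:", log_loss(y_true, p_baseline))
```

A fixed number such as 0.5 only means something relative to this baseline, which itself depends on the class balance.
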
19 votes · 2 answers

optimizing auc vs logloss in binary classification problems

I am performing a binary classification task where the outcome probability is fairly low (around 3%). I am trying to decide whether to optimize for AUC or log loss. As far as I understand, AUC maximizes the model's ability to discriminate between…
Giorgio Spedicato · 3,444 · 4 · 29 · 39
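
One concrete way to see the difference this question is getting at: AUC looks only at the ranking of the scores, so any monotone rescaling of the probabilities leaves it unchanged, while log loss also punishes miscalibration. A small sketch with made-up data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

# Made-up labels and predicted probabilities (illustration only).
y = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])
p = np.array([0.05, 0.10, 0.20, 0.70, 0.15, 0.80, 0.05, 0.30, 0.60, 0.10])

# Squash the probabilities toward 0.5: the ranking, and hence AUC, is unchanged.
p_squashed = 0.5 + 0.2 * (p - 0.5)

print(roc_auc_score(y, p), roc_auc_score(y, p_squashed))  # identical
print(log_loss(y, p), log_loss(y, p_squashed))            # different
```
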
17 votes · 1 answer

logloss vs gini/auc

I've trained two models (binary classifiers using h2o AutoML) and I want to select one to use. I have the following results: model_id auc logloss logloss_train logloss_valid gini_train gini_valid DL_grid_1 0.542694 0.287469…
Dan · 1,288 · 2 · 12 · 30
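
Worth noting when reading results like the ones above: for a binary classifier, the Gini coefficient reported by most ML tools is just a rescaled AUC,

$$\text{Gini} = 2\,\text{AUC} - 1,$$

so Gini and AUC always rank models identically, and the real choice is between a ranking metric (AUC/Gini) and a calibration-sensitive one (log loss).
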
12 votes · 2 answers

Reference for log-loss (cross-entropy)?

I'm trying to track down the original reference for the logarithmic loss (logarithmic scoring rule, cross-entropy), usually defined as: $$L_{\log} = -\left[y_{\text{true}} \log(p) + (1-y_{\text{true}}) \log(1-p)\right]$$ For the Brier score, for example, there is the Brier (1950)…
Gabriel · 3,072 · 1 · 22 · 49
11 votes · 4 answers

Why is binary cross entropy (or log loss) used in autoencoders for non-binary data

I am working on an autoencoder for non-binary data ranging in [0,1] and while I was exploring existing solutions I noticed that many people (e.g., the keras tutorial on autoencoders, this guy) use binary cross-entropy as the loss function in this…
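
A short calculation that helps here: for a target $y \in [0,1]$, the binary cross-entropy $-[y\log p + (1-y)\log(1-p)]$ is still minimized at $p = y$, although the minimum value is no longer zero. Sketch of the argument:

$$\frac{d}{dp}\Big(-y\log p - (1-y)\log(1-p)\Big) = -\frac{y}{p} + \frac{1-y}{1-p} = 0 \;\iff\; p = y.$$

So BCE can still act as a sensible reconstruction loss for data in $[0,1]$, even though its optimum is the entropy of $y$ rather than 0.
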
6 votes · 2 answers

Calculate binomial deviance (binomial log-likelihood) in the test dataset

I'm predicting probabilities $\mathbb{P}(Y=1)$ using a probability forest (ranger in R). I want to evaluate my predictions $\hat p_i$ in a test dataset by calculating average binomial deviance (log-likelihood). I believe the formula…
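
For reference, one common convention for the average binomial deviance on a test set of size $n$, written in terms of the predicted probabilities $\hat p_i$ from the excerpt, is

$$\bar D = -\frac{2}{n}\sum_{i=1}^{n}\Big[y_i \log \hat p_i + (1-y_i)\log(1-\hat p_i)\Big],$$

i.e. twice the average negative log-likelihood; some authors drop the factor of 2, which makes it identical to the mean log loss.
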
5 votes · 2 answers

"Dumb" log-loss for a binary classifier

I am trying to understand how I can best compare a classifier that I have trained and tuned against a "dumb" classifier, particularly in the context of binary classification with imbalanced classes. Here's a summary of my experiment: suppose I have…
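
A closed-form reference point that is useful for this kind of comparison: if the "dumb" classifier always predicts the positive-class prior $\pi$, its expected log loss is the Bernoulli entropy

$$H(\pi) = -\big[\pi\log\pi + (1-\pi)\log(1-\pi)\big],$$

which is about $0.693$ nats for a balanced problem ($\pi = 0.5$) but only about $0.325$ nats at $\pi = 0.1$, so with imbalanced classes a tuned model has to beat a much smaller number.
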
5 votes · 1 answer

Pytorch Cross Entropy Loss implementation counterintuitive

There is something I don't understand in the PyTorch implementation of Cross Entropy Loss. As far as I understand, the theoretical Cross Entropy Loss takes log-softmax probabilities and outputs a real number that should be closer to zero as the output is…
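
Much of the confusion around this API comes from the fact that `torch.nn.CrossEntropyLoss` expects raw logits (not probabilities) and integer class indices, and internally combines a log-softmax with a negative log-likelihood. A minimal sketch with made-up tensors:

```python
import torch
import torch.nn as nn

# Made-up logits for 3 samples and 4 classes, plus integer class targets.
logits = torch.randn(3, 4)
targets = torch.tensor([1, 0, 3])

# CrossEntropyLoss takes raw logits: it applies log-softmax itself,
# then the negative log-likelihood of the target class.
ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step computation.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(torch.allclose(ce, nll))  # True
```
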
5 votes · 1 answer

Log Loss function in scikit-learn returns different values

I have been trying to wrap my head around the log loss function for model evaluation. I understand how the value is calculated after doing the math by hand. In the python module sklearn.metrics the log_loss function returns two different values…
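
Without seeing the full question it is hard to say which discrepancy is meant, but a common pattern is a hand computation that disagrees with `sklearn.metrics.log_loss` because of the log base or the class/column order. A small sketch of the hand computation that does match (natural log, probabilities of the positive class); the numbers are made up:

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up labels and predicted probabilities of the positive class (illustration only).
y = np.array([0, 1, 1, 0])
p = np.array([0.2, 0.8, 0.6, 0.3])

# Hand computation with the natural logarithm, which is what sklearn uses.
manual = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(manual, log_loss(y, p))  # the two values agree
```

Using `np.log10` instead of `np.log`, or passing probability columns in the wrong class order, are typical reasons the two numbers disagree.
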
5 votes · 1 answer

Smoothing/shrinking the predicted probability of a classifier to reduce live logloss

Let us assume we work on a 2-class classification problem. In my setting the sample is balanced. To be precise, it is a financial markets setting where up and down have approximately a 50:50 chance. The classifier produces results $$p_i = P[class =…
Richi W · 3,216 · 3 · 30 · 53
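
One simple form of the shrinkage idea described above is a linear pull of the predictions toward 0.5, with the amount of shrinkage tuned on held-out data to minimize log loss. A sketch (the helper name and the factor are hypothetical):

```python
import numpy as np

def shrink_toward_half(p, lam):
    """Pull predicted probabilities toward 0.5.

    lam = 1 keeps the original predictions, lam = 0 collapses everything to 0.5.
    (Hypothetical helper; lam would be chosen on a validation set.)
    """
    return 0.5 + lam * (np.asarray(p) - 0.5)

p = np.array([0.90, 0.30, 0.55])
print(shrink_toward_half(p, 0.5))  # [0.7   0.4   0.525]
```
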
4 votes · 2 answers

Logarithmic loss vs Brier score vs AUC score

I have a dataset with two classes of elements. I also have two methods which assign to each element in the dataset (complementary) probabilities of belonging to either class. Given that I work with probabilities (instead of hard 0,1 classification…
Gabriel · 3,072 · 1 · 22 · 49
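
All three metrics in the title are available in scikit-learn and accept exactly this setup (labels plus predicted probabilities of one class), so they are easy to compute side by side. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.metrics import log_loss, brier_score_loss, roc_auc_score

# Made-up labels and predicted probabilities of class 1 (illustration only).
y = np.array([0, 1, 1, 0, 1, 0])
p = np.array([0.2, 0.9, 0.6, 0.4, 0.7, 0.1])

print("log loss   :", log_loss(y, p))          # heavily punishes confident mistakes
print("Brier score:", brier_score_loss(y, p))  # mean squared error of the probabilities
print("AUC        :", roc_auc_score(y, p))     # ranking quality only, ignores calibration
```
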
4 votes · 1 answer

Is there a cross-entropy-like loss function for multiple classes where misclassification costs are not identical?

For this conversation I'll use the definition of cross-entropy below, where there are $N$ samples and $M$ different classes, $y_{ij}$ is 1 if sample $i$ is of class $j$ and 0 otherwise, and $p_{ij}$ is the probability that sample $i$ is of class $j$ as assigned…
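
With that notation, one simple and common way to make the loss cost-sensitive is to weight each term by a per-class weight $w_j$; a full pairwise cost matrix $C_{jk}$ can instead be handled with an expected-cost loss. Both are sketches of standard ideas rather than a single canonical answer:

$$L_{\text{weighted}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} w_j\, y_{ij}\,\log p_{ij}, \qquad L_{\text{cost}} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k=1}^{M} y_{ij}\, C_{jk}\, p_{ik}.$$

The weighted version only distinguishes how bad it is to miss each true class, while the expected-cost version uses the whole matrix but, unlike cross-entropy, no longer rewards calibrated probabilities.
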
4 votes · 2 answers

How does L2 regularization penalize high-value weights?

I am reading about regularization in machine learning models. I want to understand, mathematically, how the L2 term penalizes high-value weights to avoid overfitting. Any explanation?
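
The usual one-line argument, as a sketch: adding $\frac{\lambda}{2}\lVert w\rVert_2^2$ to a loss $L(w)$ changes the gradient to

$$\frac{\partial}{\partial w_j}\Big(L(w) + \tfrac{\lambda}{2}\lVert w\rVert_2^2\Big) = \frac{\partial L}{\partial w_j} + \lambda w_j,$$

so a gradient-descent step becomes $w_j \leftarrow (1-\eta\lambda)\,w_j - \eta\,\partial L/\partial w_j$: every update multiplies each weight by a factor slightly below one, and because the extra penalty term $\lambda w_j$ is proportional to the weight itself, large weights are pushed toward zero hardest.
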
3 votes · 0 answers

derivative of cross entropy yields log-odds, does that make sense?

I am looking for a proof of how to derive logistic regression from the cross-entropy loss, i.e. derive the form of the sigmoid from cross-entropy. My thoughts are these: $\ell = y_i \ln p_i + (1-y_i)\ln(1-p_i)$, $\frac{\partial \ell}{\partial y_i} =…
pikachu · 731 · 2 · 10
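
Carrying the differentiation in the excerpt one step further (a sketch of where the sigmoid comes from):

$$\frac{\partial \ell}{\partial y_i} = \ln p_i - \ln(1-p_i) = \ln\frac{p_i}{1-p_i},$$

which is the log-odds of $p_i$; if this quantity is modelled linearly, $\ln\frac{p_i}{1-p_i} = x_i^\top\beta$, solving for $p_i$ gives the sigmoid $p_i = 1/(1+e^{-x_i^\top\beta})$ of logistic regression.
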
2 votes · 2 answers

Understanding cross entropy loss

The formula for cross entropy loss is this: $$-\sum_i y_i \ln\left(\hat{y}_i\right).$$ My question is, what are the minimum and maximum values for cross entropy loss, given that there is a negative sign in front of the sum? For example: let's say…
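
For a one-hot target the sum collapses to a single term, which makes the range easy to read off (a sketch, assuming $\hat{y}$ is a probability vector):

$$-\sum_i y_i \ln \hat{y}_i = -\ln \hat{y}_c \quad\text{for the true class } c,$$

so the minimum is $0$ (attained when $\hat{y}_c = 1$) and there is no finite maximum: the loss grows without bound as $\hat{y}_c \to 0$.
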