
I'm predicting probabilities $\mathbb{P}(Y=1)$ using a probability forest (ranger in R). I want to evaluate my predictions $\hat p_i$ on a test dataset by calculating the average binomial deviance (i.e. $-2$ times the average log-likelihood). I believe the formula is: \begin{equation} \text{mean deviance} = \frac{1}{n}\sum_{i\in \text{test set}} -2\big[Y_i\ln \hat p_i + (1-Y_i)\ln(1- \hat p_i)\big], \end{equation} where $n$ is the number of test observations. How do I deal with the fact that some of the forest predictions are exactly 0 or 1? For these observations the deviance is not defined, because of the logarithm. Should I just omit them? Or should I set these values to, say, 0.00001 and 0.99999 respectively?
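For concreteness, here is a minimal R sketch of the computation I have in mind (the vectors are made-up illustrations, not my actual data); it shows how predictions of exactly 0 or 1 make the result undefined:

```r
y_test <- c(0, 1, 1, 0, 1)            # observed outcomes (illustrative)
p_hat  <- c(0.2, 0.9, 1.0, 0.0, 0.7)  # forest predictions, some exactly 0 or 1

# per-observation binomial deviance, then the mean over the test set
dev_i <- -2 * (y_test * log(p_hat) + (1 - y_test) * log(1 - p_hat))
mean(dev_i)
# NaN: for the 0/1 predictions R evaluates 0 * log(0) = 0 * (-Inf) = NaN
```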

user116514
  • My aim is to estimate probabilities, not classify observations. Specificity and sensitivity are used in the latter case. – user116514 Oct 14 '18 at 08:02
  • P(Y=1|X=x). In my case, Y is a binary indicator that patients are part of a pharmaceutical cost group (1=yes, 0=no). X is a vector of predictors. – user116514 Oct 14 '18 at 18:22
  • You misunderstand what I'm doing. My estimator is a random probability forest. My evaluation metric on the test set is the binomial deviance. – user116514 Oct 16 '18 at 16:15

2 Answers


I recommend against fudging these prediction values. The appropriate outcome here is that if the model predicts a thing with probability 1, and that thing doesn't happen, then its deviance is infinite. Similarly, if the model predicts a thing with probability 0, and that thing happens, then its deviance is infinite. That is the price you pay for making such strong predictions on an outcome and getting them wrong.

To achieve this outcome, you just have to deal with the ambiguous terms of the form $0 \times (-\infty)$. Here you would adopt the conventions that $\ln 0 = -\infty$ and $0 \times (-\infty) = 0$, giving you:

$$\begin{align} Y_i \ln \hat{p}_i &= \begin{cases} 0 & \text{if } Y_i = 0, \\[6pt] \ln \hat{p}_i & \text{if } Y_i = 1, \end{cases} \\[18pt] (1-Y_i) \ln (1-\hat{p}_i) &= \begin{cases} \ln (1-\hat{p}_i) & \text{if } Y_i = 0, \\[6pt] 0 & \text{if } Y_i = 1. \end{cases} \end{align}$$
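In R, for example, one way to implement these conventions is to evaluate only the log term whose indicator equals 1, so the $0 \times (-\infty)$ products never arise (a minimal sketch; the function name is just illustrative):

```r
# Per-observation deviance: take log(p) only when y = 1 and log(1 - p)
# only when y = 0, so Inf appears only for certain-and-wrong predictions.
deviance_i <- function(y, p) {
  -2 * ifelse(y == 1, log(p), log(1 - p))
}

deviance_i(1, 1)    # 0     -- certain and correct
deviance_i(0, 1)    # Inf   -- certain and wrong
deviance_i(1, 0.8)  # 0.446 -- ordinary case
```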

Ben

You can clip the probabilities to guarantee that they will never be exactly 0 or 1. For example, as in the scikit-learn documentation for log loss, choose a small value eps and use max(eps, min(1 - eps, p)), where p is the classifier's predicted probability.
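A minimal R sketch of the same idea (the eps value and the example vectors are arbitrary illustrations):

```r
eps    <- 1e-15
y_test <- c(0, 1, 1, 0)           # observed outcomes (illustrative)
p_hat  <- c(0.2, 0.9, 1.0, 0.0)   # predictions, some exactly 0 or 1

# clip into [eps, 1 - eps] before taking logs
p_clip <- pmax(eps, pmin(1 - eps, p_hat))
mean(-2 * (y_test * log(p_clip) + (1 - y_test) * log(1 - p_clip)))
# finite; here the exact 0/1 predictions were correct, so they contribute ~0
```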

mchl_k