
I know that standardized Pearson Residuals are obtained in a traditional probabilistic way:

$$ r_i = \frac{y_i-\hat{\pi}_i}{\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)}}$$

and deviance residuals are obtained in a more statistical way, from the contribution of each point to the likelihood:

$$ d_i = s_i \sqrt{-2[y_i \log \hat{\pi_i} + (1 - y_i)\log(1-\hat{\pi}_i)]} $$

where $s_i$ = 1 if $y_i$ = 1 and $s_i$ = -1 if $y_i$ = 0.

Can you explain to me, intuitively, how to interpret the formula for deviance residuals?

Moreover, if I want to choose one, which one is more suitable and why?

BTW, some references claim that we derive the deviance residuals based on the term

$$-\frac{1}{2}{r_i}^2$$

where $r_i$ is mentioned above.
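For concreteness, both formulas above can be checked numerically in R. This is just a sketch with simulated data (the variable names are illustrative), comparing the manual formulas against `residuals()` from a fitted `glm`:

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + x))
fit <- glm(y ~ x, family = binomial)
pi_hat <- fitted(fit)

# Pearson residuals: (y - pi) / sqrt(pi * (1 - pi))
r <- (y - pi_hat) / sqrt(pi_hat * (1 - pi_hat))

# Deviance residuals: s_i * sqrt(-2 * [y*log(pi) + (1 - y)*log(1 - pi)])
d <- ifelse(y == 1, 1, -1) *
  sqrt(-2 * (y * log(pi_hat) + (1 - y) * log(1 - pi_hat)))

all.equal(unname(r), unname(residuals(fit, type = "pearson")))   # TRUE
all.equal(unname(d), unname(residuals(fit, type = "deviance")))  # TRUE
```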

Marjolein Fokkema
Jack Shi

2 Answers


Logistic regression seeks to maximize the log likelihood function

$LL = \sum^k \ln(P_i) + \sum^r \ln(1-P_i)$

where $P_i$ is the predicted probability that $Y_i=1$; $k$ is the number of cases observed as $Y=1$, and $r$ is the number of the remaining cases, observed as $Y=0$.

That expression is equal to

$LL = -\frac{1}{2}\left({\sum^k d_i^2} + {\sum^r d_i^2}\right)$

because a case's deviance residual is defined as:

$d_i = \begin{cases} \sqrt{-2\ln(P_i)} &\text{if } Y_i=1\\ -\sqrt{-2\ln(1-P_i)} &\text{if } Y_i=0\\ \end{cases}$

Thus, binary logistic regression directly minimizes the sum of squared deviance residuals: it is the deviance residuals that are implicit in the ML estimation of the regression.
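This identity between the log likelihood and the sum of squared deviance residuals can be verified numerically. A sketch in R with simulated data (for binary 0/1 data, the saturated model's log likelihood is zero, so the deviance is exactly $-2\,LL$):

```r
set.seed(42)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)

d <- residuals(fit, type = "deviance")
# LL equals minus half the sum of squared deviance residuals
all.equal(as.numeric(logLik(fit)), -sum(d^2) / 2)  # TRUE
# Equivalently, the model deviance is the sum of squared deviance residuals
all.equal(deviance(fit), sum(d^2))                 # TRUE
```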

The Chi-square statistic of the model fit is $2(LL_\text{full model} - LL_\text{reduced model})$, where the full model contains the predictors and the reduced model contains only the intercept.
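As a sketch (simulated data, one predictor, so one degree of freedom), this likelihood-ratio statistic can be computed by fitting both models, and it equals the difference of their deviances:

```r
set.seed(7)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(1 + 2 * x))
full    <- glm(y ~ x, family = binomial)  # model with predictor
reduced <- glm(y ~ 1, family = binomial)  # intercept-only model

chisq <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))
# Equivalent: difference of the two model deviances
all.equal(chisq, reduced$deviance - full$deviance)  # TRUE
# p-value of the model fit
pchisq(chisq, df = 1, lower.tail = FALSE)
```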

ttnphns

In response to this question I have added some R code showing how to apply the formula for the deviance residuals manually.

The model in the code is a logit model where

$$p_i := Pr(Y_i = 1) = \frac{\exp(b_0 + b_1x_i)}{1+\exp(b_0 + b_1x_i)}.$$

I define $v_i := b_0 + b_1x_i$ such that the model can be written as

$$p_i := Pr(Y_i = 1) = \frac{\exp(v_i)}{1+\exp(v_i)}.$$

Estimating the model I get estimates $\hat b_0$ and $\hat b_1$. Using these estimates the predicted latent values

$$\hat v_i := \hat b_0 + \hat b_1 x_i,$$

are calculated and then the predicted probabilities are calculated

$$\hat p_i =\frac{\exp(\hat v_i )}{1+\exp(\hat v_i )}.$$
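As a side note, R's built-in `plogis()` computes exactly this logistic transformation, so the predicted probabilities could equivalently be obtained with it:

```r
v <- c(-1, 0, 2)
all.equal(plogis(v), exp(v) / (1 + exp(v)))  # TRUE
```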

Using these predicted probabilities, the formula for the deviance residuals is applied in the coding step

sign(y-pred_p) * ifelse(y==1,sqrt(-2*log(pred_p)),sqrt(-2*log(1-pred_p)))

which is simply an application of the formula

$d_i = \begin{cases} \sqrt{-2\ln(\hat p_i)} &\text{if } Y_i=1\\ -\sqrt{-2\ln(1-\hat p_i)} &\text{if } Y_i=0\\ \end{cases}$

# Simulate some data
N <- 1000
b0 <- 0.5
b1 <- 1
x <- rnorm(N)
v <- b0 + b1*x
p <- exp(v)/(1+exp(v))
y <- as.numeric(runif(N)<p)

# Estimate model
model <- glm(y~x,family=binomial)
summary_model <- summary(model)
summary_dev_res <- summary_model$deviance.resid
# This is the output you get:
quantile(summary_dev_res)


# Calculate manually deviance residuals
# First calculate predicted v's
pred_v <- coef(model)[1] + coef(model)[2]*x
# Then calculate predicted probabilities
pred_p <- exp(pred_v)/(1+exp(pred_v))
# Apply formula for deviance residuals
dev_res <- sign(y-pred_p) * ifelse(y==1,sqrt(-2*log(pred_p)),sqrt(-2*log(1-pred_p)))
# Check that it is the same as deviance residuals returned from summary
plot(summary_dev_res,dev_res)
points(seq(-3,3,length.out=100),seq(-3,3,length.out=100),type="l",col="red",lwd=2)
# all points should be on the red line 


# Also compare the quantiles ... 
quantile(summary_dev_res)
quantile(dev_res)
Jesper for President