Logistic regression: Why don't we plot the residuals against the fitted values?

Question

I'm hoping someone can explain this bit of R code that uses glm(). I don't understand why this diagnostic plot was suggested. Plotting the residuals against the fitted values seems more informative to me, but maybe I'm missing something. Here's the code:

result <- glm(survive~age, data=donner, family=binomial)
# Why is this plotted against the respondent index?
plot(residuals(result,type="pearson"), main="pearson residual plot")

Data to reproduce the above example:

> dput(donner)
structure(list(age = c(23L, 40L, 40L, 30L, 28L, 40L, 45L, 62L, 
65L, 45L, 25L, 28L, 28L, 23L, 22L, 23L, 28L, 15L, 47L, 57L, 20L, 
18L, 25L, 60L, 25L, 20L, 32L, 32L, 24L, 30L, 15L, 50L, 21L, 25L, 
46L, 32L, 30L, 25L, 25L, 25L, 30L, 35L, 23L, 24L, 25L), sex = c(1L, 
0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 
0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L), survive = c(0L, 
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 
1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L)), .Names = c("age", 
"sex", "survive"), class = "data.frame", row.names = c(NA, -45L
))

Well, why don't you try plotting the residuals against the fitted values to see what happens? — onestop, Feb 25 '12 at 21:07

score 8 · Accepted Answer · edited Feb 25 '12 at 21:35

With this diagnostic plot, we're just looking at the residuals to see if anything leaps out at us: a clump of outliers or, as happens with these data, a clear separation of the residuals into groups. It's merely one of several diagnostic plots you can, and should, make. We might suspect that the two groups correspond to sex, and then plot the residuals against sex:

plot(residuals(result,type="pearson") ~ donner$sex,
     main="pearson residual vs sex plot")

which would tell us whether that suspicion holds. More generally, plotting residuals against the explanatory variables (or, as in this case, against a variable left out of the model) can reveal nonlinearities and other problems with the model.
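As a minimal sketch of that check, the snippet below rebuilds the donner data from the dput() output in the question, draws the Pearson residuals by sex as boxplots, and then refits the model with sex added. The names res, sex_f and result_sex are made up for illustration, and the meaning of the 0/1 sex coding is not stated in the question, so it is left as-is.

```r
# Donner data copied from the dput() output in the question.
donner <- data.frame(
  age = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22,
          23, 28, 15, 47, 57, 20, 18, 25, 60, 25, 20, 32, 32, 24, 30,
          15, 50, 21, 25, 46, 32, 30, 25, 25, 25, 30, 35, 23, 24, 25),
  sex = c(1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0,
          0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1,
          1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  survive = c(0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
              1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
              0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1)
)

# Same model as in the question: survival as a function of age only.
result <- glm(survive ~ age, data = donner, family = binomial)
res <- residuals(result, type = "pearson")

# One boxplot of residuals per sex group (0/1 coding kept as given).
sex_f <- factor(donner$sex)
boxplot(res ~ sex_f, xlab = "sex (coded 0/1)", ylab = "Pearson residual",
        main = "Pearson residuals by sex")

# Add the left-out variable and compare the two nested models.
result_sex <- update(result, . ~ . + sex)
anova(result, result_sex, test = "Chisq")
```

If the boxplots sit at clearly different levels, or the anova() comparison shows a noticeable drop in deviance, that suggests sex belongs in the model.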

I suspect you were given this as part of an exercise in using diagnostic plots as tools to help indicate potential model improvements, rather than as a sort of sine qua non of diagnostic plots - although it's definitely a useful plot in its own right.
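For the plot the question asks about, here is a rough sketch rather than a recommended diagnostic. With a 0/1 response, the Pearson residual is (y - p) / sqrt(p * (1 - p)), where p is the fitted probability, so every point lies on one of two curves, one per observed outcome. The colours and the names fit and res are arbitrary choices for illustration.

```r
# Age and outcome copied from the dput() output in the question.
donner <- data.frame(
  age = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22,
          23, 28, 15, 47, 57, 20, 18, 25, 60, 25, 20, 32, 32, 24, 30,
          15, 50, 21, 25, 46, 32, 30, 25, 25, 25, 30, 35, 23, 24, 25),
  survive = c(0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
              1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
              0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1)
)

result <- glm(survive ~ age, data = donner, family = binomial)
res <- residuals(result, type = "pearson")
fit <- fitted(result)  # fitted survival probabilities

# Colour each point by the observed outcome to make the two curves visible.
plot(fit, res,
     col = ifelse(donner$survive == 1, "blue", "red"), pch = 19,
     xlab = "Fitted probability of survival", ylab = "Pearson residual",
     main = "Pearson residuals vs fitted values")
abline(h = 0, lty = 2)
legend("topright", legend = c("survive = 1", "survive = 0"),
       col = c("blue", "red"), pch = 19)
```

Because the vertical position of each point is determined by its fitted value and its outcome, the shape of this plot mostly reflects the binary response, which is one reason to look at other diagnostic plots as well.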
