I am looking into identifying extreme values by their contribution to a binary outcome model. My data set is unbalanced, and some of the extreme values belong to the smaller class I want to predict (i.e. outcome = 1), so I can't really remove them. Ideally I would use Cook's distance, the residuals and the leverage from the residuals vs leverage plot, as for a linear model:
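library(car)  # not needed for plot(); car provides influencePlot()
plot(lm(mpg ~ wt, data = mtcars), which = 5)  # residuals vs leverage, with Cook's distance contours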
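The quantities behind that plot can also be extracted numerically, which makes it easier to list the suspicious points instead of reading them off the chart. A minimal base R sketch; the cut-offs 2p/n for leverage and 4/n for Cook's distance are common rules of thumb chosen here for illustration, not something the plot itself uses:

```r
# Numbers behind plot(lm(...), which = 5) for the mtcars example
fit_lm <- lm(mpg ~ wt, data = mtcars)

h  <- hatvalues(fit_lm)       # leverage (diagonal of the hat matrix), x axis
rs <- rstandard(fit_lm)       # standardized residuals, y axis
cd <- cooks.distance(fit_lm)  # Cook's distance, drawn as contours on the plot

n <- nrow(mtcars)
p <- length(coef(fit_lm))     # number of coefficients (intercept + slope)

# Rules of thumb (conventions, not formal tests): h > 2p/n or Cook's D > 4/n
flag_lm <- data.frame(hat = h, std_resid = rs, cooks = cd)
flag_lm <- flag_lm[h > 2 * p / n | cd > 4 / n, ]
print(round(flag_lm, 3))
```

For lm, Cook's distance is just a combination of the other two quantities, D = r² h / (p (1 - h)) with r the standardized residual, which is why it can be drawn as contours on the residuals vs leverage plane.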
I would look at the Residuals vs Leverage panel, where Cook's distance appears as contour lines.
But does it make sense for a logistic regression of the form:
plot(glm(outcome ~ variable, family = binomial, data = mydata))  # mydata is a placeholder for your data frame
I guess not, because the residuals are not defined the same way for a binary outcome...
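For what it's worth, a glm does have residuals, just in several flavours (deviance, Pearson, response), and plot() on a glm object goes through the same method as for lm. A small sketch, assuming mtcars$am (0 = automatic, 1 = manual) as a stand-in binary outcome since the real data isn't shown. With a 0/1 outcome the residuals fall into two bands, one per class, which is one reason the plot is harder to read than for lm:

```r
# mtcars$am is used only as a placeholder binary outcome
fit_glm <- glm(am ~ wt, family = binomial, data = mtcars)

r_dev  <- residuals(fit_glm, type = "deviance")  # default residual type for glm
r_pear <- residuals(fit_glm, type = "pearson")   # (y - p_hat) / sqrt(p_hat * (1 - p_hat))
r_resp <- residuals(fit_glm, type = "response")  # raw y - p_hat

# Same plotting method as for lm; which = 5 is the residuals vs leverage panel
plot(fit_glm, which = 5)
```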
It seems you can instead draw a chi-square influence plot, or a proportional influence plot, like Figure 3 here (the plots that look like crosses).
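If those figures are the Hosmer and Lemeshow style diagnostics (my assumption), they can be built by hand from Pregibon's one-step formulas: the drop in the Pearson chi-square when observation i is removed, ΔX² = r_P² / (1 - h), plotted against the fitted probability, with circle area proportional to the coefficient change Δβ = r_P² h / (1 - h)². This sketch again uses mtcars$am as a placeholder outcome and treats each row as its own covariate pattern (Hosmer and Lemeshow define these per pattern); the ΔX² > 4 labelling threshold is only a rule of thumb:

```r
fit_glm <- glm(am ~ wt, family = binomial, data = mtcars)  # placeholder outcome

p_hat <- fitted(fit_glm)                        # fitted probabilities
h     <- hatvalues(fit_glm)                     # leverage from the weighted hat matrix
r_p   <- residuals(fit_glm, type = "pearson")   # Pearson residuals

# One-step (Pregibon) deletion diagnostics, one value per observation
delta_chisq <- r_p^2 / (1 - h)          # approx. drop in Pearson chi-square without obs. i
delta_beta  <- r_p^2 * h / (1 - h)^2    # approx. Cook-type change in the coefficients

# Proportional influence plot: radius = sqrt(delta_beta), so circle area ~ delta_beta
symbols(p_hat, delta_chisq, circles = sqrt(delta_beta), inches = 0.25,
        xlab = "Fitted probability", ylab = "Delta chi-square",
        main = "Delta chi-square vs fitted probability")

# Label observations above the rule-of-thumb threshold of 4
big <- delta_chisq > 4
if (any(big)) text(p_hat[big], delta_chisq[big], labels = rownames(mtcars)[big],
                   pos = 3, cex = 0.7)
```

Both quantities are already available in base R: ΔX² equals rstandard(fit_glm, type = "pearson")^2, and Δβ equals cooks.distance(fit_glm) multiplied by the number of coefficients.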
I also found a similar diagnostic analysis in SAS. Does anyone know how to do that in R, or whether there are other ways to carry out the equivalent of the linear regression analysis for a logistic regression? Would calculating the leverage and Cook's distance make sense?
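Leverage and Cook's distance can be computed for a logistic fit with the same base R functions, but for a glm cooks.distance() is a one-step approximation rather than an actual refit. The sketch below (mtcars$am again as a placeholder outcome) compares it with an exact leave-one-out refit, measuring the coefficient shift in the metric of the full-fit covariance matrix, which is how Cook's distance is defined:

```r
fit_glm <- glm(am ~ wt, family = binomial, data = mtcars)  # placeholder outcome
n <- nrow(mtcars)
p <- length(coef(fit_glm))

h  <- hatvalues(fit_glm)        # leverage
cd <- cooks.distance(fit_glm)   # one-step approximation for glm

# Exact version: refit without observation i and measure the coefficient change
# d_i as t(d_i) %*% solve(vcov) %*% d_i / p (dispersion is 1 for binomial)
V_inv <- solve(vcov(fit_glm))
cd_exact <- vapply(seq_len(n), function(i) {
  fit_i <- update(fit_glm, data = mtcars[-i, ])
  d <- coef(fit_glm) - coef(fit_i)
  drop(t(d) %*% V_inv %*% d) / p
}, numeric(1))

res <- data.frame(hat = h, cooks_approx = cd, cooks_exact = cd_exact)
print(round(res[order(-res$cooks_exact), ][1:5, ], 3))  # five most influential rows
```

The exact refit is cheap for a small data set and avoids relying on the approximation for the points that matter most. car::influencePlot() also accepts glm fits and draws studentized residuals against hat values, with circle areas proportional to Cook's distance.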
Thanks
[add-on]
- There is a great answer from gung confirming that the lm analysis does not really apply to glm.
- It turns out this is a research topic, which also covers the case of multiple influential observations: ResearchGate >_<'