I'm building a logit model in R and I'm getting an accuracy of 88.9% (checked with the ROC in rattle's Evaluation tab, using 30% of my 34k dataset).

What kinds of tests would be worth running to convince myself that it's a good model?

mpiktas
carlosedubarreto

1 Answer

You should plot the residuals against each explanatory variable (i.e. the $X_i$'s) and against the fitted values to see if there is anything wrong with the model. There are other diagnostic plots as well. In R you can use the function glm.diag.plots in package boot. See the code below and also this post. Here I also use package MASS to load a data set.

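library(MASS)   # provides the menarche data set
library(boot)   # provides glm.diag.plots
data(menarche)
# Observed proportion of girls who have reached menarche, by age
plot(Menarche/Total ~ Age, data = menarche)
# Binomial logit model on (successes, failures) counts
glm.out <- glm(cbind(Menarche, Total - Menarche) ~ Age,
               family = binomial(logit), data = menarche)
glm.diag.plots(glm.out)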
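
The two plots named at the start of this answer (residuals vs. the explanatory variable and residuals vs. fitted values) can also be drawn by hand. The following is only a minimal sketch on the same menarche fit; the choice of deviance residuals and the name res are my assumptions, and Pearson residuals would be an equally reasonable choice.

```r
library(MASS)  # provides the menarche data set

data(menarche)
glm.out <- glm(cbind(Menarche, Total - Menarche) ~ Age,
               family = binomial(logit), data = menarche)

# Assumption: deviance residuals; use type = "pearson" for Pearson residuals
res <- residuals(glm.out, type = "deviance")

op <- par(mfrow = c(1, 2))  # two panels side by side
plot(menarche$Age, res, xlab = "Age", ylab = "Deviance residual",
     main = "Residuals vs. Age")
abline(h = 0, lty = 2)
plot(fitted(glm.out), res, xlab = "Fitted proportion",
     ylab = "Deviance residual", main = "Residuals vs. fitted")
abline(h = 0, lty = 2)
par(op)  # restore the previous graphics settings
```

If the model is adequate, the points should scatter around the dashed zero line without an obvious trend or curvature.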

And to have some more fun with the fitted values:

plot(Menarche/Total ~ Age, data = menarche)
# fitted() avoids relying on partial matching of glm.out$fitted.values
lines(menarche$Age, fitted(glm.out), col = "red")
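
The red line above joins the fitted values at the observed ages only. As a sketch of an alternative, the fitted curve can be evaluated on a finer grid with predict(); the grid size of 200 and the names age.grid and p.hat are hypothetical choices of mine, not part of the original code.

```r
library(MASS)  # provides the menarche data set

data(menarche)
glm.out <- glm(cbind(Menarche, Total - Menarche) ~ Age,
               family = binomial(logit), data = menarche)

# Hypothetical grid of 200 ages spanning the observed range
age.grid <- data.frame(Age = seq(min(menarche$Age), max(menarche$Age),
                                 length.out = 200))

# type = "response" returns probabilities rather than log-odds
p.hat <- predict(glm.out, newdata = age.grid, type = "response")

plot(Menarche/Total ~ Age, data = menarche)
lines(age.grid$Age, p.hat, col = "red")
```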

Stat
  • Thanks for the answer, but another question arises: should the residuals be close to or far from each other? Should I try some manipulation of my variables to fix something I saw, like combining variables? – carlosedubarreto Oct 20 '13 at 05:33
  • Have a look at [this](http://stats.stackexchange.com/questions/29271/interpreting-residual-diagnostic-plots-for-glm-models) and the links in the answer. It is much more comprehensive than what I was going to say here. Depending on the problem, you need to revise your model, sometimes by removing a variable, sometimes by transforming it. I cannot provide a single remedy; it depends on the problem. – Stat Oct 21 '13 at 03:45
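
One way to judge whether a revision of the kind mentioned in the last comment actually helps is to compare the nested fits. The sketch below reuses the menarche data from the answer; the quadratic term in Age is purely a hypothetical revision for illustration, and fit1 and fit2 are names I made up. It is not a recommendation for any particular data set.

```r
library(MASS)  # provides the menarche data set

data(menarche)
fit1 <- glm(cbind(Menarche, Total - Menarche) ~ Age,
            family = binomial(logit), data = menarche)

# Hypothetical revision: add a quadratic term in Age
fit2 <- update(fit1, . ~ . + I(Age^2))

# Analysis-of-deviance (likelihood-ratio) test between the nested models
print(anova(fit1, fit2, test = "Chisq"))

# AIC as a second, penalized comparison (lower is better)
print(AIC(fit1, fit2))
```

A small p-value in the analysis-of-deviance table suggests the extra term improves the fit; the residual plots should then be checked again for the revised model.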