
I've built a binary logistic model using R's glm() function. I already know the data aren't extremely predictive, but I think my model can do a couple of points better than random. One way I thought of testing the predictive power of the model's returned probabilities was to fit a second, "meta" model on the holdout set, with the first model's probabilities as the independent variable. It looks like this:

test_x$Prob <- predict(model, newdata = test_x, type = "response")
meta_model <- lm(test_y ~ Prob, data = test_x)

> meta_model

Call:
lm(formula = test_y ~ Prob, data = test_x)

Coefficients:
(Intercept)         Prob  
     0.0686       0.6966  

Is this a sensible thing to do? This seems like a good result, but I can't tell whether it's an independent test. I know it's not the most rigorous validation, but is it a valid measure?

2 Answers


GLMs are not so easy to validate, as there is no straightforward way of knowing what proportion of your data's variability is being explained.

You could use the likelihood-ratio test to see whether your model is distinguishable from the null model (basically, that's like knowing whether or not the adj. R-sq. is "too low"), and then you also have the deviance goodness-of-fit test, which lets you see whether your model is consistent with the saturated model (the equivalent of knowing whether or not the adj. R-sq. value would be "high enough").
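As a rough sketch of both tests in R (assuming model is the glm() fit from the question; the null model is just the intercept-only refit):

# Likelihood-ratio test against the null (intercept-only) model:
null_model <- update(model, . ~ 1)
anova(null_model, model, test = "Chisq")   # small p-value: the model beats the null

# Deviance goodness-of-fit test against the saturated model
# (a small p-value suggests lack of fit):
pchisq(deviance(model), df.residual(model), lower.tail = FALSE)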

PS: Check out this post too.

Digio

Look up "forecast encompassing test" in interweb. It relates to comparing two model's forecasts. Basically you try to see whether one model has information in the residuals of another, i.e. encompassing it.

Aksakal