I've built a binary logistic model using R's glm()
function. I already know the data aren't extremely predictive, but I think my model can do a couple of points better than random. One way I thought of testing the predictive power of the model's returned probabilities was to fit a second, "meta" model on the holdout set, using the first model's probabilities as the independent variable. It looks like this:
test_x$Probs <- predict(model, newdata = test_x, type = "response")
meta_model <- lm(test_y ~ Probs, data = test_x)
> meta_model

Call:
lm(formula = test_y ~ Probs, data = test_x)

Coefficients:
(Intercept)        Probs
     0.0686       0.6966
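For context, here is a self-contained simulated version of this check (all data and names below are hypothetical, not my actual data). If the model's probabilities are well calibrated, regressing holdout outcomes on them should give an intercept near 0 and a slope near 1:

```r
# Hypothetical simulation: fit a logistic model on a training half,
# then regress holdout outcomes on the predicted probabilities.
set.seed(42)
n <- 5000
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)        # true event probabilities
y <- rbinom(n, 1, p)
df <- data.frame(y, x)
train <- 1:(n / 2)
test  <- (n / 2 + 1):n

model <- glm(y ~ x, family = binomial, data = df, subset = train)
probs <- predict(model, newdata = df[test, ], type = "response")

# The "meta" regression on the holdout set
meta_model <- lm(y[test] ~ probs)
coef(meta_model)  # intercept should be near 0, slope near 1 here,
                  # since the simulated model is correctly specified
```

A slope well below 1 (as in my output above) would instead suggest the probabilities are overconfident relative to the holdout outcomes.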
Is this a sensible thing to do? This seems like a good result, but I can't tell whether it counts as an independent test. I know it's not the most rigorous validation, but is it a valid measure?