
I have a logistic regression model for risk evaluation that was fitted on credit card data from Quarter 1 of 2012 (Jan'12 - Mar'12). I now use the same model to evaluate risk on data from Quarter 2 of 2012 (Apr'12 - Jun'12). I want to devise an accuracy score that shows how "accurately" the model fits the new (Quarter 2) data.

So far I have tried the Hosmer-Lemeshow statistic and the balanced accuracy method, but neither served the purpose. What else can be done in this regard?

kjetil b halvorsen
Kasha2592
    The mean squared error in the *predicted probability for a binary outcome* case is the [Brier score](http://en.wikipedia.org/wiki/Brier_score), which is a [proper scoring rule](http://en.wikipedia.org/wiki/Scoring_rule#Proper_scoring_rules). Optimizing a proper scoring rule corresponds to finding predicted probabilities that are well calibrated to the actual probabilities in the data. Is this the kind of "accuracy" you are looking for? –  Jul 01 '14 at 14:10
  • @Matthew I am looking for a measure that tells me whether a logistic regression model (fitted on older data) still fits a new data set. I want it to alert me when the model is no longer predicting accurately, so that I can refit the model on the new data set. By accuracy I mean that the score should indicate the error between the observed values (in the new data set) and the values predicted by the model trained on the older data set. – Kasha2592 Jul 02 '14 at 07:52
  • There are lots of options, since what you're looking for isn't very specific. There is the Brier score I mentioned, or any other proper scoring rule. Also look at [this thread](http://stats.stackexchange.com/questions/18178/measuring-accuracy-of-a-logistic-regression-based-model?rq=1). –  Jul 02 '14 at 13:52

1 Answer


If I understand correctly, you have an estimated logistic regression model that you use in production, so you need a system of quality control that can tell you when the model starts to perform worse.

One possibility is to calculate a running Brier score on the incoming data. On the same plot you could show the expected Brier score, computed under the assumption that the model is correct and calibrated, which gives a reference value to compare against. There might be some ideas in this stored search.
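A minimal sketch of that idea in Python, under some assumptions of mine: the function name `running_brier`, the window length, and the simulated data are all hypothetical and only for illustration. If the model is calibrated, an outcome with predicted probability p has expected squared error p(1 - p), so the rolling mean of p(1 - p) can serve as the reference line.

```python
import numpy as np

def running_brier(y, p, window):
    """Rolling means of the observed and the expected Brier score.

    y: 0/1 outcomes in time order; p: predicted probabilities.
    The expected score assumes the model is calibrated, i.e. Y ~ Bernoulli(p).
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    observed = (p - y) ** 2        # per-case squared error
    expected = p * (1.0 - p)       # E[(p - Y)^2] when Y ~ Bernoulli(p)
    kernel = np.ones(window) / window
    return (np.convolve(observed, kernel, mode="valid"),
            np.convolve(expected, kernel, mode="valid"))

# Simulated illustration (not real credit card data).
rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.5, size=5000)           # model's predicted risks
y_good = rng.binomial(1, p)                     # model still correct
y_drift = rng.binomial(1, np.clip(2 * p, 0, 1)) # true risk has doubled

obs_good, exp_good = running_brier(y_good, p, window=1000)
obs_drift, exp_drift = running_brier(y_drift, p, window=1000)
```

With data like `y_good` the observed curve should stay close to the expected one, while with `y_drift` it should sit clearly above it. How large a gap should trigger a refit is a judgment call that this sketch does not settle.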

The Brier score is a proper scoring rule, but you could also choose some other proper scoring rule.
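For reference, here are the standard definitions of the Brier score and of one alternative proper scoring rule, the logarithmic score. The notation is my own: $y_i \in \{0, 1\}$ is the observed outcome for case $i$, $\hat{p}_i$ is the model's predicted probability, and $N$ is the number of cases in the evaluation window. Lower is better for both as written.

```latex
\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (\hat{p}_i - y_i)^2,
\qquad
\mathrm{LS} = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right]
```

The logarithmic score penalizes confident wrong predictions much more heavily than the Brier score does, so which one to monitor depends on how costly such errors are in your application.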

kjetil b halvorsen