I was doing some reading on choosing the score cut-off of a logistic regression model using the KS statistic. Suppose I have fitted a logistic regression model on the training data and now want to decide the probability cut-off to get the confusion matrix. I rank the observations into deciles based on the estimated probabilities and calculate the KS value for each group. Let's assume the supremum KS lies in the 4th group. In practice, practitioners take the minimum probability of that group as the score cut-off and build the confusion matrix. My question is: why choose the minimum probability of that decile/group? The actual supremum (if we increase the number of groups, we get closer to the actual supremum) may lie anywhere in the 4th or 5th decile. Intuitively I can understand the logic; still, if someone could explain it in a more statistical way, it would be great.
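To make the procedure described above concrete, here is a minimal sketch in plain Python. It assumes simulated scores and outcomes (not from any real model), no tied scores, and the usual cumulative construction of a KS table, where the KS reported for a group is the absolute difference between the cumulative event and non-event proportions up to and including that group. The function names `ks_by_rank` and `decile_table` are made up for illustration. Under that construction, a group's KS value is the value of the KS curve at the group's lowest score, and the finest possible grouping (one observation per group) gives the sample supremum.

```python
import random

def ks_by_rank(scores, labels):
    """KS curve at every observation, walking from highest to lowest score.

    Each entry is (score, |cumulative event share - cumulative non-event share|),
    i.e. the KS value if everything with score >= this score is flagged.
    Assumes no tied scores.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    cum_pos = cum_neg = 0
    curve = []
    for score, y in pairs:
        cum_pos += y
        cum_neg += 1 - y
        curve.append((score, abs(cum_pos / n_pos - cum_neg / n_neg)))
    return curve

def decile_table(curve, n_groups=10):
    """Read the KS curve only at group boundaries (the last, lowest score of each group)."""
    n = len(curve)
    rows = []
    for g in range(1, n_groups + 1):
        min_score, ks = curve[g * n // n_groups - 1]
        rows.append((g, min_score, ks))
    return rows

# Simulated data (an assumption for illustration): score p, outcome ~ Bernoulli(p).
random.seed(0)
scores = [random.random() for _ in range(2000)]
labels = [1 if random.random() < p else 0 for p in scores]

curve = ks_by_rank(scores, labels)
table = decile_table(curve)

best_group, decile_cutoff, decile_ks = max(table, key=lambda r: r[2])
exact_cutoff, exact_ks = max(curve, key=lambda r: r[1])

print("decile with max KS:", best_group, "cut-off:", decile_cutoff, "KS:", decile_ks)
print("sample supremum at cut-off:", exact_cutoff, "KS:", exact_ks)
```

Because the decile table only samples the curve at ten points, its maximum can never exceed the sample supremum, and the two cut-offs generally differ.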

Arun
  • I suggest you take a look at this: https://stats.stackexchange.com/a/312787/1352 (note the links to blog posts by Frank Harrell). – Stephan Kolassa Dec 07 '17 at 15:53

1 Answer

There are many fundamental misunderstandings of statistical modeling encapsulated in your question. These misunderstandings have been written about at length on this site. Among them, the logistic model is a probability estimator and uses no cutoffs on the resulting estimates. KS is not relevant here, and quantile groups are not relevant either. The logistic model is distinct from the logistic distribution for continuous responses.

Frank Harrell