Normalise different thresholds for binary prediction

Question

I'm working on a module that outputs the risk of an event happening, e.g. the risk of a crime happening in each district of a city. For each district I have fitted a binary prediction model that outputs the probability of the event happening. Each model has its own optimal threshold, and the outcome is classified as positive whenever probability > threshold. I want to rank the districts from most to least dangerous. However, I cannot rank by probability alone, since one model could output a lower probability than another model and yet be nearer to its own threshold.

For example, model 1 has a threshold of 0.95 and model 2 has a threshold of 0.50. Model 1 outputs a probability of 0.70 for the event, and model 2 outputs 0.45. Even though model 2's probability is lower, it indicates a greater risk, since that probability is closer to its 0.50 threshold.

Is this as easy as calculating how far the probability is from the threshold, as a fraction of the threshold, i.e. (threshold - prob)/threshold in this example?
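For concreteness, here is a minimal sketch of the arithmetic behind that proposed formula, applied to the two models in the example. The function name `relative_gap` is made up for illustration. The sketch only shows what the formula computes; it does not establish that the result is a sound measure of risk.

```python
def relative_gap(prob, threshold):
    """Proposed measure: distance below the threshold, as a fraction of the threshold."""
    return (threshold - prob) / threshold


# Values taken from the example above.
gap_model_1 = relative_gap(prob=0.70, threshold=0.95)
gap_model_2 = relative_gap(prob=0.45, threshold=0.50)

# A smaller gap means the probability sits closer to that model's threshold.
print(f"model 1 gap: {gap_model_1:.2f}")
print(f"model 2 gap: {gap_model_2:.2f}")
```

Under this formula model 1 is roughly 0.26 away from its threshold and model 2 roughly 0.10, so model 2 would rank as the riskier district.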

Is there a way of "normalizing" (not sure if this is the right word) these probability values to something between 0 and 1 taking into account that 1 means that we have surpassed the threshold and that a crime is going to happen?

You might be interested in proper scoring rules. – Dave May 07 '20 at 22:53

score 2 · Answer 1 · answered May 08 '20 at 06:52


Don't use thresholds in assessing models. (See here for some of the problems with them.) Instead, find out which of your models yield well-calibrated and sharp probabilistic predictions, using proper scoring rules. We have a scoring-rules tag. (Consider combining models; this often improves predictions.)

Once you have a well-performing model, consider using thresholds that are informed by your costs of wrong decisions, and that might even include more possible decisions than there are classes.
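A minimal sketch of what cost-informed decisions with more actions than classes could look like, assuming a calibrated probability `p` and an entirely made-up cost table (the action names, the numbers, and the helpers `expected_cost` and `best_action` are all hypothetical). Each action is scored by its expected cost, and the cheapest one is chosen.

```python
# Hypothetical costs per action: (cost if no event occurs, cost if the event occurs).
# These numbers are illustrative assumptions, not estimates from any real data.
COSTS = {
    "green": (0.0, 10.0),   # do nothing: free unless a crime happens
    "yellow": (1.0, 4.0),   # light patrol: small fixed cost, partial mitigation
    "red": (3.0, 0.0),      # heavy patrol: larger fixed cost, full mitigation
}


def expected_cost(action, p):
    """Expected cost of an action when the event has probability p."""
    cost_no_event, cost_event = COSTS[action]
    return (1 - p) * cost_no_event + p * cost_event


def best_action(p):
    """Pick the action with the lowest expected cost for probability p."""
    return min(COSTS, key=lambda action: expected_cost(action, p))


for p in (0.05, 0.25, 0.45, 0.70):
    print(p, best_action(p))
```

With a single cost table shared by all districts, the cut points between actions are the same everywhere, so districts can be compared directly on their predicted probabilities rather than on per-model thresholds.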


Stephan Kolassa


Thanks for the answer. I went through the link you sent. However, and correct me if I'm wrong, I did take these things into account. 1st, the threshold is chosen optimally through gmeans (basically choosing the threshold that maximises true positives and minimises false positives). 2nd, note that what I want to have as an output is a risk assessment (likelihood of a crime), which is aligned with what the article in the link explains. I just need the threshold to assess whether the output probability is high, medium or low risk. I will also edit the question to clarify all of this. – Brandon May 08 '20 at 07:47
After reading a bit on scoring functions I begin to understand the problem here. I found out about Brier scoring. Does this mean I should choose a threshold that minimises the Brier score? Again, I just need the threshold because the output is going to be a risk map (red=high risk, yellow=medium risk, green=low risk) and I need to assess how far the probability output is from the threshold. – Brandon May 08 '20 at 09:01
No, neither the Brier score nor any other scoring rule relies on thresholds. They assess the *entire* probabilistic output and reward correctly calibrated probabilistic predictions. If ten instances have the exact same predictor values (so they get the same prediction $\hat{p}$), and seven of them turn out to be the target class, then your score will be best if $\hat{p}=0.7$. No threshold involved. ... – Stephan Kolassa May 08 '20 at 11:57
... The idea is to use scoring rules to find a correct probabilistic prediction model. Only once you have a well-performing model does it make sense to use thresholds to make *decisions* based on the output. I advocate a separation of concerns: (probabilistic) prediction is a different thing than decision making (e.g., prediction should be blind to costs, while decisions certainly should not be), and if they are separated, you can pinpoint problems much better than if everything is conflated by baking thresholds into your statistical model. – Stephan Kolassa May 08 '20 at 11:59
Yes. I have looked more into it and this is exactly what I understood. Thanks, this makes much more sense now for my use case. So, as a conclusion and following your separation of concerns: 1st I will create a probabilistic model that results in the best score. 2nd I make decisions based on the probabilistic outputs. For example, in my use case, it would be to send police patrols to the districts with the highest risk/probabilistic scores – Brandon May 08 '20 at 22:16
That is one possibility. Just be aware that this might (a) run into a [disparate impact type of problem](https://en.wikipedia.org/wiki/Disparate_impact) (although the term is specific to labor law, the problem here is similar), since there will probably be socioeconomic differences between the areas with high vs. low probabilities, and (b) may yield selection biases - if more police are in an area, they will tend to notice more crime, reinforcing your classification. Make sure you also collect data from ostensibly low-crime areas. – Stephan Kolassa May 11 '20 at 04:22
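The ten-instance example from the comments above (seven of ten instances in the target class) can be checked numerically. This is a minimal sketch in plain Python. The helper name `brier_score` and the grid of candidate probabilities are choices made here for illustration.

```python
# Ten instances with identical predictors: seven positives, three negatives.
outcomes = [1] * 7 + [0] * 3


def brier_score(p_hat, outcomes):
    """Mean squared difference between the predicted probability and the 0/1 outcomes."""
    return sum((p_hat - y) ** 2 for y in outcomes) / len(outcomes)


# Try every predicted probability on a 0.01 grid and keep the one with the lowest score.
candidates = [i / 100 for i in range(101)]
best_p = min(candidates, key=lambda p: brier_score(p, outcomes))

print(f"best predicted probability: {best_p:.2f}")
print(f"Brier score at that value:  {brier_score(best_p, outcomes):.2f}")
```

The score is lowest at a predicted probability of 0.70, which matches the observed frequency, and no threshold enters the calculation at any point.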
