In the case of supervised classification, we wish to predict the label of an unseen observation $x\in\mathcal{X}$ by assigning it some label $y \in \mathcal{Y}$. Specifically, we want to find the label $y^*$ as follows: $$\begin{split} y^* &= \underset{y\in\mathcal{Y}}{\text{arg max}\;} \Pr(Y=y|X=x)\\ &= \underset{y\in\mathcal{Y}}{\text{arg max}\;} \frac{\Pr(Y=y)\Pr(X=x|Y=y)}{\Pr(X=x)} \quad\text{(by Bayes' theorem)}\\ &= \underset{y\in\mathcal{Y}}{\text{arg max}\;} \Pr(Y=y)\Pr(X=x|Y=y) \quad\text{(since $\Pr(X=x)$ does not depend on $y$)}\\ \end{split}$$
Of course, if the distribution of labels is uniform, then $\Pr(Y=y) = 1/|\mathcal{Y}|$ is constant for all $y \in \mathcal{Y}$. In that case, we can drop $\Pr(Y=y)$ as well and simplify the above into: $$\begin{split} y^* &= \underset{y\in\mathcal{Y}}{\text{arg max}\;} \Pr(X=x|Y=y)\\ \end{split}$$
And that's essentially what any supervised classification learning algorithm aims to find. E.g., SVM, naive Bayes, etc. essentially fit models that imply some definition of those probabilities.
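To make the two decision rules concrete, here is a minimal sketch in Python; the prior and likelihood values are hypothetical, chosen only so that the two rules disagree:

```python
# Minimal sketch: MAP vs. ML decision rules on a two-label problem.
# All probabilities below are hypothetical, for illustration only.

labels = ["guilty", "not guilty"]

# Pr(Y = y): a non-uniform class prior.
prior = {"guilty": 0.6, "not guilty": 0.4}

# Pr(X = x | Y = y): likelihood of the observed evidence x under each label.
likelihood = {"guilty": 0.3, "not guilty": 0.4}

# MAP rule: arg max_y  Pr(Y = y) * Pr(X = x | Y = y)
y_map = max(labels, key=lambda y: prior[y] * likelihood[y])

# ML rule (prior dropped, i.e. assumed uniform): arg max_y  Pr(X = x | Y = y)
y_ml = max(labels, key=lambda y: likelihood[y])

print(y_map)  # "guilty":     0.6 * 0.3 = 0.18  beats  0.4 * 0.4 = 0.16
print(y_ml)   # "not guilty": 0.4 beats 0.3 once the prior is ignored
```

The same evidence convicts under the first rule and acquits under the second; the only difference is whether the prior $\Pr(Y=y)$ is used.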
Now, my question is: suppose a suspect $x$ is to be classified as either guilty or not guilty, and suppose that $\Pr(Y=\text{guilty}) = 0.6$. Should we use this knowledge when judging suspects? Or should we instead ignore this prior and assume that $\Pr(Y=\text{guilty}) = \Pr(Y=\text{not guilty}) = 0.5$?
My attempt:
I would imagine that dropping $\Pr(Y=y)$ is recommended in legal systems, such as courts. For example, if "theft" is a highly common crime, e.g. $\Pr(Y=\text{theft}) = 0.8$, then we must not rule that suspect $x$ is a thief simply because others tend to be thieves. In other words, we should assume that $\Pr(Y=y)=1/|\mathcal{Y}|$ for every crime $y$, so that all judgements against suspect $x$ are based solely on maximizing $\Pr(X=x|Y=y)$.
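To see concretely how a strong prior can dominate the evidence (reducing the problem to theft vs. not theft, with hypothetical likelihoods): take $\Pr(X=x \mid Y=\text{theft}) = 0.1$ and $\Pr(X=x \mid Y=\text{not theft}) = 0.35$. The MAP rule compares $0.8 \times 0.1 = 0.08$ against $0.2 \times 0.35 = 0.07$ and convicts, even though the evidence itself favors innocence ($0.35 > 0.1$); dropping the prior reverses the verdict.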
Any thoughts?