
Suppose two classes $C_1$ and $C_2$ have an attribute $x$ with distributions $\mathcal{N}(0, 0.5)$ and $\mathcal{N}(1, 0.5)$, respectively. If we have equal priors $P(C_1)=P(C_2)=0.5$, then for the following cost matrix:

$L= \begin{bmatrix} 0 & 0.5 \\ 1 & 0 \end{bmatrix}$

why is $x_0 < 0.5$ the threshold for the minimum-risk (cost) classifier?

This is an example from my notes that I misunderstand (i.e., how is this threshold reached?).

Edit 1: I think for the threshold of the likelihood ratio we can use $P(C_1)/P(C_2)$.

Edit 2: I added some text about the threshold from Duda's *Pattern Classification*.

user153695

1 Answer


For the cost matrix $$L = \begin{bmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0.5 \\ 1 & 0 \end{bmatrix},$$ where the row index $i$ denotes the predicted class and the column index $j$ the true class,

the loss of predicting class $c_1$ when the truth is class $c_2$ is $L_{12} = 0.5$, and the loss of predicting class $c_2$ when the truth is class $c_1$ is $L_{21} = 1$. There is no loss for correct predictions: $L_{11} = L_{22} = 0$. The conditional risk $R$ of predicting either class given $x$ is then

$$ \begin{align} R(c_1|x) &= L_{11} \Pr (c_1|x) + L_{12} \Pr (c_2|x) = L_{12} \Pr (c_2|x) \\ R(c_2|x) &= L_{22} \Pr (c_2|x) + L_{21} \Pr (c_1|x) = L_{21} \Pr (c_1|x) \end{align} $$ For a reference see these notes on page 15.
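
As a quick numerical illustration, the two conditional risks can be computed directly. This is a minimal sketch (the function names are mine), assuming the $\mathcal{N}(0, 0.5)$ notation gives the variance, i.e. $\sigma^2 = 0.5$:

```python
import math

# Off-diagonal losses from the cost matrix L; correct predictions cost 0
L12, L21 = 0.5, 1.0
MU1, MU2, VAR = 0.0, 1.0, 0.5   # class-conditional means and shared variance
PRIOR1 = PRIOR2 = 0.5           # equal priors

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def conditional_risks(x):
    """Return (R(c1|x), R(c2|x)) using Bayes' rule for the posteriors."""
    p1 = normal_pdf(x, MU1, VAR) * PRIOR1   # proportional to Pr(c1|x)
    p2 = normal_pdf(x, MU2, VAR) * PRIOR2   # proportional to Pr(c2|x)
    z = p1 + p2                             # normalising constant Pr(x)
    return L12 * p2 / z, L21 * p1 / z
```

At $x = 0$, for instance, the risk of predicting $c_1$ is far below that of predicting $c_2$, as expected.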

To minimize the risk/loss, you predict $c_1$ whenever the expected cost of doing so (the loss from a wrong prediction times the posterior probability that the prediction is wrong, $L_{12} \Pr (c_2|x)$) is smaller than the expected cost of wrongly predicting the alternative:

$$ \begin{align} L_{12} \Pr (c_2|x) &< L_{21} \Pr (c_1|x) \\ L_{12} \Pr (x|c_2) \Pr (c_2) &< L_{21} \Pr (x|c_1) \Pr (c_1) \\ \frac{L_{12} \Pr (c_2)}{L_{21} \Pr (c_1)} &< \frac{\Pr (x|c_1)}{ \Pr (x|c_2)} \end{align} $$ where the second line uses Bayes' rule $\Pr (c_2|x) \propto \Pr (x|c_2) \Pr (c_2)$. Given equal prior probabilities $\Pr (c_1) = \Pr (c_2) = 0.5$ you get $$\frac{1}{2} < \frac{\Pr (x|c_1)}{ \Pr (x|c_2)}$$
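
The resulting likelihood-ratio rule can be sketched as follows (a hypothetical `predict` helper, again assuming $\sigma^2 = 0.5$ is the variance):

```python
import math

L12, L21 = 0.5, 1.0
PRIOR1 = PRIOR2 = 0.5
MU1, MU2, VAR = 0.0, 1.0, 0.5

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x):
    """Predict c1 iff p(x|c1)/p(x|c2) exceeds L12*Pr(c2) / (L21*Pr(c1))."""
    threshold = (L12 * PRIOR2) / (L21 * PRIOR1)   # = 1/2 for this problem
    ratio = normal_pdf(x, MU1, VAR) / normal_pdf(x, MU2, VAR)
    return "c1" if ratio > threshold else "c2"
```

With these numbers the rule flips from $c_1$ to $c_2$ somewhere between $x = 0.8$ and $x = 0.9$, not at $x = 0.5$.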

so you choose to classify an observation as $c_1$ if the likelihood ratio exceeds this threshold. Now it is not clear to me whether you wanted to know the "best threshold" in terms of the likelihood ratio or in terms of the attribute $x$. The answer changes according to the cost function. Using the Gaussian densities in the inequality with $\sigma_1 = \sigma_2 = \sigma$ and $\mu_1 = 0$, $\mu_2 = 1$, $$ \begin{align} \frac{1}{2} &< \frac{\frac{1}{\sqrt{2\pi}\sigma}\exp \left[ -\frac{1}{2\sigma^2}(x-\mu_1)^2 \right]}{\frac{1}{\sqrt{2\pi}\sigma}\exp \left[ -\frac{1}{2\sigma^2}(x-\mu_2)^2 \right]} \\ \log \left(\frac{1}{2}\right) &< \log \left(\frac{1}{\sqrt{2\pi}\sigma}\right) -\frac{1}{2\sigma^2}(x-0)^2 - \left[ \log \left(\frac{1}{\sqrt{2\pi}\sigma}\right) -\frac{1}{2\sigma^2}(x-1)^2 \right] \\ \log \left(\frac{1}{2}\right) &< -\frac{x^2}{2\sigma^2} + \frac{x^2}{2\sigma^2} - \frac{2x}{2\sigma^2} + \frac{1}{2\sigma^2} \\ \frac{x}{\sigma^2} &< \frac{1}{2\sigma^2} - \log \left(\frac{1}{2}\right) \\ x &< \frac{1}{2} - \sigma^2 \log \left(\frac{1}{2}\right) \end{align} $$ Note that the $\frac{1}{2}$ on the left-hand side of the first line is the cost ratio $\frac{L_{12}}{L_{21}}$ (the equal priors cancel), so in general the threshold in terms of $x$ is $x < \frac{1}{2} - \sigma^2 \log \left( \frac{L_{12}}{L_{21}} \right)$. A threshold of $\frac{1}{2}$, as in your notes, can therefore only be obtained if the losses from false predictions are equal, i.e. $L_{12} = L_{21}$, because only then is $\log \left( \frac{L_{12}}{L_{21}} \right) = \log (1) = 0$ and you get $x_0 < \frac{1}{2}$.
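
A quick numeric check of the derived threshold under the stated assumptions ($\sigma^2 = 0.5$, cost ratio $L_{12}/L_{21} = 1/2$): at $x^\ast = \frac{1}{2} - \sigma^2 \log(L_{12}/L_{21})$ the two loss-weighted likelihoods coincide, so the boundary for this cost matrix sits near $0.85$, not at $0.5$.

```python
import math

L12, L21 = 0.5, 1.0
MU1, MU2, VAR = 0.0, 1.0, 0.5   # assumed: N(0, 0.5) gives the variance

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Threshold derived above: x* = 1/2 - sigma^2 * log(L12 / L21)
x_star = 0.5 - VAR * math.log(L12 / L21)   # = 0.5 + 0.5*ln(2), about 0.847

# With equal priors the priors cancel, so the decision boundary is where
# L12 * p(x|c2) equals L21 * p(x|c1).
lhs = L12 * normal_pdf(x_star, MU2, VAR)
rhs = L21 * normal_pdf(x_star, MU1, VAR)
```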

Andy
  • Nice answer, but it confused me! If you had to choose between $x_0=0.5$ and $x_0<0.5$, which one is correct? – user153695 Apr 01 '15 at 14:47
  • Right on the decision boundary $x_0=0.5$ you can't tell whether an observation should be in class one or two (because it's exactly on the boundary). So choosing whether observation $i$ should be in class 1 if $x_0 \leq 0.5$ or $x_0 < 0.5$ is up to you. With large enough samples this should happen for very few observations, so at the margin it will matter little for your result. – Andy Apr 01 '15 at 14:55
  • My whole problem (the reason I set a bounty on this) is that my prof. calculated $x_0<0.5$ and does not accept $x_0=0.5$. Please see my edit in the question; I think the threshold should be $x_0<0.5$. – user153695 Apr 01 '15 at 15:05
  • maybe 0.5-ln :) – user153695 Apr 01 '15 at 15:08
  • That was the point of the picture, though. You see that to the left of the black line (the decision boundary) you can tell exactly that an observation should be in class one. To the right of the black line the observation should be in class two. You can't tell for observations directly on the line, hence they are in class one for sure if $x_0 < 0.5$. – Andy Apr 01 '15 at 15:15
  • You mean the threshold is exactly $x_0<0.5$? It's tricky :) – user153695 Apr 01 '15 at 15:33
  • I have looked through this answer several times but could find no reference at all to the cost matrix. How does it affect the solution? – whuber Apr 01 '15 at 16:39
  • 1
    @whuber thanks, I completely missed that so I started from a completely wrong end. – Andy Apr 01 '15 at 20:19
  • The best answer is false; I corrected it. Please update your answer so I can award the bounty. – user153695 Apr 06 '15 at 16:35
  • Please update ASAP so I can award the bounty. – user153695 Apr 07 '15 at 12:44
  • I'm not sure what I'm supposed to update? The threshold for the minimum-risk classifier for the given cost matrix is already derived in the answer, together with the conditions required to obtain a threshold of 0.5. – Andy Apr 07 '15 at 12:53