
I found two expressions for the Bayes risk of a classifier when the posterior probability is unknown, but I don't understand how or why the derivations were made.

For this scenario, assume:

$X\in\mathbb{X}=[0,1]$, $Y\in\{0,1\}$

$\pi_y=P(Y=y)=1/2$ for $y\in\{0,1\}$

Conditional distributions $[X|Y=y]$ characterised by:

$f(x|Y=0)=2-2x$ and $f(x|Y=1)=2x$.
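(Both are valid densities on $[0,1]$: $\int_0^1(2-2x)\,dx=\int_0^1 2x\,dx=1$.)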

Let $\tau_1$ be the posterior probability and $L(r^*)$ the risk of the Bayes classifier $r^*$.

In the first case, assume $\tau_1\in[0,1]$ is unknown, thus the following expression can be written: $L(r^*)=\int_X\min\{(1-\tau_1)f(x|Y=0),\tau_1f(x|Y=1)\}\,dx$

I'd like to understand this expression and get an intuitive sense of why it is true.

Additionally, if $\tau_0=\tau_1=1/2$, the following expression can be derived: $L(r^*)=\frac{1}{2}-\frac{1}{4}\int_X|f(x|Y=1)-f(x|Y=0)|\,dx$

How is this connected to the above statement and why is it true?
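For concreteness, both expressions can be checked numerically for the densities above. Here is a small sketch of my own (it assumes the $\tau_1$ in the first expression is the constant class weight $P(Y=1)=1/2$):

```python
import numpy as np

# Conditional densities from the setup above.
f0 = lambda x: 2 - 2 * x   # f(x | Y=0)
f1 = lambda x: 2 * x       # f(x | Y=1)

tau1 = 0.5                     # class weight P(Y=1), assumed constant here
dx = 1e-5
x = np.arange(dx / 2, 1, dx)   # midpoint grid on [0, 1]

# First expression: integral of the pointwise minimum.
risk_min = np.sum(np.minimum((1 - tau1) * f0(x), tau1 * f1(x))) * dx

# Second expression: 1/2 - 1/4 * (L1 distance between the densities).
risk_l1 = 0.5 - 0.25 * np.sum(np.abs(f1(x) - f0(x))) * dx

print(risk_min, risk_l1)   # both come out ≈ 0.25
```

Both evaluate to $1/4$, so at least for this example the two formulas agree.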

  • Please explain **(a)** what does $\tau_1$ stand for? Is it the conditional probability $P(Y=1|X)$? **(b)** is this question related to your previous one? (https://stats.stackexchange.com/questions/549134/explicit-form-and-function-of-posteriori-probability-when-y-1) – Spätzle Oct 24 '21 at 08:30
  • For $(a)$, $\tau_1$ is the posterior probability. I've seen it referred to as $\mu$ in other literature. For $(b)$, yes, it is connected to that question. This is from a series of questions on the same conditions and data. – Major Redux Oct 24 '21 at 08:45
  • Please add the full details of the problems, as other readers do not remember them by heart – Spätzle Oct 24 '21 at 10:44

1 Answer


We have previously seen that $L(r^*)$ is $$L(r^*)=E\left[\min\{\tau_1(X),1-\tau_1(X)\}\right]=\frac{1}{2}-\frac{1}{2}E\left[\left|2\tau_1(X)-1\right|\right]$$
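As a concrete check with the densities in the question: the marginal density is $f(x)=\frac{1}{2}\cdot 2x+\frac{1}{2}(2-2x)=1$, so the posterior is $\tau_1(x)=x$ and

$$L(r^*)=\int_0^1\min\{x,1-x\}\,dx=2\int_0^{1/2}x\,dx=\frac{1}{4}$$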

The risk is the probability of producing a wrong prediction; condition first on $X$:

$$P(r^*(X)\ne Y|X)=P(\tau_1<1-\tau_1,Y=1|X)+P(\tau_1>1-\tau_1,Y=0|X)$$ We use the basic property of the indicator function of an event $A$, namely $P(A)=E[I\{A\}]$ (here in its conditional form): $$=E[I\{\tau_1<1-\tau_1,Y=1\}|X]+E[I\{\tau_1>1-\tau_1,Y=0\}|X]$$ Next, the indicator of a joint event factors into a product of two indicators (like an AND gate), and since $\tau_1=\tau_1(X)$ is a function of $X$, the indicator $I\{\tau_1<1-\tau_1\}$ is deterministic given $X$ and comes out of the conditional expectation:

$$=E[I\{\tau_1<1-\tau_1\}|X]\cdot E[I\{Y=1\}|X]+E[I\{\tau_1>1-\tau_1\}|X]\cdot E[I\{Y=0\}|X]$$

Again, by the indicator property, $E[I\{Y=y\}|X]=P(Y=y|X)$:

$$=E[I\{\tau_1<1-\tau_1\}|X]\cdot P(Y=1|X)+E[I\{\tau_1>1-\tau_1\}|X]\cdot P(Y=0|X)$$

Because $\tau_1$ is a function of $X$, $E[I\{\tau_1<1-\tau_1\}|X]=I\{\tau_1<1-\tau_1\}$, so the line above is the conditional error probability at $X=x$. Taking the expectation over $X$ (with marginal density $f$) and applying Bayes' law, $P(Y=y|X=x)\,f(x)=\pi_y f(x|Y=y)=0.5\,f(x|Y=y)$, gives

$$L(r^*)=\int_X\Big(I\{\tau_1<1-\tau_1\}\cdot 0.5\,f(x|Y=1)+I\{\tau_1>1-\tau_1\}\cdot 0.5\,f(x|Y=0)\Big)\,dx$$

Now observe that $\tau_1<1-\tau_1$ holds exactly when $\tau_1(x)f(x)<(1-\tau_1(x))f(x)$, i.e. when $0.5\,f(x|Y=1)<0.5\,f(x|Y=0)$. So wherever the classifier predicts $0$, the integrand is $0.5\,f(x|Y=1)$, the smaller of the two weighted densities, and wherever it predicts $1$, the integrand is $0.5\,f(x|Y=0)$, again the smaller one. At every $x$, the integrand is therefore

$$\min\{0.5\,f(x|Y=0),\,0.5\,f(x|Y=1)\}$$
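For instance, at $x=0.7$: $0.5\,f(x|Y=1)=0.7$ and $0.5\,f(x|Y=0)=0.3$, the classifier predicts $1$ (as $\tau_1(0.7)=0.7>0.5$), and its error contribution there is $0.3$, exactly the minimum.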

Writing the weights as the class priors $\tau_1=P(Y=1)$ and $1-\tau_1=P(Y=0)$ (both $0.5$ here; note that in this final expression $\tau_1$ is a constant, not the posterior $\tau_1(x)$), we arrive at

$$L(r^*)=\int_{X}\min\{(1-\tau_1)f(x|Y=0),\,\tau_1f(x|Y=1)\}\,dx\qquad\blacksquare$$
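As an independent sanity check (my own sketch, not part of the derivation), a quick Monte Carlo simulation of this setup reproduces $L(r^*)=1/4$, the value of the integral above for these densities:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Y ~ Bernoulli(1/2); X | Y by inverse-CDF sampling of the two densities:
# f(x|Y=1)=2x has CDF x^2, so X = sqrt(U); f(x|Y=0)=2-2x gives X = 1-sqrt(1-U).
y = rng.integers(0, 2, n)
u = rng.random(n)
x = np.where(y == 1, np.sqrt(u), 1.0 - np.sqrt(1.0 - u))

# Bayes rule: the posterior here is tau_1(x) = x, so predict 1 iff x > 1/2.
y_hat = (x > 0.5).astype(int)

print(np.mean(y_hat != y))   # ≈ 0.25
```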


Second part:

With $\tau_0=\tau_1=\frac{1}{2}$,

$$\int_X\min\{\tau_1 f(x|Y=1),(1-\tau_1)f(x|Y=0)\}\,dx=\frac{1}{2}\int_X\min\{f(x|Y=1),f(x|Y=0)\}\,dx$$

Now, the $\min$ function can be written as (check it!):

$$\min\{a,b\}=0.5(a+b-|a-b|)$$
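Indeed, if $a\le b$ then $|a-b|=b-a$ and $0.5(a+b-(b-a))=a$; if $a>b$ then $|a-b|=a-b$ and $0.5(a+b-(a-b))=b$.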

So

$$\frac{1}{2}\int_X\min\{f(x|Y=1),f(x|Y=0)\}\,dx\\=\frac{1}{4}\int_X\big(f(x|Y=1)+f(x|Y=0)-|f(x|Y=1)-f(x|Y=0)|\big)\,dx\\=\frac{1}{4}\left(2-\int_X|f(x|Y=1)-f(x|Y=0)|\,dx\right)\\=\frac{1}{2}-\frac{1}{4}\int_{X}|f(x|Y=1)-f(x|Y=0)|\,dx$$

(the $2$ appears because each conditional density integrates to $1$ over $\mathbb{X}$).
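For the densities in the question, $\int_0^1|2x-(2-2x)|\,dx=\int_0^1|4x-2|\,dx=1$, so $L(r^*)=\frac{1}{2}-\frac{1}{4}=\frac{1}{4}$, matching the checks above.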
