
I want to derive an expression for the Bayesian classification risk $L(r^*)$ when the posterior probability $\tau_1(X)\in[0,1]$ is unknown.

For this problem, let:

$X\in\mathbb{X}=[0,1]$, $Y\in\{0,1\}$

$\pi_y=P(Y=y)=1/2$ for $y\in\{0,1\}$

Also, the conditional distributions $[X|Y=y]$ are characterised by the densities $f(x|Y=0)=2-2x$ and $f(x|Y=1)=2x$.

In the binary classification case, $L(r^*)=E\left[\min\{\tau_1(X),1-\tau_1(X)\}\right]=\frac{1}{2}-\frac{1}{2}E\left[|2\tau_1(X)-1|\right]$, so I'm assuming I need to substitute for $\tau_1$ somehow, but I'm not exactly sure what else the classification risk would depend on.

  • You are provided with $P(Y=y)$ and $P(X|Y)$; since $Y$ is binary, $P(X)$ is easily found. Please explain **(a)** what $\tau_1$ stands for, **(b)** how this question differs from your previous one (https://stats.stackexchange.com/questions/549134/explicit-form-and-function-of-posteriori-probability-when-y-1), and **(c)** which textbook you are taking these questions from? – Spätzle Oct 24 '21 at 08:28
  • $\tau_1(x)$ is $P(Y=1|X=x)$, I believe. The question from $(b)$ was looking for the explicit form of $\tau_1(x)$ and its graph; this question is looking for the derivation of $L(r^*)$, the classification risk. In regards to $(c)$, these are challenge questions from my university course. The main problem I'm having is that the lecturer uses notation that varies from what I've seen in textbooks and on the internet, and secondly, I don't understand this material very well. I'm from a machine learning background and I'm doing a stats elective. The second half of the course has just thrown me in the deep end. – Major Redux Oct 24 '21 at 08:42
  • $\tau_1$ is the posterior probability. I've seen it referred to as $\mu$ in other literature. – Major Redux Oct 24 '21 at 08:47
  • No offence intended - Do you feel comfortable enough with the definitions of prior & posterior, or do you need some clarification? – Spätzle Oct 24 '21 at 09:04
  • No offence taken. Any explanations would be greatly appreciated. I don't understand Bayesian statistics at all. – Major Redux Oct 24 '21 at 09:05

1 Answer


First, let us define what prior and posterior are. Assume that a variable $x$ follows a distribution $F$ with parameter $y$. For example, $x$ could be the result of a coin toss, with probability $y$ of getting "heads". Given a specific value of $y$, say $y=\theta$, we have enough information to construct the PDF, CDF and everything else related to $x$:

$$P(x=1|y=\theta)=\theta,\qquad P(x=0|y=\theta)=1-\theta$$

Note that these probabilities are conditional on the value of $y$.

But what if the probability $y$ is itself a random variable? That's the basic idea behind Bayesian methods. Continuing with the coin toss example: before flipping the coin, we make an assumption about the distribution the parameter $y$ "comes from". That is, we assign probabilities to its possible values prior to conducting our experiment. That's why it is called the prior distribution.

Next, we conduct our experiment and collect the results. We would like to estimate the distribution of $y$ given the values of $x$ we have observed. This can only be done after conducting the experiment; that's why it's called the posterior. It is the probability that $y=\theta$ given our observed $x$, so it is denoted $P(y=\theta|x)$.

Now, let's look at Bayes' theorem, which connects $P(X|Y)$ with $P(Y|X)$:

$$P(Y|X)=\frac{P(X|Y)\,P(Y)}{P(X)}$$
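As a toy illustration (the specific numbers are mine, purely for the example): suppose the prior puts equal weight on $y=0.3$ and $y=0.7$, and we observe a single head, $x=1$. Then

$$P(y=0.7|x=1)=\frac{P(x=1|y=0.7)P(y=0.7)}{P(x=1|y=0.7)P(y=0.7)+P(x=1|y=0.3)P(y=0.3)}=\frac{0.7\cdot 0.5}{0.7\cdot 0.5+0.3\cdot 0.5}=0.7,$$

so observing a head shifts our belief toward the coin that is more likely to produce heads.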


Back to your question: we are provided with $P(Y), P(X|Y)$ so we can find the marginal distribution for the denominator and then the posterior (see my previous solution), which is $\tau_1(X)=P(Y=1|X)=X$.
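For completeness, here is that computation written out with the densities from your question:

$$f(x)=\pi_0 f(x|Y=0)+\pi_1 f(x|Y=1)=\tfrac{1}{2}(2-2x)+\tfrac{1}{2}\cdot 2x=1,$$

$$\tau_1(x)=P(Y=1|X=x)=\frac{\pi_1 f(x|Y=1)}{f(x)}=\frac{\tfrac{1}{2}\cdot 2x}{1}=x.$$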

The risk is $L(r^*)=E\left[\min\{\tau_1(X),1-\tau_1(X)\}\right]$. It may seem that we need to know what $E[\tau_1(X)]=E[X]$ looks like but, surprisingly enough, it doesn't matter at all for the identity you quoted.

In the case of $\tau_1(X)<1-\tau_1(X)$, the minimum is $\tau_1(X)$, so we take $L(r^*)=E\left[\tau_1(X)\right]$. But there's more: $\tau_1(X)<1-\tau_1(X)$ gives $2\tau_1(X)<1$, which means $|2\tau_1(X)-1|=1-2\tau_1(X)$, and then: $$\frac{1}{2}-\frac{1}{2}E[|2\tau_1(X)-1|]=\frac{1}{2}-\frac{1}{2}E[1-2\tau_1(X)]=\frac{1}{2}-\frac{1}{2}+E[\tau_1(X)]=E[\tau_1(X)]$$

...which is exactly $E\left[\min\{\tau_1(X),1-\tau_1(X)\}\right]$. It's easy to show that the complementary case also holds.
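Spelling it out: if $\tau_1(X)\ge 1-\tau_1(X)$ then $2\tau_1(X)\ge 1$, so $|2\tau_1(X)-1|=2\tau_1(X)-1$ and

$$\frac{1}{2}-\frac{1}{2}E[2\tau_1(X)-1]=\frac{1}{2}-E[\tau_1(X)]+\frac{1}{2}=E[1-\tau_1(X)],$$

which is again the minimum. And in this particular problem everything can be evaluated explicitly: since $\tau_1(x)=x$ and the marginal density is $f(x)=1$ on $[0,1]$,

$$L(r^*)=E\left[\min\{X,1-X\}\right]=\int_0^1\min\{x,1-x\}\,dx=2\int_0^{1/2}x\,dx=\frac{1}{4}.$$

If you want a sanity check, here is a minimal simulation sketch (NumPy; the inverse-CDF sampling formulas are derived from the two class-conditional densities, and the variable names are of course my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Labels with pi_0 = pi_1 = 1/2.
y = rng.integers(0, 2, size=n)

# Inverse-CDF sampling for the class-conditional densities:
#   f(x|Y=0) = 2 - 2x  =>  F(x) = 2x - x^2  =>  X = 1 - sqrt(1 - U)
#   f(x|Y=1) = 2x      =>  F(x) = x^2       =>  X = sqrt(U)
u = rng.random(n)
x = np.where(y == 1, np.sqrt(u), 1.0 - np.sqrt(1.0 - u))

# Bayes rule: predict 1 iff tau_1(x) = x > 1/2.
y_hat = (x > 0.5).astype(int)

# Empirical misclassification rate; should be close to L(r*) = 1/4.
print(np.mean(y_hat != y))
```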
