I recently interviewed for a machine learning job, and the interview involved some mathematically rigorous questions. This is one of them, and I'm still quite confused about it.
Question: Given a data generating distribution $\mathbb{P}(x,y)$ and a classifier $f(x)$, where $x$ is the observation and $y$ is the label, prove that the inaccuracy given by:
$$\mathbb{E}[\mathbb{1}(f(x) \neq y)]$$
is minimized by the classifier: $$f^*(x) = \text{sign}\big(\mathbb{P}(y = 1 \mid x) - \tfrac{1}{2}\big)$$
where $\mathbb{1}(a \neq b)$ is $1$ when $a \neq b$ and $0$ otherwise, and $\text{sign}(a)$ gives the arithmetic sign of $a$, that is $+1$ or $-1$ (so the labels $y$ are taken to be $\pm 1$).
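To make the statement concrete for myself, I tried a minimal numerical sketch. The distribution here is just a toy I made up ($x \sim \text{Uniform}(0,1)$ with $\mathbb{P}(y=1 \mid x) = x$), and it only checks the claim empirically; it is not a proof:

```python
import numpy as np

# Toy setup (my own construction, not part of the interview question):
# x ~ Uniform(0, 1) and P(y = +1 | x) = x, so the claimed optimal rule
# f*(x) = sign(P(y=1|x) - 1/2) predicts +1 exactly when x > 1/2.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)
y = np.where(rng.uniform(size=n) < x, 1, -1)   # label is +1 with probability x

def error_rate(threshold):
    """Empirical 0-1 loss of the rule f(x) = sign(x - threshold)."""
    preds = np.where(x > threshold, 1, -1)
    return np.mean(preds != y)

for t in [0.3, 0.4, 0.5, 0.6, 0.7]:
    print(f"threshold {t:.1f}: error ~= {error_rate(t):.4f}")
# The 0.5 threshold gives the smallest error (about 0.25 in this toy case),
# which at least matches the claim about f*(x) numerically.
```

In this toy setting the $0.5$ threshold does come out best, so I can see the claim is plausible, but I still don't understand why it holds in general or what it buys us.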
Now I have several questions about this statement that still confuse me:
- What is the meaning of this statement? What does it say on an intuitive level?
- What information does it give us about the classifier? Can it be used to design a "practically good" classifier for this problem, one that can actually be learned from data?
- What is the use of this result? How does it relate to, say, designing a logistic regression classifier? I ask about logistic regression because there, too, we threshold the predicted probability at $0.5$ (see the sketch after this list).
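Here is what I mean about logistic regression. This is a minimal sketch on data I simulated myself; `LogisticRegression`, `predict_proba`, and `predict` are real scikit-learn methods, everything else is my own toy setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated data: two features, labels drawn from a logistic model I chose myself.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
true_p = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 0.5 * X[:, 1])))
y = (rng.uniform(size=1000) < true_p).astype(int)

clf = LogisticRegression().fit(X, y)
p_hat = clf.predict_proba(X)[:, 1]          # model's estimate of P(y=1|x)
thresholded = (p_hat > 0.5).astype(int)     # threshold that estimate at 1/2

# predict() classifies by the sign of the decision function, which is the
# same as thresholding the estimated P(y=1|x) at 0.5, so the two rules agree.
print("agreement:", np.mean(thresholded == clf.predict(X)))  # 1.0
```

So logistic regression's default decision rule looks to me like a "plug-in" version of $f^*$, with the model's estimate of $\mathbb{P}(y=1 \mid x)$ substituted for the true conditional probability. Is that the right way to think about the connection to this result?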