Given a dataset $D_{in}$ with $N=|D_{in}|$ data points, and a hypothesis set $H=\{h_1,h_2,\dots,h_M\}$.
For a fixed hypothesis $h$, for example $h_1$, we can derive
$$P[|E_{in}(h_1)-E_{out}(h_1)|>\epsilon] \leq 2e^{-2\epsilon^{2}N}$$
from Hoeffding's inequality, since $\mathbb{E}_{D_{in}}[E_{in}(h_1)]=E_{out}(h_1)$.
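(As a sanity check on this fixed-hypothesis case, here is a minimal Monte Carlo sketch of my own, not from any textbook: I model the pointwise errors of a fixed $h_1$ as i.i.d. Bernoulli($E_{out}$), so $E_{in}$ is just a sample mean, and compare the empirical violation frequency to the Hoeffding bound; the values of $N$, $\epsilon$, $E_{out}$ are arbitrary.)

```python
import numpy as np

# Fixed hypothesis h_1: its errors are i.i.d. Bernoulli(E_out),
# so E_in is the sample mean of N error indicators.
rng = np.random.default_rng(0)
N, eps, E_out = 100, 0.1, 0.3        # hypothetical values
trials = 100_000

errors = rng.random((trials, N)) < E_out      # 1 where h_1 misclassifies
E_in = errors.mean(axis=1)                    # in-sample error per trial
violation_freq = np.mean(np.abs(E_in - E_out) > eps)
hoeffding_bound = 2 * np.exp(-2 * eps**2 * N)

print(f"empirical P[|E_in - E_out| > eps] ~ {violation_freq:.4f}")
print(f"Hoeffding bound 2*exp(-2*eps^2*N) = {hoeffding_bound:.4f}")
```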
If $g$ is our learned hypothesis using the data $D_{in}$, we can't apply Hoeffding's inequality directly, since $\mathbb{E}_{D_{in}}[E_{in}(g)] \neq E_{out}(g)$.
But my question is: why can't we bound $P[|E_{in}(g)-E_{out}(g)|>\epsilon]$ as follows?
$$\begin{aligned}
P[|E_{in}(g)-E_{out}(g)|>\epsilon] &= \sum_{h \in H} P[|E_{in}(g)-E_{out}(g)|>\epsilon \;\land\; g=h] \\
&= \sum_{h \in H} P[|E_{in}(g)-E_{out}(g)|>\epsilon \mid g=h]\;P[g=h] \\
&= \sum_{h \in H} P[|E_{in}(h)-E_{out}(h)|>\epsilon]\;P[g=h] \\
&\leq \sum_{h \in H} \left(2e^{-2\epsilon^{2}N}\right) P[g=h] \\
&= 2e^{-2\epsilon^{2}N}.
\end{aligned}$$
The inequality $P[|E_{in}(g)-E_{out}(g)|>\epsilon] \leq 2e^{-2\epsilon^{2}N}$ would mean that, with probability at least $1-\delta$, the hypothesis $g$ learned from $D_{in}$ satisfies $$E_{out}(g) \leq E_{in}(g) + \sqrt{\frac{1}{2N}\ln\frac{2}{\delta}},$$
and thus the size of the hypothesis set $|H|$ wouldn't affect the generalization bound.
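For concreteness, with hypothetical values $N=1000$ and $\delta=0.05$, this deviation term would be about $0.043$:

```python
import math

# deviation term sqrt(ln(2/delta) / (2N)) from the bound above,
# evaluated for hypothetical N = 1000, delta = 0.05
N, delta = 1000, 0.05
print(math.sqrt(math.log(2 / delta) / (2 * N)))   # ~0.0429
```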
Is this wrong? Where am I going wrong?
If this inequality were right, why would we need a uniform convergence (union) bound like $$P[|E_{in}(g)-E_{out}(g)|>\epsilon] \leq 2|H|e^{-2\epsilon^{2}N}$$ for the hypothesis $g$ learned from $D_{in}$?
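To make the comparison concrete, here is a toy experiment I could run (my own setup, not from any textbook): every hypothesis has $E_{out}=0.5$ and makes independent errors, like flipping $M$ fair coins $N$ times each, and $g$ is the hypothesis with the smallest $E_{in}$. The empirical violation frequency for $g$ can then be compared against my claimed bound $2e^{-2\epsilon^{2}N}$ and the union bound $2|H|e^{-2\epsilon^{2}N}$:

```python
import numpy as np

# Toy setup: M hypotheses, each with E_out = 0.5 and independent errors,
# "learning" = picking the hypothesis with the smallest in-sample error.
rng = np.random.default_rng(0)
M, N, eps = 1000, 10, 0.3            # hypothetical values
trials = 2_000

violations = 0
for _ in range(trials):
    errors = rng.random((M, N)) < 0.5      # errors[h, i]: hypothesis h wrong on point i
    E_in = errors.mean(axis=1)             # in-sample error of each hypothesis
    g = np.argmin(E_in)                    # pick the best-looking hypothesis
    if abs(E_in[g] - 0.5) > eps:           # E_out is 0.5 for every hypothesis
        violations += 1

print(f"empirical P[|E_in(g) - E_out(g)| > eps] ~ {violations / trials:.3f}")
print(f"single-hypothesis bound 2*exp(-2*eps^2*N) = {2 * np.exp(-2 * eps**2 * N):.3f}")
print(f"union bound 2*M*exp(-2*eps^2*N)           = {2 * M * np.exp(-2 * eps**2 * N):.3f}")
```

If my derivation above were correct, the single-hypothesis bound should already cover whatever violation frequency this experiment reports for $g$.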