Actually this is pretty simple: the Bayes classifier chooses the class that has the greatest a posteriori probability of occurrence (so-called maximum a posteriori estimation). The 0-1 loss function penalizes misclassification, i.e. it assigns the smallest loss to the solution that has the greatest number of correct classifications. So in both cases we are talking about estimating the mode. Recall that the mode is the most common value in the dataset, or the most probable value, so both maximizing the posterior probability and minimizing the 0-1 loss lead to estimating the mode.
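As a quick made-up illustration (the numbers are arbitrary; $s$ denotes the class and $\boldsymbol{x}$ the data, as in the quote below): suppose

$$ P(s = 1 \mid \boldsymbol{x}) = 0.2, \quad P(s = 2 \mid \boldsymbol{x}) = 0.5, \quad P(s = 3 \mid \boldsymbol{x}) = 0.3 . $$

Predicting class 2, the mode, is wrong with probability $0.5$; predicting class 1 or class 3 is wrong with probability $0.8$ or $0.7$. So picking the mode gives the smallest expected 0-1 loss.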
If you need a formal proof, one is given in the Introduction to Bayesian Decision Theory paper by Angela J. Yu:
The 0-1 binary loss function has the following form:
$$ l_\boldsymbol{x}(\hat s, s^*) = 1 - \delta_{\hat ss^*} =
\begin{cases} 1 & \text{if} \quad \hat s \ne s^* \\ 0 &
\text{otherwise} \end{cases} $$
where $\delta$ is the Kronecker Delta function. (...) the expected loss
is:
$$ \begin{align} \mathcal{L}_\boldsymbol{x}(\hat s) &= \sum_{s^*}
l_\boldsymbol{x}(\hat s, s^*) \; P(s = s^* \mid \boldsymbol{x}) \\ &=
\sum_{s^*} (1 - \delta_{\hat ss^*}) \; P(s = s^* \mid \boldsymbol{x})
\\ &= \sum_{s^*} P(s = s^* \mid \boldsymbol{x}) - \sum_{s^*} \delta_{\hat ss^*} \; P(s = s^* \mid \boldsymbol{x}) \\ &= 1 - P(s = \hat s
\mid \boldsymbol{x}) \end{align} $$
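If it helps, you can check this calculation numerically. Below is a minimal sketch (the posterior values are made up) that computes the expected loss of each possible prediction straight from the definition and confirms that it equals $1 - P(s = \hat s \mid \boldsymbol{x})$, so the minimizer is the posterior mode:

```python
import numpy as np

# Made-up posterior P(s = s* | x) over four classes (numbers are arbitrary).
posterior = np.array([0.1, 0.4, 0.3, 0.2])

def expected_loss(s_hat):
    """Expected 0-1 loss of predicting s_hat, straight from the definition:
    sum over s* of (1 - delta(s_hat, s*)) * P(s = s* | x)."""
    delta = np.zeros_like(posterior)
    delta[s_hat] = 1.0                     # Kronecker delta
    return np.sum((1.0 - delta) * posterior)

losses = np.array([expected_loss(s) for s in range(len(posterior))])

print(losses)                                      # [0.9 0.6 0.7 0.8]
print(np.allclose(losses, 1 - posterior))          # True: matches 1 - P(s = s_hat | x)
print(np.argmin(losses) == np.argmax(posterior))   # True: the minimizer is the mode
```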
This result holds for maximum a posteriori estimation in general. So if you know the posterior distribution, then under 0-1 loss the optimal classification rule is to take the mode of the posterior distribution; we call this the optimal Bayes classifier. In real life we usually do not know the posterior distribution, but rather estimate it. The naive Bayes classifier approximates the optimal classifier by looking at the empirical distribution and by assuming independence of the predictors. So the naive Bayes classifier is not itself optimal, but it approximates the optimal solution. In your question you seem to confuse those two things.
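To illustrate that last point, here is a rough simulation sketch. Everything in it is an assumption made up for illustration: the Gaussian generative model, the class means, the correlated covariance, and the hand-rolled naive Bayes fit. It builds the optimal Bayes classifier from the true posterior and a naive Bayes classifier from a sample, then compares their accuracies on the same test data:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Made-up generative model (purely an assumption, for illustration) ------
# Two equally likely classes; the two features are strongly correlated,
# so the naive independence assumption is deliberately violated.
means = {0: np.array([0.0, 0.0]), 1: np.array([2.0, 0.0])}
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])              # shared, correlated covariance
cov_inv = np.linalg.inv(cov)
cov_det = np.linalg.det(cov)

def true_density(x, k):
    """True class-conditional density p(x | s = k) of the assumed model."""
    d = x - means[k]
    quad = np.einsum('...i,ij,...j->...', d, cov_inv, d)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(cov_det))

def sample(n):
    """Draw labels with equal priors, then features from the true model."""
    y = rng.integers(0, 2, size=n)
    x = np.empty((n, 2))
    for k in (0, 1):
        idx = (y == k)
        x[idx] = rng.multivariate_normal(means[k], cov, size=idx.sum())
    return x, y

def bayes_optimal(x):
    """Optimal Bayes classifier: the mode of the true posterior.
    Equal priors cancel, so compare the class-conditional densities."""
    return (true_density(x, 1) > true_density(x, 0)).astype(int)

def fit_naive_bayes(x, y):
    """Empirical prior and per-feature Gaussians per class; correlations ignored."""
    return {k: (np.mean(y == k), x[y == k].mean(axis=0), x[y == k].var(axis=0))
            for k in (0, 1)}

def predict_naive_bayes(params, x):
    """Pick the class with the larger log prior + sum of per-feature log likelihoods."""
    scores = []
    for k in (0, 1):
        prior, mu, var = params[k]
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores.append(np.log(prior) + log_lik.sum(axis=1))
    return (scores[1] > scores[0]).astype(int)

x_train, y_train = sample(2000)
x_test, y_test = sample(20000)

nb = fit_naive_bayes(x_train, y_train)
print("optimal Bayes accuracy:", np.mean(bayes_optimal(x_test) == y_test))
print("naive Bayes accuracy:  ", np.mean(predict_naive_bayes(nb, x_test) == y_test))
```

Because the features here are correlated, the independence assumption is wrong and naive Bayes stays below the optimal accuracy; if the features really were conditionally independent given the class, the two classifiers would roughly coincide as the training sample grows.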