First, it is important to clarify a few things.
- The KL divergence is a dissimilarity between two distributions, so minimizing it is not literally the same thing as maximizing a likelihood, which is a function of a single distribution.
- Given a reference distribution $P(\cdot)$, the value of $\theta$ that minimizes $\text{KL}(P(\cdot)||Q(\cdot|\theta))$ is not "the one that maximizes the likelihood": there is no likelihood here at all, since there are no observed values.
So, saying that minimizing the KL divergence is equivalent to maximizing the log-likelihood can only mean that choosing $\hat{\theta}$ so as to maximize $Q(x_1, \ldots, x_n|\theta)$ ensures that $\hat{\theta} \rightarrow \theta^*$ as $n \rightarrow \infty$, where
$$\theta^* = \text{argmin}_\theta \text{ KL}(P(\cdot)||Q(\cdot|\theta)).$$
This is true under the usual regularity conditions. To see this, assume that we compute $Q(x_1, \ldots, x_n|\theta)$, but the sample $x_1, \ldots, x_n$ is actually drawn from $P(\cdot)$. The expected value of the log-likelihood is then
$$\int P(x_1, \ldots, x_n) \log Q(x_1, \ldots, x_n|\theta) dx_1 \ldots dx_n.$$
Maximizing this value with respect to $\theta$ is the same as minimizing
$$\text{KL}(P(\cdot)||Q(\cdot|\theta)) = \int P(x_1, \ldots, x_n) \log \frac{P(x_1, \ldots, x_n)}{Q(x_1, \ldots, x_n|\theta)}dx_1 \ldots dx_n.$$
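Indeed, expanding the logarithm of the ratio in the definition above gives
$$\text{KL}(P(\cdot)||Q(\cdot|\theta)) = \int P(x_1, \ldots, x_n) \log P(x_1, \ldots, x_n) dx_1 \ldots dx_n - \int P(x_1, \ldots, x_n) \log Q(x_1, \ldots, x_n|\theta) dx_1 \ldots dx_n,$$
and the first term does not depend on $\theta$, so minimizing the divergence over $\theta$ amounts to maximizing the expected log-likelihood.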
This is not an actual proof, but it gives the main idea. Now, there is no reason why $\theta^*$ should also minimize
$$\text{KL}(Q(\cdot|\theta)||P(\cdot)) = \int Q(x_1, \ldots, x_n|\theta) \log \frac{Q(x_1, \ldots, x_n|\theta)}{P(x_1, \ldots, x_n)}dx_1 \ldots dx_n.$$
Your question actually provides a counterexample to this, so it is clear that the value of $\theta$ that minimizes the reverse KL divergence is in general not the same as the maximum likelihood estimate (and the same goes for the Jensen-Shannon divergence).
What the minimizers of those other divergences correspond to is less clear-cut. From the argument above, you can see that minimizing the reverse KL divergence corresponds to computing the likelihood as $P(x_1, \ldots, x_n)$ when $x_1, \ldots, x_n$ is actually drawn from $Q(\cdot|\theta)$, while keeping the entropy of $Q(\cdot|\theta)$ as high as possible. The interpretation is not straightforward, but we can think of it as looking for a "simple" distribution $Q(\cdot|\theta)$ that would "explain" observations $x_1, \ldots, x_n$ coming from a more complex distribution $P(\cdot)$. This is a typical task in variational inference.
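To see the difference concretely, here is a small numeric sketch (not part of the argument above; the bimodal target, the grid and the parameter ranges are arbitrary choices for illustration). It fits a single Gaussian $Q(\cdot|\mu, \sigma)$ to a bimodal mixture $P(\cdot)$ on a discretized grid and compares the parameters that minimize the forward and the reverse KL divergence.

```python
# Toy illustration (hypothetical setup): compare the minimizers of the
# forward KL(P||Q) and the reverse KL(Q||P) when Q is a single Gaussian
# and P is a bimodal mixture, everything discretized on a grid.
import numpy as np
from scipy.stats import norm

x = np.linspace(-12, 12, 2001)                 # discretization grid
dx = x[1] - x[0]

def normalize(f):
    return f / (f.sum() * dx)                  # renormalize on the grid

# Target P: mixture of two well-separated unit-variance Gaussians
p = normalize(0.5 * norm.pdf(x, -3, 1) + 0.5 * norm.pdf(x, 3, 1))

def kl(a, b):
    """Discretized KL(a||b) = sum_i a_i log(a_i / b_i) dx."""
    a = np.maximum(a, 1e-300)                  # avoid log(0) warnings
    b = np.maximum(b, 1e-300)
    return np.sum(a * np.log(a / b)) * dx

# Brute-force search over the parameters of Q(.|mu, sigma)
best_fwd = best_rev = (np.inf, None, None)
for mu in np.linspace(-5, 5, 101):
    for sigma in np.linspace(0.5, 5, 91):
        q = normalize(norm.pdf(x, mu, sigma))
        fwd = kl(p, q)                         # "maximum likelihood" direction
        rev = kl(q, p)                         # "variational" direction
        best_fwd = min(best_fwd, (fwd, mu, sigma))
        best_rev = min(best_rev, (rev, mu, sigma))

# Forward KL spreads Q over both modes (moment matching: mu ~ 0, large sigma);
# reverse KL locks Q onto a single mode (mu ~ +/-3, sigma ~ 1).
print("argmin KL(P||Q):  mu = %5.2f, sigma = %4.2f" % best_fwd[1:])
print("argmin KL(Q||P):  mu = %5.2f, sigma = %4.2f" % best_rev[1:])
```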
The Jensen-Shannon divergence treats the two distributions symmetrically (it averages two KL divergences, each taken against their midpoint mixture, see below), so one can think of its minimizer as "a little bit of both", meaning something in between the maximum likelihood estimate and a "simple explanation" for the data.
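For completeness, writing $M(\cdot) = \frac{1}{2}\left(P(\cdot) + Q(\cdot|\theta)\right)$ for the midpoint mixture,
$$\text{JS}(P(\cdot)||Q(\cdot|\theta)) = \frac{1}{2}\text{KL}(P(\cdot)||M(\cdot)) + \frac{1}{2}\text{KL}(Q(\cdot|\theta)||M(\cdot)),$$
so both distributions enter symmetrically, which is what produces this "in between" behaviour.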