ADDENDUM 30-3-2017
An important clarification: none of the derivations below guarantees that $\mu$ is the "true value" we are attempting to estimate. All we show is that if $\theta_n$ converges in $L^2$ to some constant, then this constant is also its probability limit.
Whether that probability limit is the true value, and hence whether the estimator is consistent, is not proven here. So the whole derivation below presupposes that $\mu$ is, after all, the "true value".
$\newcommand{\E}{\mathbb{E}}$
Assume that we do not know whether $\mu$ is the mean, the probability limit, etc., of our estimator $\theta_n$.
We can write
$$\E[(\theta_n-\mu)^2] = \mu^2 - 2\E(\theta_n)\mu + \E(\theta_n^2)$$
which we can view as a quadratic polynomial in $\mu$. For convergence in $L^2$, a necessary condition is that this quadratic not be bounded away from zero. Being a quadratic, we can easily examine its roots.
Its discriminant is $$\Delta_{\mu} = 4[\E(\theta_n)]^2 - 4\E(\theta_n^2) = -4\text{Var}(\theta_n)$$
We want the discriminant to be greater than or equal to zero, otherwise the polynomial has no real roots. Since the variance is non-negative, this requires, at least asymptotically, that $\text{Var}(\theta_n) \to 0$. Given this, we then have asymptotically the double root
$$\mu = \lim \E(\theta_n)$$
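(For the algebra-inclined, here is a small symbolic check of the discriminant and the double root, using sympy. The symbols `m1` and `m2` are just my own shorthand for $\E(\theta_n)$ and $\E(\theta_n^2)$, not notation used above.)

```python
# A symbolic sketch: verify that the quadratic in mu has discriminant -4*Var(theta_n),
# and that it has the double root mu = E(theta_n) when the variance vanishes.
import sympy as sp

mu, m1, m2 = sp.symbols('mu m1 m2', real=True)   # m1 = E(theta_n), m2 = E(theta_n^2)

quadratic = mu**2 - 2*m1*mu + m2      # E[(theta_n - mu)^2] viewed as a quadratic in mu
a, b, c = 1, -2*m1, m2                # its coefficients
discriminant = b**2 - 4*a*c
variance = m2 - m1**2                 # Var(theta_n) = E(theta_n^2) - [E(theta_n)]^2

print(sp.simplify(discriminant + 4*variance))             # 0, i.e. discriminant = -4*Var(theta_n)
print(sp.solve(quadratic.subs(m2, m1**2), mu))            # [m1]: the double root mu = E(theta_n)
```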
So if $\E[(\theta_n-\mu)^2] \to 0$, it means that $\text{Var}(\theta_n) \to 0$ and $\lim \E(\theta_n) = \mu$.
These are sufficient conditions for consistency (sufficient but not necessary, either because the variance may not even exist, or because of situations like this one). [And again, they are sufficient for consistency if we assume from the start that $\mu$ is the true value we are trying to estimate.]
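To make the two conditions concrete, here is a toy simulation sketch (the example and all numbers are mine, not part of the argument above): the "divide by $n$" variance estimator for standard normal data is biased for finite $n$ but asymptotically unbiased, and its variance vanishes, so both conditions hold with $\mu = \sigma^2 = 1$.

```python
# theta_n = (1/n) * sum (X_i - Xbar)^2 for X_i ~ N(0,1), estimating sigma^2 = 1.
# Across Monte Carlo replications, E(theta_n) approaches 1 and Var(theta_n) shrinks.
import numpy as np

rng = np.random.default_rng(0)
reps = 5_000                              # Monte Carlo replications of the estimator

for n in (10, 100, 1000):
    x = rng.normal(size=(reps, n))
    theta_n = x.var(axis=1)               # ddof=0 by default: the biased "divide by n" estimator
    print(f"n={n:>4}  E(theta_n)~{theta_n.mean():.4f}  Var(theta_n)~{theta_n.var():.5f}")
```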
And why should these conditions be sufficient for consistency? What do they have to do with the probability statement
$$\Pr(|\theta_n -\mu| > \varepsilon) \to 0$$
Well, as another answer mentioned, this probability is tied to the variance of the distribution by Chebyshev's Inequality, so if $\mu$ is the asymptotic expected value of $\theta_n$ then
$$\Pr(|\theta_n -\mu| > \varepsilon) \leq \frac{\text{Var}(\theta_n)}{\varepsilon^2} $$
So if $\lim \E(\theta_n) = \mu$, Chebyshev's Inequality becomes applicable, and then, if $\text{Var}(\theta_n) \to 0$, the probability goes to zero.
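Here is a quick Monte Carlo sketch of this bound (again a toy example with choices of my own: $X_i \sim$ Exponential(1), $\theta_n$ the sample mean, so $\mu = 1$, and $\varepsilon = 0.1$): both the empirical tail probability and the Chebyshev bound shrink as $n$ grows, and in this run the former stays below the latter.

```python
# Compare the empirical tail probability Pr(|theta_n - mu| > eps) with the
# Chebyshev bound Var(theta_n)/eps^2 for the sample mean of Exponential(1) data.
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 1.0, 0.1, 10_000

for n in (10, 100, 1000):
    theta_n = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    tail  = np.mean(np.abs(theta_n - mu) > eps)   # estimate of Pr(|theta_n - mu| > eps)
    bound = theta_n.var() / eps**2                # Var(theta_n) / eps^2 (can exceed 1 for small n)
    print(f"n={n:>4}  Pr(|theta_n - mu| > eps)~{tail:.4f}  Var(theta_n)/eps^2~{bound:.4f}")
```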
So the intuition for $L^2$-convergence being sufficient for consistency is, in my view, shifted to whether we intuitively understand Chebyshev's Inequality...
...because here too the OP's intellectual objection appears: a non-squared difference appears to be bounded by a squared difference, which "for small deviations" (smaller than unity) "is smaller". Well, the "intervening" operators (probability, expected value) have a lot to do with it, since (with $I\{\cdot\}$ denoting the indicator function),
$$\Pr(|\theta_n -\mu| > \varepsilon) =\E\left(I\{|\theta_n -\mu| > \varepsilon\}\right) $$
$$= \E\left(I\left \{\frac{(\theta_n -\mu)^2}{\varepsilon^2} >1 \right\}\right) \leq \E\left(\frac{(\theta_n -\mu)^2}{\varepsilon^2} \right)$$
...and this last inequality holds because
$$I\left \{\frac{(\theta_n -\mu)^2}{\varepsilon^2} >1 \right\} \leq \frac{(\theta_n -\mu)^2}{\varepsilon^2} $$
and it was when I saw the above and realized why this last inequality holds that I gained some intuition about Chebyshev's Inequality.
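For completeness, a tiny numerical check of that pointwise inequality (with an arbitrary $\varepsilon = 0.5$): where the deviation is small, the squared ratio may indeed fall below one, but there the indicator is zero, so the bound is never violated.

```python
# Pointwise check: the indicator I{d^2/eps^2 > 1} never exceeds d^2/eps^2,
# whatever the size of the deviation d = theta_n - mu.
import numpy as np

eps = 0.5
d = np.linspace(-2, 2, 10_001)            # candidate deviations theta_n - mu
ratio = d**2 / eps**2
indicator = (ratio > 1).astype(float)     # I{(theta_n - mu)^2 / eps^2 > 1}

# Where ratio <= 1 the indicator is 0, and where ratio > 1 it is 1 <= ratio.
print(bool(np.all(indicator <= ratio)))   # True
```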