I have often seen people use the delta method to find the asymptotic distribution of $r$, the sample correlation coefficient, for bivariate normal data. This distribution is given by
$$\sqrt{n} \left( r-\rho \right) \xrightarrow{D} \mathcal{N} \left(0, \left(1-\rho^2\right)^2 \right)$$
and this is a well-known result (I know of the z-transform, but it is not necessary in this context). I understand the method, but what I have been wondering is why people do not do something simpler. By the invariance property of MLEs, since the sample means, variances, and covariance are the MLEs of the corresponding parameters, it is easy to show that the sample correlation coefficient is in fact the MLE for $\rho$. Now, since it is an MLE, under the regularity conditions it should follow the asymptotic distribution of the MLE, namely
$$\sqrt{n} \left(r - \rho \right)\xrightarrow{D} \mathcal{N} \left(0, I^{-1} (\rho) \right)$$
where $I(\rho)$ is the Fisher information for $\rho$. All that remains is to find $I(\rho)$. Differentiating the log of the bivariate normal density twice with respect to $\rho$ and taking the negative expectation, I believe one arrives at
$$I(\rho) = \frac{1+\rho^2}{\left(1-\rho^2\right)^2}$$
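This expression can be checked without redoing the lengthy differentiation, using the identity $I(\rho) = \mathbb{E}[(\partial \ell / \partial \rho)^2]$ for a single observation. Below is a Monte Carlo sketch (not from the original post; the value $\rho = 0.6$ and the sample size are arbitrary choices) that compares the average squared score for a standard bivariate normal, treating the means and variances as known, against $(1+\rho^2)/(1-\rho^2)^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6
n = 1_000_000

# draw n pairs from a standard bivariate normal with correlation rho
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def score(x, y, r):
    # d/dr of the log-density
    # l = -log(2*pi) - 0.5*log(1 - r^2) - q / (2*(1 - r^2)),
    # where q = x^2 - 2*r*x*y + y^2
    q = x**2 - 2 * r * x * y + y**2
    return r / (1 - r**2) + x * y / (1 - r**2) - r * q / (1 - r**2)**2

I_mc = np.mean(score(x, y, rho)**2)          # Monte Carlo Fisher information
I_th = (1 + rho**2) / (1 - rho**2)**2        # claimed closed form
print(I_mc, I_th)
```

The two numbers agree to within Monte Carlo error, so the closed form above seems right for the single-parameter problem in which the means and variances are known.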
which, assuming I have not made a mistake in the lengthy computation, is very different from the above asymptotic variance, at least when $\rho$ is not small. I have even run a few simulations, and they show the delta-method variance to be far more accurate in most cases. The smaller asymptotic variance is in line with what one would expect from an MLE, but it turns out to be a very bad approximation.
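For what it's worth, a simulation along the lines I describe can be sketched as follows (my own sketch, not exact code; $\rho = 0.6$, $n = 500$, and the number of replications are arbitrary choices). It estimates the variance of $\sqrt{n}(r - \rho)$ empirically and compares it with both candidate asymptotic variances:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.6, 500, 5_000

# simulate reps datasets of n bivariate normal pairs with correlation rho
x = rng.standard_normal((reps, n))
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal((reps, n))

# sample correlation coefficient for each dataset
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

emp_var = (np.sqrt(n) * (r - rho)).var()       # empirical variance of sqrt(n)(r - rho)
delta_var = (1 - rho**2)**2                    # delta-method asymptotic variance
fisher_var = (1 - rho**2)**2 / (1 + rho**2)    # inverse single-parameter Fisher information
print(emp_var, delta_var, fisher_var)
```

In runs like this the empirical variance lands close to the delta-method value and well above the inverse-Fisher value, which is exactly the puzzle.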
It is of course possible that I have made a mistake somewhere, although I have checked again and again. If that is not the case, is there a conceptual mistake in the above reasoning? I have looked at several well-known books on inference, and none of them mentions the Fisher information for $\rho$, which I also find puzzling.
I would appreciate any insight. Thank you.