1. It's not that all high-order AR models are well approximated by low-order ARMA models; rather, in practice the correlation structure is often such that a good approximation by a pure AR(p) requires a high-order model (p large), while a lower-order ARMA(p',q) with (p'+q) < p fits at least as well. For example, an ARMA(1,1) might be a reasonable model for a series for which a pure AR would need something like an AR(10) to do about as well.
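To see why, one can write out the AR($\infty$) representation of an ARMA(1,1), $y_t = \phi y_{t-1} + e_t - \theta e_{t-1}$: dividing $(1-\phi B)y_t = (1-\theta B)e_t$ through by $(1-\theta B)$ gives implied AR coefficients $(\phi-\theta)\theta^{j-1}$ at lag $j$. A minimal sketch (the function name and the parameter values are illustrative choices, not from any particular data set) counts how many lags carry non-negligible weight:

```python
# Sketch: AR(infinity) weights implied by an ARMA(1,1)
#   y_t = phi*y_{t-1} + e_t - theta*e_{t-1},
# i.e. (1 - phi B) y_t = (1 - theta B) e_t. Dividing by (1 - theta B)
# gives AR coefficients phi_j = (phi - theta) * theta**(j-1) at lag j.

def implied_ar_coeffs(phi, theta, n):
    """AR(infinity) coefficients of an ARMA(1,1), truncated at lag n."""
    return [(phi - theta) * theta ** (j - 1) for j in range(1, n + 1)]

# Illustrative parameters: a theta of moderate size makes the implied AR
# weights die out slowly, so a truncated pure AR needs many lags.
coeffs = implied_ar_coeffs(phi=0.5, theta=0.8, n=30)
needed = sum(1 for c in coeffs if abs(c) > 0.01)
print(f"lags with |coefficient| > 0.01: {needed}")
```

With these (arbitrary) parameters, well over a dozen AR lags exceed the 0.01 threshold, even though the generating model has only two parameters.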
2. MA models may be written as infinite ARs (and vice versa) by simple manipulation. For simplicity, I'll assume there's no constant term. Let $B$ be the backshift operator:
$y_t= e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - ... - \theta_q e_{t-q}$
$y_t= e_t - \theta_1 B e_{t} - \theta_2 B^2 e_{t} - ... - \theta_q B^q e_{t}$
$y_t= (1 - \theta_1 B - \theta_2 B^2 - ... - \theta_q B^q )e_t $
$y_t= (1 - \mathbf{\theta}(B))e_t$ , where $\mathbf{\theta}(B)$ is a polynomial in $B$.
Hence
$(1-\mathbf{\theta}(B))^{-1} y_t = e_t$
where the series expansion for $(1-\mathbf{\theta}(B))^{-1}$ is an infinite series in powers of $B$, i.e. an infinite AR.
For example, consider an MA(1):
$y_t= e_t - \theta e_{t-1} $
$y_t= (1 - \theta B )e_t $
$(1-\theta B)^{-1} y_t = e_t$
$(1+\theta B +\theta^2B^2+\theta^3B^3+...)\, y_t = e_t$
$y_t +\theta y_{t-1} +\theta^2y_{t-2}+\theta^3y_{t-3}+... = e_t$
which is an infinite AR with $\phi_1=-\theta$, $\phi_2=-\theta^2$, $\phi_3=-\theta^3$ and so on. Note that if such an inversion is valid, these coefficients decrease geometrically in magnitude; for an MA(q), the same effect eventually appears, a fairly rapid decrease in the magnitude of the high-order AR coefficients.
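As a quick numerical check of the expansion above (a sketch; `invert_ma1` is just an illustrative helper), the power-series coefficients of $(1-\theta B)^{-1}$ can be built recursively and compared against the geometric pattern $1, \theta, \theta^2, \dots$:

```python
# Sketch: expand (1 - theta*B)^(-1) as a power series by long division.
# Writing (1 - theta B) * (c_0 + c_1 B + c_2 B^2 + ...) = 1 and matching
# powers of B gives c_0 = 1 and c_j - theta * c_{j-1} = 0 for j >= 1.

def invert_ma1(theta, n):
    """First n+1 power-series coefficients of (1 - theta B)^(-1)."""
    coeffs = [0.0] * (n + 1)
    coeffs[0] = 1.0
    for j in range(1, n + 1):
        coeffs[j] = theta * coeffs[j - 1]  # c_j = theta * c_{j-1}
    return coeffs

c = invert_ma1(0.6, 5)
print(c)  # coefficients follow the geometric pattern 1, theta, theta^2, ...
```

The recursion makes the geometric decay explicit: each AR coefficient is the previous one scaled by $\theta$, which is why the decay is rapid whenever $|\theta| < 1$.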
Such inversions are only possible under certain conditions on the parameters: the invertibility conditions. For the MA(1) above this is $|\theta| < 1$; more generally, the roots of the MA polynomial $\mathbf{\theta}(z)$ must lie outside the unit circle.
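For orders above one, the condition is easiest to check numerically. A minimal sketch (the function name is mine, and the parameter values are arbitrary illustrations) for an MA(2), $y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}$:

```python
import cmath

# Sketch: an MA(2) is invertible when both roots of the MA polynomial
# 1 - t1*z - t2*z^2 lie strictly outside the unit circle.

def ma2_invertible(t1, t2):
    """Check invertibility of an MA(2) with nonzero t2 via its roots."""
    a, b, c = -t2, -t1, 1.0          # roots of a*z^2 + b*z + c = 0
    disc = cmath.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    return all(abs(r) > 1 for r in roots)

print(ma2_invertible(0.5, 0.3))  # True: both roots outside the unit circle
print(ma2_invertible(1.2, 0.5))  # False: one root falls inside
```

When the condition fails, the "AR coefficients" from the formal expansion grow rather than decay, and the infinite-AR representation is meaningless.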