Actually, without some further assumptions on the form of the transfer function in the Wold representation, I don't think it is actually true that it can always be well approximated by a ratio of finite-order polynomials. There are classes of time-series models for covariance-stationary processes where this approximation is not considered adequate --- e.g., when dealing with some "long memory" processes.
Analysis via the spectral density: To gain some insight into this aspect of time-series analysis, it is useful to look at the spectral density of a covariance-stationary process. This is fairly natural, since it allows us to see the process in frequency-space. Consider a covariance-stationary process $\{ X_t | t \in \mathbb{Z} \}$, meaning that its first two moments have the form:
$$\mathbb{E}(X_t) = \mu
\quad \quad \quad
\mathbb{Cov}(X_{t+r}, X_{t}) = \gamma(r),$$
where $\gamma$ is called the transfer function. If the process has an absolutely continuous spectral density then we can write this as:
$$f(\delta) = \frac{1}{2 \pi} \sum_{r \in \mathbb{Z}} \gamma(r) e^{2 \pi i r \delta}.$$
This function is periodic, and we can examine it over the Nyquist range $\tfrac{1}{2} \leqslant \delta \leqslant \tfrac{1}{2}$, which gives a full period. Now, under the standard ARMA representation $\phi(B) X_t = \theta(B) \varepsilon_t$ with $\sigma^2 = \mathbb{V}(\varepsilon_t)$ (which leads to a ratio of two finite polynomials for the $\text{MA}(\infty)$ representation) we get the spectral density:
$$f(\delta) = \frac{\sigma^2}{2 \pi} \bigg| \frac{\theta(e^{2 \pi i r \delta})}{\phi(e^{2 \pi i r \delta})} \bigg|^2.$$
In particular, at the zero frequency we get:
$$f(0) = \frac{\sigma^2}{2 \pi} \bigg| \frac{\theta(1)}{\phi(1)} \bigg|^2.$$
For many covariance-stationary processes, this form approximates the true spectral density fairly well. However, certain kinds of covariance-stationary time series are not well approximated by this form. Particular things that affect this are the rate of decay of the transfer function in the tails (e.g., exponential decay, power-law decay, etc.) and whether the time-series process is "short memory" or "long memory".
One specific case where the ARMA representation is not a particuarly good approximation is when the time-series process has "long memory". This phenomenon is defined by the spectral property that $f(0)=\infty$, which means that the transfer function has the divergent sum $\sum_{r \in \mathbb{Z}} \gamma(r) = \infty$. This property cannot be achieved within the standard ARMA form, since $|\theta(1)| \leqslant \sum_i |\theta_i| < \infty$ and $|\phi(1)|>0$.
Why is it the case that the infinite Wold representation can always be approximated by the ratio of two finite order polynomials?
Unless you impose some accuracy requirements or convergence conditions on the approximation, anything can be approximated by anything. So the question really becomes, under what conditions can we approximate the Wold representation with an ARMA model and still get good approximation properties (e.g., convergence, arbitrary accuracy with a finite order model, etc.)? I will address this in your subsequent questions.
What guarantees the existence of such an approximation?
Certain general forms for the transfer function in the Wold representation can be represented as power series that can be approximated up to arbitrary accuracy by a finite rational function (i.e., a ratio of two finite polynomials). This is a broad topic in real/complex analysis, and I recommend you go back to basics and have a look at the general topic of Taylor series representations of functions, and the classes of holomorphic/analytic functions. You will see that there are certain nasty classes of functions (e.g., periodic functions) that are not well-approximated by a polynomial, and other functions that are not well-approximated by a ratio of finite polynomials.
As previously noted, without some further assumptions on the form of the transfer function, I don't think it is actually true that it can always be well-approximated by a ratio of finite-order polynomials. There are some kinds of covariance-stationary time-series processes where the transfer function is "nasty" and is not well-approximated by the ARMA form. A specific case of this is "long memory" processes.
The other answer here notes that any meromorphic function can be approximated well by a finite rational function (i.e., a ratio of two finite polynomials). This is true, but it just pushes the question back one step: under what conditions will the transfer function in the Wold representation give rise to a meromorphic power series?
How good is this approximation? Is the approximation better in some cases than in other?
Approximation by an ARMA model is certainly better in some cases than others. ARMA models can approximate most "short memory" processes quite well, but they are not great at approximating "long memory" processes. The more general question, how good is the approximation, is large enough to fill entire books --- the answer will depend on the nature of the transfer function in the Wold representation, and how you measure "goodness" of an approximation.