One important and useful result is the Wold representation theorem (sometimes called the Wold decomposition), which says that every covariance-stationary time series $Y_{t}$ can be written as the sum of two time series, one deterministic and one stochastic.
That is, $Y_t=\mu_t+\sum_{j=0}^\infty b_j \varepsilon_{t-j}$, where $\mu_t$ is deterministic, $\varepsilon_t$ is white noise, and the $b_j$ are square-summable.
The second term is an infinite MA.
(It's also the case that an invertible MA can be written as an infinite AR process.)
This suggests that if the series is covariance-stationary, and if you can identify the deterministic part, then you can always write the stochastic part as a (possibly infinite) MA process. Similarly, if the MA satisfies the invertibility condition, you can always write it as a (possibly infinite) AR process.
If you have the process written in one form you can often convert it to the other form.
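As a minimal sketch of that conversion (the function names here are illustrative, not from any library): writing the AR as $\phi(B)Y_t=\varepsilon_t$ with $\phi(B)=1-\phi_1B-\dots-\phi_pB^p$, the MA weights $b_j$ are the power-series coefficients of $1/\phi(B)$; likewise an invertible MA $Y_t=\theta(B)\varepsilon_t$ with $\theta(B)=1+\theta_1B+\dots+\theta_qB^q$ gives $\varepsilon_t=\sum_j \pi_j Y_{t-j}$ with the $\pi_j$ taken from $1/\theta(B)$. The sign conventions for $\phi$ and $\theta$ are an assumption; some texts use the opposite sign for $\theta$.

```python
def series_divide(num, den, n_terms):
    """First n_terms coefficients of the power series num(B) / den(B).

    Coefficients are listed from the constant term upwards; den[0] must be nonzero.
    """
    out = []
    for j in range(n_terms):
        a_j = num[j] if j < len(num) else 0.0
        # Contribution of the already-computed coefficients at this power of B.
        acc = sum(den[i] * out[j - i] for i in range(1, min(j, len(den) - 1) + 1))
        out.append((a_j - acc) / den[0])
    return out


def ar_to_ma(phi, n_terms):
    """MA(infinity) weights b_0, b_1, ... of a stationary AR with coefficients phi."""
    return series_divide([1.0], [1.0] + [-p for p in phi], n_terms)


def ma_to_ar(theta, n_terms):
    """Weights pi_0, pi_1, ... with eps_t = sum_j pi_j * Y_{t-j} for an invertible MA."""
    return series_divide([1.0], [1.0] + list(theta), n_terms)


print(ar_to_ma([0.5], 5))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
print(ma_to_ar([0.5], 4))  # [1.0, -0.5, 0.25, -0.125]
```

For the AR(1) case this reproduces the familiar $b_j=\phi^j$; the truncation length `n_terms` is a choice you make, and the expansion is only meaningful when the stationarity (or invertibility) condition holds.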
So in one sense at least, for covariance stationary series, often either AR or MA will be appropriate.
Of course, in practice we would rather not have very large models. For a finite stationary AR or invertible MA, both the ACF and PACF are eventually bounded in absolute value by a geometrically decaying function (one of the two cuts off to zero after a finite lag, which satisfies the bound trivially). This tends to mean that a good approximation of either an AR or an MA in the other form may often be reasonably short.
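To get a rough feel for "reasonably short", here is a small sketch for the MA(1) case $Y_t=\varepsilon_t+\theta\varepsilon_{t-1}$, where the AR($\infty$) weights are $(-\theta)^j$. The function name and the tolerance of 0.01 are arbitrary choices for illustration, not a standard criterion.

```python
def ar_order_needed(theta, tol=0.01):
    """Largest lag whose AR(infinity) weight (-theta)**j is still >= tol in size.

    Applies to an invertible MA(1) only, so |theta| must be below 1.
    """
    if abs(theta) >= 1:
        raise ValueError("MA(1) is not invertible unless |theta| < 1")
    j = 1
    while abs((-theta) ** j) >= tol:
        j += 1
    return j - 1


print(ar_order_needed(0.5))  # 6
print(ar_order_needed(0.9))  # 43
```

With $\theta=0.5$ a handful of AR terms is enough, but as $\theta$ approaches the invertibility boundary the approximating AR gets long, which is why "often" rather than "always" is the right word.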
So under the covariance stationary condition and assuming we can identify the deterministic and stochastic components, often both AR and MA may be appropriate.
The Box-Jenkins methodology looks for a parsimonious model -- an AR, MA or ARMA model with few parameters. The usual procedure is to transform to stationarity (perhaps by differencing), identify a candidate model from the appearance of the ACF and PACF (sometimes people use other tools), fit the model, and then examine the structure of the residuals (typically via the ACF and PACF of the residuals), repeating until the residual series appears reasonably consistent with white noise. Often there will be multiple models that can provide a reasonable approximation to a series. (In practice other criteria are often considered.)
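The residual-checking step can be sketched in a few lines. This is only an illustration: the function names are made up here, and the $\pm 1.96/\sqrt{n}$ band is the usual large-sample approximation for white noise, not an exact test.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
        for k in range(max_lag + 1)
    ]


def lags_outside_bands(x, max_lag):
    """Lags whose sample autocorrelation exceeds the approximate 95% white-noise band."""
    bound = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]


# A strongly alternating "residual" series is clearly not white noise.
residuals = [1.0, -1.0] * 50
print(lags_outside_bands(residuals, 2))  # [1, 2]
```

In practice you would apply this to the residuals of the fitted model and expect only the occasional lag to fall outside the band; many lags outside it suggest the model has left structure behind.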
There are some grounds for criticism of this approach. For one example, the p-values that result from such an iterative process don't generally take account of the way the model was arrived at (by looking at the data); this issue might be at least partly avoided by sample splitting, for example. A second example criticism is the difficulty of actually obtaining a stationary series - while one may in many cases transform to obtain a series that seems reasonably consistent with stationarity, it's not usually going to be the case that it really is (similar issues are a common problem with statistical models, though perhaps it may sometimes be more of an issue here).
[The relationship between an AR and the corresponding infinite MA is discussed in Hyndman and Athanasopoulos' *Forecasting: Principles and Practice*.]