When you write
> [...] to initialize a time series of random white noise (the errors), and then perform a first fit to obtain a first model, then calculate the errors compared to the actual data, then fit it again with the newly obtained errors, compare it to the actual data to obtain new errors, and so on.
you are in effect outlining the need for an estimation method that estimates the vector of residuals and the vector of parameters simultaneously.
Say one has,
${y}_t = \boldsymbol{x}_t\boldsymbol{\beta} + {\varepsilon}_t$
where $\boldsymbol{x}_t$ may contain anything you want, for instance lagged values ${y}_{t-i}$ for $i=1,\dots,p$ (setting aside the discussion of the conditions attached to $p$).
In the MA($q$) case, one assumes that ${\varepsilon}_t = {r}_t + \sum_{i=1}^q \lambda_i {r}_{t-i}$, which leads to
${y}_t = \boldsymbol{x}_t\boldsymbol{\beta} + {r}_t + \sum_{i=1}^q \lambda_i {r}_{t-i}$
or, reformulated in matrix terms using the backshift operator $\boldsymbol{B}$,
$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \left(\boldsymbol{I} + \sum_{i=1}^q \lambda_i \boldsymbol{B}^i\right)\boldsymbol{r}$
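To make the matrix form concrete, here is a minimal numpy sketch (the sample size, the coefficient values `lam` and `beta`, and all variable names are illustrative assumptions of mine) that simulates $\boldsymbol{y}$ exactly as written above, for $q = 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 200, 2                               # illustrative sample size and MA order
lam = np.array([0.6, -0.3])                 # hypothetical lambda_1, lambda_2
beta = np.array([1.0, 0.5])                 # hypothetical regression coefficients
X = np.column_stack([np.ones(n), rng.normal(size=n)])
r = rng.normal(size=n)                      # white-noise innovations r_t

# Backshift matrix B: (B r)_t = r_{t-1}, with an implicit zero before the sample.
B = np.eye(n, k=-1)
Theta = np.eye(n) + sum(lam[i] * np.linalg.matrix_power(B, i + 1) for i in range(q))

y = X @ beta + Theta @ r                    # y = X beta + (I + sum_i lambda_i B^i) r
```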
Since working with MLE actually means working with the errors whose distribution is assumed, you have to rearrange the last equation as
$\left(\boldsymbol{I} + \sum_{i=1}^q \lambda_i \boldsymbol{B}^i\right)^{-1}\left(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\right) = \boldsymbol{r}$
In practice, this means working with the distribution-conditioned residuals
$\left(\boldsymbol{I} + \sum_{i=1}^q \widehat{\lambda}_i \boldsymbol{B}^i\right)^{-1}\left(\boldsymbol{y} - \boldsymbol{X}\widehat{\boldsymbol{\beta}}\right) = \widehat{\boldsymbol{r}}$.
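One never inverts that operator explicitly: it is lower triangular with a unit diagonal, so the residuals come out of a forward recursion. A minimal sketch, with a hypothetical helper `ma_residuals` (it can be checked against the simulated `y`, `X` above):

```python
import numpy as np

def ma_residuals(y, X, beta_hat, lam_hat):
    """Solve (I + sum_i lambda_i B^i) r = y - X beta forward in time:
    r_t = (y_t - x_t beta) - sum_i lambda_i r_{t-i}, with pre-sample r set to 0."""
    e = y - X @ beta_hat
    r = np.zeros_like(e)
    for t in range(len(e)):
        r[t] = e[t] - sum(lam_hat[i] * r[t - 1 - i]
                          for i in range(len(lam_hat)) if t - 1 - i >= 0)
    return r
```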
Arranged this way, one can maximize the likelihood assumed for the residuals and estimate the parameters at the same time, that is, simultaneously. Hence the frequent use of MLE.
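As an illustration of that joint estimation, here is a hedged sketch of conditional (Gaussian) MLE for the regression with MA($q$) errors, reusing `y`, `X`, `q` from the simulation and the hypothetical `ma_residuals` helper above, together with scipy's generic optimizer; a single optimizer call estimates $\boldsymbol{\beta}$ and the $\lambda_i$ together:

```python
import numpy as np
from scipy.optimize import minimize

def neg_conditional_loglik(params, y, X):
    """Concentrated Gaussian log-likelihood (negated); sigma^2 is profiled out."""
    k = X.shape[1]
    beta_hat, lam_hat = params[:k], params[k:]
    r = ma_residuals(y, X, beta_hat, lam_hat)     # helper sketched above
    sigma2 = np.mean(r ** 2)
    return 0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1.0)

# One optimizer call estimates beta and lambda_1, ..., lambda_q jointly.
start = np.zeros(X.shape[1] + q)
fit = minimize(neg_conditional_loglik, start, args=(y, X), method="BFGS")
beta_mle, lam_mle = fit.x[:X.shape[1]], fit.x[X.shape[1]:]
```

Packaged ARMA/ARMAX routines do essentially this, typically through an exact (state-space) likelihood rather than the conditional one sketched here.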
But iterative approaches like the one you describe are also used in practice, for instance with GMM when dealing with endogeneity, stopping when a convergence criterion is met.
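For completeness, the iterative scheme you describe can be sketched as follows (an iterative least-squares procedure in the spirit of Hannan-Rissanen, which is my guess at what you have in mind, not a GMM routine): regress $\boldsymbol{y}$ on $\boldsymbol{X}$ and the lagged current residuals, recompute the errors against the actual data, and repeat until the coefficients stop moving.

```python
import numpy as np

def iterative_ma_fit(y, X, q, max_iter=100, tol=1e-8):
    """Alternate between an OLS fit on [X, lagged residuals] and refreshing the
    residuals against the actual data, until the coefficients stop changing."""
    n, k = X.shape
    r = np.zeros(n)                             # initial residual guess
    coef_old = np.zeros(k + q)
    for _ in range(max_iter):
        # lagged-residual regressors r_{t-1}, ..., r_{t-q} (zeros before the sample)
        R = np.column_stack([np.r_[np.zeros(i + 1), r[:n - i - 1]] for i in range(q)])
        Z = np.column_stack([X, R])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ coef                        # new errors compared to the actual data
        if np.max(np.abs(coef - coef_old)) < tol:   # convergence criterion
            break
        coef_old = coef
    return coef[:k], coef[k:]                   # (beta_hat, lambda_hat)
```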