When initializing an ARMA simulation, say ARMA(2,2), the first data point $X_1$ can be created by drawing from the unconditional distribution $f(\mu,\gamma(0))$, where $\mu=\mathbb{E}[X]$ and $\gamma(0) =\mathbb{V}[X]$. However, the subsequent value $X_2$ depends on this first value $X_1$, so we can't use the same unconditional distribution.
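(For concreteness, a minimal sketch of that first draw in Python, assuming Gaussian innovations and illustrative parameter values; `arma_acovf` from statsmodels supplies $\gamma(0)$.)

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acovf

# Illustrative ARMA(2,2) parameters (not from any particular dataset)
phi = np.array([0.5, 0.3])      # AR coefficients phi_1, phi_2
theta = np.array([0.4, 0.2])    # MA coefficients theta_1, theta_2
mu, sigma2 = 0.0, 1.0

# gamma(0) = unconditional variance; statsmodels uses lag-polynomial
# conventions: ar = [1, -phi_1, -phi_2], ma = [1, theta_1, theta_2]
gamma0 = arma_acovf(np.r_[1, -phi], np.r_[1, theta], nobs=1, sigma2=sigma2)[0]

rng = np.random.default_rng(0)
x1 = rng.normal(mu, np.sqrt(gamma0))  # X_1 ~ f(mu, gamma(0)), taken Gaussian
```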
So, I am trying to derive the conditional mean and variance for the first $k<p$ points, but I seem to be going round in circles. I began with the one-step conditional mean, i.e.
$$\mathbb{E}[X_t|X_{t-1}=x_{t-1}] = \phi_1 x_{t-1} + \phi_2 \mathbb{E}[X_{t-2}|X_{t-1}] + \mathbb{E}[\epsilon_t|X_{t-1}] +\theta_1\mathbb{E}[\epsilon_{t-1}|X_{t-1}] +\theta_2\mathbb{E}[\epsilon_{t-2}|X_{t-1}]$$
I think we can safely say that $\mathbb{E}[\epsilon_t|X_{t-1}] = 0$, i.e. future innovations are independent of past observations. However, I don't think we can say the same about the remaining conditional expectations. So, I tried to get a conditional expectation of the contemporaneous error given the observation by re-arranging the ARMA equation for $X_{t-1}$, i.e.
$$\mathbb{E}[\epsilon_{t-1}|X_{t-1}] = \mathbb{E}[X_{t-1} - \phi_1 X_{t-2} - \phi_2 X_{t-3} - \theta_1 \epsilon_{t-2} - \theta_2 \epsilon_{t-3}|X_{t-1}] = x_{t-1} - \phi_1\mathbb{E}[X_{t-2}|X_{t-1}] - \phi_2\mathbb{E}[ X_{t-3}|X_{t-1}] - \theta_1\mathbb{E}[\epsilon_{t-2}|X_{t-1}] - \theta_2\mathbb{E}[\epsilon_{t-3}|X_{t-1}],$$ but this just seems to introduce more terms going back in time.
I guess one way of starting the simulation off is to generate the first $p$ values from the joint unconditional multivariate distribution, with the same mean $\mu$ for all $p$ components and with covariance matrix $(\mathbf{\Gamma})_{ij} = \gamma(|i-j|)$. But it seems to me that once one of the initial values has been generated from the unconditional univariate distribution, the distribution of the other $p-1$ values changes to a conditional one, and I can't figure out how.
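(A sketch of that joint start, again with illustrative parameters; `arma_acovf` gives the needed autocovariances $\gamma(0),\dots,\gamma(p-1)$.)

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acovf

phi = np.array([0.5, 0.3])    # illustrative AR(2) part
theta = np.array([0.4, 0.2])  # illustrative MA(2) part
mu, sigma2, p = 0.0, 1.0, 2

# Autocovariances gamma(0), ..., gamma(p-1)
gamma = arma_acovf(np.r_[1, -phi], np.r_[1, theta], nobs=p, sigma2=sigma2)

# Covariance matrix (Gamma)_{ij} = gamma(|i-j|)
idx = np.arange(p)
Gamma = gamma[np.abs(idx[:, None] - idx[None, :])]

rng = np.random.default_rng(1)
x_init = rng.multivariate_normal(np.full(p, mu), Gamma)  # joint draw of X_1..X_p
```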
Also, when estimating model parameters from data using a full MLE, what distribution is used for the first $p$ observations? Is it the multivariate unconditional one? Or a univariate unconditional for the first observation and then the "partial conditionals" for the rest? The latter are the ones I am looking for, I guess. Using the unconditional multivariate seems to me to lose some information, but maybe this is not so.
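(For comparison, a minimal sketch of an exact-MLE fit with statsmodels; the data here are a stand-in. As far as I know, its `ARIMA` evaluates the exact Gaussian likelihood via a state-space representation with stationary initialization, i.e. the unconditional multivariate distribution handles the start of the sample.)

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Stand-in data; in practice this would be the observed series
rng = np.random.default_rng(2)
y = rng.normal(size=500)

# ARMA(2,2) as ARIMA(2,0,2); the exact likelihood is computed through the
# Kalman filter, so no observations are conditioned away
res = ARIMA(y, order=(2, 0, 2)).fit()
print(res.params)
```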
Addendum 1
Maybe we can just use the unconditional mean and variance for the "unobserved values" at times $t<1$, e.g.
$$\mathbb{E}[X_0|X_1=x_1] = \mathbb{E}[X_0] = \mu$$
and
$$\mathbb{V}[X_0|X_1=x_1] = \mathbb{V}[X_0]= \gamma(0),$$
and then, for $X_2$,
$$ \mathbb{E}[X_2|X_1=x_1] = \phi_1 x_1 + \mu\sum_{i=2}^{p}\phi_i $$
and
$$ \mathbb{V}[X_2|X_1=x_1] = \mathbb{V}\left[\phi_1 x_1 + \sum_{i=2}^{p} \phi_{i} X_{2-i} + \epsilon_{2} + \sum_{i=1}^{q} \theta_{i} \epsilon_{2-i}\right] \\ = \sum_{i=2}^{p} {\sum_{j=2}^{p} \phi_{i} \phi_{j} \gamma(|i-j|)} + \sigma^2 + \sigma^2 \sum_{i=1}^{q} \theta_i^2 + 2\sum_{i=2}^{p}{\sum_{j=i}^{q} \phi_{i} \theta_{j} \delta_{j-i}} $$
where $\delta_{j-i} = \mathbb{E}[\epsilon_{t-j} X_{t-i}] = \mathbb{Cov}[\epsilon_{t-j}, X_{t-i}]$, which is equal to 0 for $j < i$ (an innovation is uncorrelated with observations that precede it); for $j \ge i$ we have $\delta_{j-i} = \sigma^2 \psi_{j-i}$, where $\psi_k$ are the weights of the causal MA($\infty$) representation.
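(A sketch of these Addendum 1 moments in code, with illustrative parameters; it uses $\delta_k = \sigma^2\psi_k$, taking the $\psi_k$ from statsmodels' `arma2ma`.)

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acovf, arma2ma

# Illustrative ARMA(2,2) parameters
phi = np.array([0.5, 0.3])     # phi_1..phi_p
theta = np.array([0.4, 0.2])   # theta_1..theta_q
mu, sigma2 = 0.0, 1.0
p, q = len(phi), len(theta)

ar, ma = np.r_[1, -phi], np.r_[1, theta]
gamma = arma_acovf(ar, ma, nobs=p, sigma2=sigma2)  # gamma(0)..gamma(p-1)
psi = arma2ma(ar, ma, lags=q + 1)                  # psi_0..psi_q
delta = sigma2 * psi                               # delta_k = sigma^2 * psi_k

def cond_moments_x2(x1):
    """Mean and variance of X_2 | X_1 = x1 under the Addendum 1 approximation."""
    mean = phi[0] * x1 + mu * phi[1:].sum()
    var = sigma2 * (1.0 + (theta ** 2).sum())      # epsilon_2 plus MA terms
    for i in range(2, p + 1):
        for j in range(2, p + 1):                  # AR-AR covariances
            var += phi[i - 1] * phi[j - 1] * gamma[abs(i - j)]
        for j in range(i, q + 1):                  # AR-MA cross terms (j >= i)
            var += 2.0 * phi[i - 1] * theta[j - 1] * delta[j - i]
    return mean, var

print(cond_moments_x2(1.0))
```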
This seems to give the correct answer when all $p$ previous observations are available, but it still doesn't sit right with me. For example, if the process is very persistent and $x_1$ is far away from the long-term mean $\mu$, then the expression for the conditional expectation of $X_2$ above seems to give far too much weight to the long-term mean, whereas for a persistent process we would expect neighbouring observations to be close to each other (assuming the innovation variance is comparatively small).
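(To make the concern concrete with assumed numbers: a persistent AR(2).)

```python
import numpy as np

# Persistent AR(2), illustrative values (phi_1 + phi_2 = 0.95, stationary)
phi = np.array([0.1, 0.85])
mu, x1 = 0.0, 5.0          # x1 far from the long-term mean

cond_mean = phi[0] * x1 + mu * phi[1:].sum()
print(cond_mean)           # 0.5 -- dragged almost all the way back to mu,
                           # even though neighbouring values should be near 5
```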
Addendum 2
If we work with the simplest AR(1) process, then $$ X_{t-1} = \frac{1}{\phi_1}(X_t - \epsilon_t) $$ and, assuming $\epsilon_t \sim \mathcal{N}(0,\sigma^2)$, $$ X_{t-1} \mid X_t = x_t \ \sim \ \mathcal{N} \left(\frac{1}{\phi_1} x_t,\ \frac{\sigma^2}{\phi_1^2} \right). $$ But we also have, for a bivariate normal, $$ X_1\mid X_2=a \ \sim \ \mathcal{N}\left(\mu_1+\frac{\sigma_1}{\sigma_2}\rho( a - \mu_2),\, (1-\rho^2)\sigma_1^2\right), $$ which for our AR(1), with $\mu_1 = \mu_2 = 0$, $\sigma_1^2 = \sigma_2^2 = \sigma^2/(1 - \phi_1^2)$ and $\rho = \phi_1$, gives $$ X_{t-1} \mid X_t = x_t \ \sim \ \mathcal{N}\left(\phi_1 x_t,\ \sigma^2 \right). $$ This doesn't agree with the first result above, but it would be the correct distribution for $X_{t} \mid X_{t-1}$. What is going on here?
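(A quick Monte Carlo sanity check, with illustrative $\phi_1 = 0.8$ and $\sigma = 1$, to see which of the two distributions matches a simulation.)

```python
import numpy as np

phi1, sigma = 0.8, 1.0
rng = np.random.default_rng(3)

# Simulate a long stationary Gaussian AR(1)
n = 1_000_000
eps = rng.normal(0.0, sigma, n)
x = np.empty(n)
x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi1 ** 2))  # stationary start
for t in range(1, n):
    x[t] = phi1 * x[t - 1] + eps[t]

# Condition on X_t falling near a and inspect the preceding value X_{t-1}
a = 1.0
mask = np.abs(x[1:] - a) < 0.01
prev = x[:-1][mask]
print(prev.mean(), prev.var())
# Bivariate-normal answer:   mean phi1 * a = 0.8,  variance sigma^2 = 1.0
# First derivation's answer: mean a / phi1 = 1.25, variance sigma^2/phi1^2 = 1.5625
```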