How to identify the order q of the moving-average part of a SARIMA model?

Question

I'm analyzing this time series

y = [18 22 26 16 19 21 18 22 25 31 30 34 31 25 21 24 21 28 24 26 32 33 36 39 32 33 42 44 43 48 50 56 57 59 61 49 49 57 69 72 75 76 78 73 73 75 86 97 110 134 138 146 146 132 136 143 133 158 160 182 183 188 180 176 185 194 217 220 237 266 270 264 283]

and I'm trying to understand why the researchers chose the model $\text{ARIMA}(0,1,1)(0,1,1)_{10}$. This is my reasoning, based on the figures below:

$p=P=0$ since there aren’t significant autocorrelations (ACF fig.1)
$d=D=1$ since differencing was applied once (fig.1)
$s=10$ since there are 10 periods in each season (ACF fig.1)

But what about $q$? From theory, $q$ represents the number of the random process' previous points that are taken into account, for example

MA(1) : $x_{i} = \varepsilon_{i} + a_1 \cdot \varepsilon_{i-1}$, where $\varepsilon$ is the driving noise
MA(2) : $x_{i} = \varepsilon_{i} + a_1 \cdot \varepsilon_{i-1} + a_2 \cdot \varepsilon_{i-2}$
and so on

But how do I determine how many of the previous points of the random process have to be considered?

Moreover, I also read that a property of MA(q) models is that there are nonzero autocorrelations for the first q lags and autocorrelations = 0 for all lags > q.
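As a small illustration of that cut-off property, the sketch below computes the theoretical autocorrelations of an MA(q) process directly from its coefficients. The function name `ma_acf` and the coefficient values 0.6 and 0.3 are made up for the example; they are not taken from the paper or the data.

```python
def ma_acf(coeffs, max_lag):
    """Theoretical autocorrelations rho_0..rho_max_lag of an MA(q) process.

    coeffs = [a_1, ..., a_q] as in
    x_i = eps_i + a_1*eps_{i-1} + ... + a_q*eps_{i-q}.
    """
    theta = [1.0] + list(coeffs)            # theta_0 = 1 multiplies eps_i
    q = len(coeffs)
    gamma0 = sum(t * t for t in theta)      # variance in units of var(eps)
    acf = []
    for k in range(max_lag + 1):
        if k > q:
            acf.append(0.0)                 # no shared noise terms beyond lag q
        else:
            gamma_k = sum(theta[j] * theta[j + k] for j in range(q - k + 1))
            acf.append(gamma_k / gamma0)
    return acf


# Illustrative coefficients only (assumptions, not estimates from the data).
ma1 = ma_acf([0.6], 4)        # nonzero at lag 1, zero from lag 2 on
ma2 = ma_acf([0.6, 0.3], 4)   # nonzero at lags 1 and 2, zero from lag 3 on
```

In practice one compares the sample ACF with this pattern: the last lag that is clearly outside the significance band is the usual first guess for q.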

But from ACF plots below, we see that there aren’t significant autocorrelations, i.e. all autocorrelations are (statistically) 0.

So I'm a bit confused.

IrishStat · Accepted Answer · 2020-03-01T21:42:16.183

If your chosen model was based upon the presumption that there are no pulses, no step/level shifts and no local time trends, that could be a big problem. The acf/pacf usually needs to be conditioned on latent deterministic factors; otherwise model identification incorrectly tries to "fit/explain" data points that should be excluded, or conditioned for, as NOT being part of the memory process. See @Adamo's cautionary reflections in "Interrupted Time Series Analysis - ARIMAX for High Frequency Biological Data?"

If you wish to post your data I will provide more detailed analysis.

EDITED AFTER RECEIPT OF DATA :

This analysis suggests that one-stop AIC-based fitting is ill-equipped to deal with data that has complex structure. Pulses and time trends have to be detected, along with non-constant error variance.

Model building, as some have defined it, is like peeling an onion: it requires testing assumptions and suggesting appropriate remedies. What follows, in my opinion, is a master-class in univariate time series modelling, highlighting a suggested iterative approach per https://autobox.com/pdfs/ARIMA%20FLOW%20CHART.pdf .

Your series has 73 annual values. It has three distinct break points in trend and three pulses; thus the acf and the pacf of the original series are of little use in identifying the appropriate memory model, as they are fundamentally "damaged" by the latent deterministic structure. The software/approach you are using will work fine when the data is free of these kinds of effects and of a number of other effects, like changing parameters or changing error variance over time.

Unfortunately (or fortunately for your edification!) you have chosen a complex series requiring a complex solution.

Here is your data with acf/pacf here

The acf/pacf suggest non-stationarity, but there are three distinctly different alternatives for rendering the series stationary in the mean, viz. 1) differencing; 2) de-meaning (i.e. adjusting for a shift in the mean); and 3) de-trending using time trends (deterministic structure).
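To make the three alternatives concrete, here is a minimal pure-Python sketch applied to the posted series. The helper names are illustrative, the level-shift position `break_at=36` is an arbitrary assumption (not a detected break point), and `detrend` removes only a single straight line, which is simpler than the multi-trend structure discussed in this answer.

```python
y = [18, 22, 26, 16, 19, 21, 18, 22, 25, 31, 30, 34, 31, 25, 21, 24, 21, 28,
     24, 26, 32, 33, 36, 39, 32, 33, 42, 44, 43, 48, 50, 56, 57, 59, 61, 49,
     49, 57, 69, 72, 75, 76, 78, 73, 73, 75, 86, 97, 110, 134, 138, 146, 146,
     132, 136, 143, 133, 158, 160, 182, 183, 188, 180, 176, 185, 194, 217,
     220, 237, 266, 270, 264, 283]


def difference(x, lag=1):
    """1) Differencing: x_t - x_{t-lag}."""
    return [x[i] - x[i - lag] for i in range(lag, len(x))]


def demean(x, break_at):
    """2) De-meaning: subtract a separate mean before and after a level shift."""
    left, right = x[:break_at], x[break_at:]
    m_left = sum(left) / len(left)
    m_right = sum(right) / len(right)
    return [v - m_left for v in left] + [v - m_right for v in right]


def detrend(x):
    """3) De-trending: remove one least-squares straight line in time."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = sxy / sxx
    intercept = x_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(x)]


d1 = difference(y)               # what an ARIMA(.,1,.) model works with
shifted = demean(y, break_at=36) # break_at is a hypothetical position
flat = detrend(y)                # residuals around a single trend line
```

The three outputs are different series with different autocorrelation structure, which is why the choice among them affects the identified p and q.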

The software/approach you are using leans on (assumes) differencing, which is not appropriate for data like this.
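One way to see the risk is a small simulation, sketched below under made-up assumptions: a series that is just a straight line plus white noise (intercept 5, slope 2, unit-variance noise, all arbitrary). Differencing it gives $b + \varepsilon_t - \varepsilon_{t-1}$, whose lag-1 autocorrelation is $-0.5$ in theory, so the differenced series appears to need an MA(1) term even though the original noise had no memory at all.

```python
import random


def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return ck / c0


random.seed(0)                      # fixed seed so the run is repeatable
n = 5000
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

# Deterministic trend plus white noise (coefficients are arbitrary).
trend_series = [5.0 + 2.0 * t + e for t, e in enumerate(noise)]

# First difference: equals 2 + noise[t] - noise[t-1].
diffed = [trend_series[t] - trend_series[t - 1] for t in range(1, n)]

r1_noise = sample_acf(noise, 1)     # should be close to 0
r1_diffed = sample_acf(diffed, 1)   # should be close to -0.5
r2_diffed = sample_acf(diffed, 2)   # should be close to 0
```

De-trending the same series instead would leave plain white noise, with no MA term needed.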

Here are the detected intervention types and their periods of introduction (3 trends and 3 pulses)

Operationally this is equivalent to introducing 6 dummy indicators as regression input series. This is what the augmented data looks like after reducing the number of pulses to 1 (at period 73).

and

After adjusting for these 6 deterministic series, this is what the acf/pacf looks like, suggesting an AR(1) model, i.e. (1,0,0). The final equation is here and here, with the acf of the residuals here
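For readers without AUTOBOX, the idea of "dummy indicators as regression inputs, then an AR(1) on what is left" can be roughly sketched with NumPy. This is not the procedure AUTOBOX uses: the trend start periods below are HYPOTHETICAL placeholders (the detected break points are only shown in the figures), only the pulse at period 73 comes from the text, and the two-step fit (ordinary least squares first, AR(1) coefficient second) is a simplification of a joint estimation.

```python
import numpy as np

y = np.array([18, 22, 26, 16, 19, 21, 18, 22, 25, 31, 30, 34, 31, 25, 21, 24,
              21, 28, 24, 26, 32, 33, 36, 39, 32, 33, 42, 44, 43, 48, 50, 56,
              57, 59, 61, 49, 49, 57, 69, 72, 75, 76, 78, 73, 73, 75, 86, 97,
              110, 134, 138, 146, 146, 132, 136, 143, 133, 158, 160, 182, 183,
              188, 180, 176, 185, 194, 217, 220, 237, 266, 270, 264, 283],
             dtype=float)
n = len(y)
t = np.arange(1, n + 1)                      # periods 1..73


def trend_from(start):
    """Time-trend regressor: 0 before `start`, then 1, 2, 3, ..."""
    return np.maximum(0, t - start + 1).astype(float)


def pulse_at(period):
    """Pulse regressor: 1 at `period`, 0 elsewhere."""
    return (t == period).astype(float)


trend_starts = [1, 31, 48]   # HYPOTHETICAL start periods, for illustration only
pulse_periods = [73]         # the single remaining pulse mentioned in the text

X = np.column_stack(
    [np.ones(n)]
    + [trend_from(s) for s in trend_starts]
    + [pulse_at(p) for p in pulse_periods]
)

# Step 1: remove the deterministic structure by least squares.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: AR(1) coefficient from regressing resid_t on resid_{t-1}.
phi = float(resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1]))
```

The acf/pacf used for identification would then be computed on `resid`, not on `y` or on its differences.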

The Actual/Fit and Forecast graph is here

The Actuals & Cleansed graph highlights the trend changes and the anomalies

Specifically, there is no MA structure required for your data. If the pacf had more significant correlations than the acf, then the number of required/suggested MA coefficients would be the number of significant acf values.

The authors of your referenced paper (and their reviewers!) were not nuanced enough to know that there are often more viable alternatives to differencing to render a series stationary, and were completely unaware of the impact of error variance dynamics and their consequences.
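The acf-versus-pacf rule can be checked on textbook cases. The sketch below derives partial autocorrelations from a given autocorrelation sequence with the Durbin-Levinson recursion and applies it to the theoretical ACF of an AR(1) and of an MA(1). The coefficient 0.7 and the function name `pacf_from_acf` are assumptions chosen for illustration.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    rho[0] must be 1 and rho[k] the autocorrelation at lag k.
    Returns [phi_11, phi_22, ..., phi_KK], the partial autocorrelations.
    """
    pacf = []
    prev = []                                   # phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


a = 0.7                                         # illustrative coefficient

# AR(1): the ACF decays geometrically, the PACF cuts off after lag 1.
acf_ar1 = [a ** k for k in range(6)]
pacf_ar1 = pacf_from_acf(acf_ar1)

# MA(1): the ACF cuts off after lag 1, the PACF decays with alternating sign.
acf_ma1 = [1.0, a / (1.0 + a * a), 0.0, 0.0, 0.0, 0.0]
pacf_ma1 = pacf_from_acf(acf_ma1)
```

For the AR(1) case the acf has many nonzero lags and the pacf only one; for the MA(1) case it is the other way round, and the single nonzero acf lag gives q = 1.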

I used AUTOBOX for this analysis as I had helped to develop it. The primary source for Intervention Detection is http://docplayer.net/12080848-Outliers-level-shifts-and-variance-changes-in-time-series.html

ADDENDUM:

I looked closely at the errors from the above model and found that there was a significant increase in error variance (now visually obvious) which yielded this test result.

The model is now simpler with only 1 pulse and the Actual/Fit and Forecast here showing 95% prediction limits using Monte-Carlo simulation procedures.

with a "much leaner" acf of the model residuals here

Thank you for answering. I did not understand much since I'm new to this subject. I added data and reference to the question. — sound wave, Feb 29 '20 at 14:29
Thank you so much for the time you took to analyze in detail. Strange thing is that your model is basically the opposite with respect to the authors' model: your conclusion is an AR model with no seasons and no MA, authors say MA model with seasons and no AR. — sound wave, Mar 02 '20 at 08:40
Read @Ben's comment about high-order D: https://stats.stackexchange.com/questions/448002/is-having-a-high-p-and-q-evidence-that-arima-would-be-a-better-model-than-mo/448095#448095 . Unnecessary differencing needs unnecessary MA structure to remedy the unwanted effect. — IrishStat, Mar 02 '20 at 11:00
In conclusion, the regular and seasonal differencing and the MA and seasonal MA coefficients are totally unnecessary and redundant, and possibly (probably!) self-cancelling. One of the problems with estimating models with redundant structure is that there are an infinite number of solutions which may apply. Think of dividing by 100 and multiplying by 100 OR dividing by 999 and multiplying by 999 ... how to choose which one is correct when neither one is needed? Differencing this series is where the problem arises, as there are segments with different trend lines within the series. — IrishStat, Mar 02 '20 at 11:10
Additionally, they did not treat the obvious fact that there was a variability change, thus the data has to be weighted to make it conformable. — IrishStat, Mar 02 '20 at 11:11
If you wish you can contact me offline as this post is long enough already. — IrishStat, Mar 02 '20 at 11:42
Even if their analysis is not correct, I'm curious about how they chose p, q, P and Q. Did you understand that from the paper? — sound wave, Mar 02 '20 at 15:41
They set a maximum for each of the 4 characteristics and then assume that there are no latent deterministic effects such as pulses, level/step shifts, seasonal pulses and time trends, that the error process for any combination is constant through time, and that model parameters are also invariant over time. They then try ALL possible combinations of these 4 factors, compute the AIC statistic for each combination that is tried, and select the one with the smallest AIC to crown the winner. It is a list-based approach. — IrishStat, Mar 02 '20 at 16:16
I should also have said that their tournament also includes d and sd, which in your case were probably set to 2 and 10. Thus there are 6 characteristics to be studied. The total number of trials would then be 3 x 10 x 6 x 6 x 2 x 2 (for d, sd, p, q, P, Q), PLUS another set to consider the possible inclusion of a constant — IrishStat, Mar 02 '20 at 17:40
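The size of such a "tournament" is easy to work out. The sketch below takes the six factor sizes quoted in the last comment at face value (what each range of values actually means is an assumption here), enumerates every combination, and shows the selection step with a caller-supplied scoring function. The name `pick_min_aic` and the `aic_of` callable are hypothetical placeholders; no real model is fitted.

```python
from itertools import product

# Number of candidate values per factor, as quoted in the comment above.
sizes = {"d": 3, "sd": 10, "p": 6, "q": 6, "P": 2, "Q": 2}

# Every combination of the six factors, in the order d, sd, p, q, P, Q.
candidates = list(product(*(range(k) for k in sizes.values())))

n_without_constant = len(candidates)          # 3 * 10 * 6 * 6 * 2 * 2
n_with_constant = 2 * n_without_constant      # each model with/without constant


def pick_min_aic(candidates, aic_of):
    """Return the candidate with the smallest score.

    `aic_of` is a placeholder: any function mapping a candidate tuple to a
    number (in a real search it would fit the model and return its AIC).
    """
    return min(candidates, key=aic_of)
```

With these sizes the list holds 4320 candidates, or 8640 once the optional constant is included, and none of them allows for pulses, level shifts or changing variance.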
