I have 44 years of data for 4 variables: Y, X1, X2, X3.

All 4 variables are non-stationary. I plan to run this model: Y = X1 + X2 + X3 + e. What should I do next, step by step, to fit my model?

1. Should I first take logs and/or differences of all 4 variables to make them stationary, and then run the model?

2. How do I decide whether to apply logs and/or differencing to all 4 variables, or only to some of Y, X1, X2 and X3?

3. Which time series method should I use to fit my model: AR, MA, ARMA, ARIMA, ARMAV, or multivariate time series?

4. I am using SPSS. Answers are appreciated whether or not they involve SPSS.

Thanks a lot.

caroline

1 Answer

The first step is to determine what level of differencing each series requires, and then convert Y, X1, X2 and X3 to their stationary counterparts y, x1, x2 and x3. The second step is to determine the appropriate ARMA filter for each of the three input series x1, x2 and x3. Next, develop pre-whitened cross-correlations (see https://onlinecourses.science.psu.edu/stat510/node/75) and use them to identify the appropriate transfer function between y and x1, x2 and x3. Add any necessary ARIMA structure and any indicators needed to reflect pulses, level shifts, seasonal pulses and/or time trends. Estimate the model and then delete non-significant structure. Finally, re-examine the residuals with diagnostic checks to see whether the model needs to be augmented.
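As a rough illustration of the differencing and pre-whitening steps, here is a minimal Python sketch using only the standard library. It is not the procedure any particular package uses. The data are simulated (an assumption, deliberately longer than 44 observations so the pattern is visible), the input is filtered with a crude AR(1) estimate rather than a properly identified ARMA model, and all function names are hypothetical.

```python
import random
from itertools import accumulate

def difference(series, d=1):
    """Apply d rounds of first differencing."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def centre(series):
    """Subtract the sample mean."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]

def ar1_coefficient(series):
    """Lag-1 autocorrelation, used here as a crude AR(1) estimate."""
    dev = centre(series)
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(v * v for v in dev)

def ar1_filter(series, phi):
    """Apply (1 - phi*B) to the centred series; output is one value shorter."""
    dev = centre(series)
    return [b - phi * a for a, b in zip(dev, dev[1:])]

def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag], for lag >= 0."""
    dx, dy = centre(x), centre(y)
    cov = sum(dx[t] * dy[t + lag] for t in range(len(dx) - lag))
    scale = (sum(v * v for v in dx) * sum(v * v for v in dy)) ** 0.5
    return cov / scale

# Simulated data (assumption): X is an integrated AR(1) process and
# Y responds to changes in X with a delay of 2 periods.
rng = random.Random(0)
n, true_delay = 400, 2
dx, dy = [], []
for t in range(n):
    prev = dx[-1] if dx else 0.0
    dx.append(0.6 * prev + rng.gauss(0, 1))
    driver = dx[t - true_delay] if t >= true_delay else 0.0
    dy.append(3.0 * driver + rng.gauss(0, 0.5))
X = list(accumulate(dx))  # non-stationary levels
Y = list(accumulate(dy))

# Step 1: difference both series to stationarity.
x = difference(X, d=1)
y = difference(Y, d=1)

# Step 2: fit a filter to the input and apply the SAME filter to the output.
phi = ar1_coefficient(x)
alpha = ar1_filter(x, phi)  # pre-whitened input
beta = ar1_filter(y, phi)   # output passed through the input's filter

# Step 3: the pre-whitened cross-correlations suggest the delay.
ccf = {lag: cross_correlation(alpha, beta, lag) for lag in range(6)}
best_lag = max(ccf, key=lambda k: abs(ccf[k]))
print("estimated AR(1) coefficient:", round(phi, 3))
print("lag with largest |cross-correlation|:", best_lag)
```

With three inputs, the same filtering would be repeated for each input in turn, each with its own filter.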

SPSS is, in my opinion, ill-suited to your needs, BUT you might call their help desk and ask for advice, as more recent versions may have an automatic transfer function identification option available. Otherwise you might try googling terms like "automatic time series modelling" or "automatic intervention detection"; just make sure that the suggested solutions are multivariate and single-equation.

Logs or any other transforms, such as weighted modelling, should only be used when it is proven that the error variance is not homogeneous across time. See "When (and why) should you take the log of a distribution (of numbers)?" for a good discussion of this topic. Transforms should never be done willy-nilly, i.e. without cause.
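To make the point about error variance concrete, the following is a minimal sketch, assuming simulated data with a multiplicative error and a known expected value (in practice the expected value would come from the fitted model). It compares residual variance in the first and last sub-intervals before and after a Box-Cox transform with lambda = 0 (the log). The helper names and the four-block split are illustrative choices, and this is an informal check rather than a formal Box-Cox test.

```python
import math
import random

def box_cox(value, lam):
    """Box-Cox power transform of a positive value; lam = 0 gives the log."""
    if value <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1.0) / lam

def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def variance_ratio(residuals, blocks=4):
    """Residual variance in the last block divided by that in the first."""
    size = len(residuals) // blocks
    return variance(residuals[-size:]) / variance(residuals[:size])

# Simulated example (assumption): the expected value grows from 10 to 100
# and the error is multiplicative, so raw residual spread grows with the level.
rng = random.Random(1)
n = 200
expected = [10.0 * math.exp(math.log(10.0) * t / (n - 1)) for t in range(n)]
observed = [m * math.exp(rng.gauss(0, 0.1)) for m in expected]

# Residuals on the original scale and on the log (lambda = 0) scale.
raw_resid = [o - m for o, m in zip(observed, expected)]
log_resid = [box_cox(o, 0) - box_cox(m, 0) for o, m in zip(observed, expected)]

raw_ratio = variance_ratio(raw_resid)
log_ratio = variance_ratio(log_resid)
print("variance ratio, raw residuals:", round(raw_ratio, 2))
print("variance ratio, log-scale residuals:", round(log_ratio, 2))
```

A ratio far from 1 on the raw scale that moves close to 1 after the transform is the kind of evidence that would justify transforming; a ratio already near 1 would not.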

IrishStat
  • I disagree with your statement on SPSS being ill suited. They do have excellent automatic transfer function modeling. – forecaster Aug 30 '17 at 14:59
  • but no automatic diagnostics to add possibly needed intervention variables that may have been overlooked by the user OR any automatic transfer function id or any built-in feature to determine an optimum way to deal with non-constant error variance or time-varying parameters or a few other missing items. If you wish we can do an offline comparison of software that I am familiar with and we can possibly both learn the art of the possible. – IrishStat Aug 30 '17 at 15:36
  • It might not do everything that you mention, and I am not sure that everything is important. I do know SPSS does automatic outlier detection and automatic transfer function modeling, and that it was written by a famous statistician with an extensive time series background. Can you guess who wrote their automatic transfer function/outlier detection? – forecaster Aug 30 '17 at 16:39
  • They don't have an intervention detection procedure in their TF modelling because I didn't give them a license to do that. Theirs is a piecemeal, very imperfect solution written by RT that is not unified. SPSS doesn't have an automatic procedure to pre-filter the series because I rejected their proposal to acquire AUTOBOX. Beauty is in the eye of the beerholder. (beholder !) – IrishStat Aug 30 '17 at 16:46
  • @forecaster I am a little shocked and stunned by your comment "not sure if everything is important". Any and all quality computational aids are a blessing to help those who don't know what you know. – IrishStat Aug 30 '17 at 16:54
  • @IrishStat Do you mean that I first need to difference all the four variables? I read materials that say usually differencing will not be more than second-order. Thus, should I try first differencing and then second-order differencing to see which one is better to get the model done? (2) If I plan to try log. Should I first test all the four variables respectively to see if each of them has error variance not homogenous across time, then I put log only on the variables whose error variance is not homogenous? (3) Can I do both differencing and log? If yes, which one should go first? – caroline Aug 31 '17 at 08:04
  • 1) The order and degree of differencing that should be applied is based upon the ARIMA model for the individual series. 2) As said, a power transformation for the Y series should be based upon the dependence between the transfer function model errors and the expected value of Y. 3) The power transform OR weighted estimation should be done last, after you have dealt with possible outliers, level shifts and time trends in the TF residuals. – IrishStat Aug 31 '17 at 10:52
  • @IrishStat (1)Should I first try differencing for each variable (Y, X1,X2,X3) and see if first or second-order difference make each series become stationary? If yes, then I can continue to fit the model? (2) If not, and if all or some variables have non-homogeneous error variance across time, should I put a log on the differenced variables or the original variables? – caroline Sep 01 '17 at 01:15
  • As I said, build a univariate ARIMA model for each of the 4 series. You will find the appropriate differencing for each series by examining the ARIMA models. – IrishStat Sep 01 '17 at 01:20
  • @IrishStat After I find the appropriate differencing, should I run a multivariate ARIMA model on the already differenced 4 variables (Y, X1,X2, X3)? – caroline Sep 01 '17 at 01:34
  • @IrishStat What are the criteria for examining the ARIMA model? How do I evaluate whether the model I get is good enough? Is there a website providing this information that you might kindly recommend? Thanks a lot. – caroline Sep 01 '17 at 02:30
  • @IrishStat I read this somewhere: "When either simple or seasonal differencing (level stabilizing transformations) is simultaneously in use with either the log or square root transformation (variance stabilizing transformations), the variance stabilizing transformation is always applied first." Should I take logs first and then difference the logged variable? Or should I difference first and then take logs of the differenced variables? – caroline Sep 01 '17 at 04:52
  • "The variance stabilizing transform should never be done first", as the decision to do this is based upon the current model's residuals. Identify the form of the model using the pre-whitened CCF, estimate the model, determine adequacy, and transform if necessary. – IrishStat Sep 01 '17 at 08:35
  • Software packages like SPSS (and many others: SAS, EVIEWS, STATA et al.) do not incorporate schemes to actually detect the need for a transform (with lambda ranging from -1 to 1), so all they can do is blithely advise you to transform first. In the 1970s this was the suggested approach, since software didn't exist to actually identify the appropriate power transform based upon a tentative model's residuals. – IrishStat Sep 01 '17 at 08:57
  • After sober thought I believe your "read" was based upon a remark by Jenkins in a monograph or the user guide for his software. I hate to criticize the dead and the famous, but he, like all of us, was constrained by software or the art of the then possible. Determining the transformation from the original data was an early flaw (and continues to this day); it should instead be determined from the error process, which is where parametric assumptions need to be validated. You will find many citations regarding this issue in this forum. – IrishStat Sep 01 '17 at 10:15
  • @IrishStat If I test autocorrelation ( through residual analysis) and find Y, X1, X2, X3 all or some of them have non homogeneous variance, can I first log on the variables with autocorrelation? Then, I test the logged and original (unlogged) variables to see if they are stationary? If all or some of the logged and original (unlogged) variables are not stationary, then I use differencing on the non stationary variables. Then, I use the differenced variables to run ARIMA, Is the above process ok? – caroline Sep 01 '17 at 11:38
  • there is no need to worry about non-homogeneous error variance for the series by themselves. The concern would be if the residuals from a reasonably sufficient tf model exhibited changing variance at particular points in time requiring weighted analysis or if the variance of the residuals was linearly related to the expected values of the tf requiring a power transform – IrishStat Sep 01 '17 at 13:16
  • @IrishStat Do you mean that if after residual analysis, I find the variance of the residuals was linearly related to the expected values of the Y, then I can first log on only the variables (the variance of the residuals was linearly related to the expected values of the Y). Next, I test stationary on the logged and unlogged variables, and then difference only on the variables that are not stationary. Then I run ARIMA model? Is the above process ok? – caroline Sep 02 '17 at 01:27
  • After residual analysis has extracted/identified any ARIMA process AND remedied needed outliers (pulses, level shifts, seasonal pulses, time trends), THEN, if the variance of the residuals does not change deterministically over time, check whether the variance of the residuals is related to the expected values of Y (via the BOX-COX TEST), and if so transform based upon the results of the BOX-COX TEST. – IrishStat Sep 02 '17 at 12:42
  • http://www.autobox.com/cms/TFFLOW.png is an overview. Simply add one more iteration at the end to deal with variance heterogeneity issues, i.e. a deterministic change in error variance at discrete points in time (http://onlinelibrary.wiley.com/doi/10.1002/for.3980070102/full) or error variance that depends on the expected value. The Tsay reference points out the shortcomings in some time series software offerings. – IrishStat Sep 02 '17 at 15:01
  • @IrishStat What are the criteria for examining the ARIMA model? How do I evaluate whether the model I get is good enough, e.g., fit index, R squared, etc.? Is there a website providing this information that you might kindly recommend? – caroline Sep 03 '17 at 00:01
  • The ACF of the errors should be (relatively) free of structure. There should be no deterministic pattern in the errors (no pulses, no level shifts, no seasonal pulses, no time trends). There should be constant error variance over time, i.e. for different sub-intervals. Parameters of the ARIMA model should be constant over different sub-intervals. Essentially the ARIMA model should separate signal and noise. If there is no discernible signal, an R-squared of 0.0 is good enough. www.autobox.com contains a lot of time series content that you might find educational, including AFS university material. – IrishStat Sep 03 '17 at 11:22
  • @IrishStat After I have finished Box-Cox transformations on each of the four variables (Y, X1, X2, X3), should I first check whether the four Box-Cox transformed variables are stationary? If some of them are non-stationary, then I difference the non-stationary variables. Then I run the ARIMA model on the four variables? Is this process OK? – caroline Sep 04 '17 at 02:04
  • Do not transform before you build the 4 ARIMA models for pre-whitening (i.e. TF model identification) purposes. – IrishStat Sep 04 '17 at 11:08
  • @IrishStat It seems that you prefer Box-Cox transformation over log or square root transformation. Am I correct? May I ask why? – caroline Sep 04 '17 at 11:36
  • Box-Cox is a test that enables one to determine the best transformation rather than to assume one. See my response at https://stats.stackexchange.com/questions/18844/when-and-why-should-you-take-the-log-of-a-distribution-of-numbers. The feature of determining which transform is best is one of the important automatic features missing from SPSS. – IrishStat Sep 04 '17 at 12:14
  • @IrishStat It seems Box-Cox transformation works differently for Y and X. (Log works the same for both Y and X). What are the differences? How to use Box-Cox transformation on Y? How to use Box-Cox transformation on X? – caroline Sep 05 '17 at 03:09
  • How does it work differently? Please explain with supporting documentation/results. – IrishStat Sep 05 '17 at 09:58
  • @IrishStat Should I include constant in my ARIMA model? I found this somewhere "Excluding the constant is recommended when differencing is applied." But I also found this somewhere "Inclusion of a constant is standard unless you are sure that the overall mean series value is 0." Which one is correct? Thanks. – caroline Sep 06 '17 at 00:28
  • Yes, always include a constant, and if it is ultimately not significant then delete it at the end. – IrishStat Sep 06 '17 at 01:34
  • @IrishStat After including the constant, I got Estimate = .139, SE = .081, t = 1.726, Sig. = .093, so the constant is insignificant. Do I need to report this constant in my paper if it is insignificant? What do you mean by "delete it at the end"? Do you mean that I should run another ARIMA model with a constant? Or do I just not have to report this insignificant constant in my paper? – caroline Sep 06 '17 at 08:22
  • Rerun your final model without a constant – IrishStat Sep 06 '17 at 09:36
  • after all of this free guidance .. not one word of appreciation ... where is common courtesy – IrishStat Sep 06 '17 at 09:40
  • @IrishStat After I excluded the constant from the ARIMA model, I found the stationary R squared decreased, the significance value of Ljung-Box statistic decreased, and, worse (too bad), the insignificant variable (X2) became significant.(X2 was initially insignificant in the ARIMA model with a constant), What is wrong here? How to fix it? Thank you very much. Thank you very much. Thank you very much. courtesy^2 – caroline Sep 06 '17 at 10:37
  • It is difficult for me to diagnose this without data, models, etc. I would reintroduce the constant and argue that the family/collection of coefficients, i.e. the composite model, is better with the constant than without. No big problem here ... it is often important not to over-read the significance of coefficients, ESPECIALLY a constant. – IrishStat Sep 06 '17 at 14:15
  • @IrishStat Thank you. How do I interpret a Box-Cox transformation? For example, for log Y = 5 log X, I interpret "a one percent increase in X is associated with a 5 percent increase in Y". If Box-Cox transformed Y = 5 Box-Cox transformed X, can I interpret this as "a one unit increase in X is associated with a 5 unit increase in Y"? Is this correct? Or do you have another way to interpret it correctly? Thank you very much. – caroline Sep 07 '17 at 00:20
  • For log Y = 5 log X, "a 1% increase in X leads to (approximately) a 5% increase in Y" is correct. – IrishStat Sep 07 '17 at 08:15
  • @IrishStat Thank you. Do you mean the interpretation for Box Cox transformed Y= 5 Box Cox transformed X is exactly the same as the interpretation for log Y= 5 log X? Thank you very much. – caroline Sep 07 '17 at 11:45
  • NO, UNLESS Y IS ALSO LOGGED. If the model is y = 5*log(x), then a 1% change in X changes Y by about 0.05 units (5 × ln(1.01) ≈ 0.0498), not by 5 percent. – IrishStat Sep 07 '17 at 15:31
  • @IrishStat Thank you. How to interpret this: Box Cox transformed Y= 5 Box Cox transformed X? Do you mean the interpretation is" a 1% increase in X leads to a 5% increase in Y"? Thank you very much. – caroline Sep 08 '17 at 00:14
  • No. For y = 5*log(x), a 1% increase in X increases Y by about 0.05 in Y's own units (NOT by a percentage). If Y is measured in dollars, then this is an increase of about 5 cents. – IrishStat Sep 08 '17 at 00:56
  • Why don't you upvote my answer and then accept it as your answer? – IrishStat Sep 08 '17 at 23:36
  • Were you satisfied with my answer? If so .... – IrishStat Sep 11 '17 at 14:52
  • If you have no more questions accept an answer to close the question. – IrishStat Sep 15 '17 at 10:59
  • @IrishStat Thank you very much. I am trying to figure out the ARIMA model. Thank you very much. If I have more questions, I will ask later. Thank you very much for your help. – caroline Sep 16 '17 at 03:29
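
The residual check mentioned in the comments (an ACF of the errors that is relatively free of structure) can be sketched roughly as follows, using the Ljung-Box Q statistic in plain Python. The two residual series are simulated (an assumption), the function names are hypothetical, and the critical value is the 95% point of a chi-square distribution with 10 degrees of freedom. For residuals from a fitted ARMA(p, q) model the degrees of freedom would normally be reduced by p + q.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    num = sum(dev[t] * dev[t + lag] for t in range(len(dev) - lag))
    return num / sum(v * v for v in dev)

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

CHI2_95_DF10 = 18.307  # 95% point of chi-square with 10 degrees of freedom

# Simulated residuals (assumption): one white-noise series, and one series
# with leftover AR(1) structure that an adequate model should have removed.
rng = random.Random(2)
white = [rng.gauss(0, 1) for _ in range(200)]
leftover = []
for _ in range(200):
    prev = leftover[-1] if leftover else 0.0
    leftover.append(0.7 * prev + rng.gauss(0, 1))

q_white = ljung_box_q(white, 10)
q_leftover = ljung_box_q(leftover, 10)
print("Q for white-noise residuals:", round(q_white, 2))
print("Q for residuals with leftover structure:", round(q_leftover, 2))
print("critical value:", CHI2_95_DF10)
```

A Q value well above the critical value suggests that the model has left structure in the residuals. This covers only the autocorrelation check, not the checks for pulses, level shifts or changing variance.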