
How to statistically compare two time series?

After reading the above responses, I realized that what I should be doing for my data is exactly what [IrishStat](https://stats.stackexchange.com/users/3382/irishstat) recommended. However, I am having trouble determining the correct values for the order = c(p, d, q) argument of the arima() function in R. Could someone provide a detailed tutorial on how to choose these values correctly? I know that the time series needs to be stationary, which is simple enough to test. I also know that the ACF/PACF plots are used for this, but it is still a bit foggy to me, and I want my statistics to be sound. I have been unable to find a good guide: most of the walkthroughs I have seen skim over the part about choosing the p, d, and q values.

EDIT: I've added the two time series that I have been working on. The data are sampled bi-monthly for 9 months of the year (every month except Dec, Jan, and Feb). alpha, in this case, is the TN data aggregated by month and year, which allowed me to coerce it to a time series. For our purposes, monthly averages are good enough to show what we are trying to look at. Another important note: these two time series have the proper time window, etc., but they are merely examples; they are not the final time series that I will use in the analysis. Final note: the NAs in the first time series caused gaps in the plot, but it is not difficult to apply a gap-filling function, which would generate a plot much like the second.

These time series are clearly not models, but simply points stored in the time series format. I believe that fitting a model to each series is the best way to compare them, but I am not sure. I was also able to decompose the series to remove the seasonal component, and I am curious whether it would be better to compare the series with the seasonal component removed.
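
One way to try the seasonally adjusted comparison is sketched below with base R's `decompose()`. The series `x` is synthetic and only stands in for a gap-filled monthly series (its name and values are assumptions, not the river data). `decompose()` needs at least two full periods, and the sketch assumes any gaps have been filled beforehand.

```r
# Synthetic monthly series: a yearly cycle plus noise (illustrative values only)
set.seed(1)
x <- ts(2 + 0.8 * cos(2 * pi * (1:60) / 12) + rnorm(60, sd = 0.1),
        start = c(2011, 1), frequency = 12)

dec <- decompose(x)         # additive decomposition: trend + seasonal + random
x_adj <- x - dec$seasonal   # series with the estimated seasonal component removed

# dec$figure holds the 12 monthly seasonal effects, centred on zero
print(round(dec$figure, 3))
```

Applying the same adjustment to both rivers would leave trend and irregular variation to compare, at the cost of discarding any difference in their seasonal patterns.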

alpha <- with(alpha, 
 aggregate(TN~Month+Year, FUN = mean, na.rm=T)
 )
> alpha
   Month Year       TN
1      3 2011 2.550675
2      4 2011 2.166793
3      5 2011 1.666279
4      6 2011 1.235067
5      7 2011 1.306130
6      8 2011 1.380530
7      9 2011 1.434623
8     10 2011 1.599755
9     11 2011 2.261617
10     3 2012 2.529887
11     4 2012 1.938779
12     5 2012 1.700785
13     6 2012 1.160013
14     7 2012 1.099877
15     8 2012 1.244322
16     9 2012 1.471384
17    10 2012 1.695036
18    11 2012 1.959646
19     3 2013 2.808547
20     4 2013 2.546227
21     5 2013 2.112756
22     6 2013 1.875753
23     7 2013 1.882885
24     8 2013 2.010292
25     9 2013 1.771419
26    10 2013 1.820127
27    11 2013 2.351775
28     3 2014 3.002976
29     4 2014 2.286398
30     5 2014 2.177926
31     6 2014 1.772718
32     7 2014 1.245376
33     8 2014 1.274671
34     9 2014 1.378657
35    10 2014 1.554602
36    11 2014 1.702360
37     3 2015 2.771875
38     4 2015 2.373219
39     5 2015 2.025162
40     6 2015 1.607793
41     7 2015 1.656044
42     8 2015 1.525059
43     9 2015 1.494547
44    10 2015 1.746673
45    11 2015 1.942351

V1 <- as.numeric(rep(c(12,1,2),4))
V2 <- as.numeric(c(2011, rep(2012,3), rep(2013,3), rep(2014,3),rep(2015,2)))
V3 <- rep(NA, 12)
df1 <- data.frame(V1,V2,V3)
names(df1) <- c('Month', 'Year', 'TN')

TNMonthsFilled <- rbind(alpha, df1) #adds in dates for Dec-Feb as NAs

TNMonthsFilled$Date <- as.Date(with(TNMonthsFilled, paste(Year, as.numeric(Month), as.numeric(15),sep="-")), "%Y-%m-%d") #converts values back to dates


TNOrdered <- TNMonthsFilled[order(TNMonthsFilled$Date),] #orders dates for coercion to ts

TNts <- with(TNOrdered, ts(TN, start = c(2011,3), end = c(2015,11), frequency = 12)) #TS with NAs


plot(TNts)

Time Series for First River
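
The gap-filling function mentioned above is not specified in the question. As one possibility, the sketch below linearly interpolates the NAs with base R's `approx()`; the toy series `y` is an assumption standing in for `TNts`. Interpolating across three winter months creates values that were never measured, so this is probably easier to justify for plotting than for model fitting.

```r
# Toy monthly series with a three-month winter gap (values are illustrative)
y <- ts(c(2.5, 2.1, 1.7, NA, NA, NA, 2.6, 2.0, 1.6),
        start = c(2011, 9), frequency = 12)

idx <- seq_along(y)   # position of each observation
ok <- !is.na(y)       # which positions were actually observed

# Linear interpolation between the nearest observed neighbours
filled <- approx(x = idx[ok], y = y[ok], xout = idx)$y
y_filled <- ts(filled, start = start(y), frequency = frequency(y))
```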

Nan_Up_Ag <- with(Nan_Up, 
     aggregate(TN~Date, FUN = mean, na.rm=T)
     )

> Nan_Up_Ag
         Date        TN
1  2011-04-15 3.3625000
2  2011-05-15 2.8972222
3  2011-06-15 3.6680000
4  2011-07-15 1.8125000
5  2011-08-15 1.5816667
6  2011-09-15 2.5475000
7  2011-10-15 3.4550000
8  2011-11-15 3.9600000
9  2012-04-15 3.5683333
10 2012-05-15 2.6991667
11 2012-06-15 2.0630000
12 2012-07-15 1.6233333
13 2012-08-15 1.4250000
14 2012-09-15 2.3961111
15 2012-10-15 2.9616667
16 2012-11-15 3.8216667
17 2013-03-15 4.6820000
18 2013-04-15 5.6225000
19 2013-05-15 3.5658333
20 2013-06-15 0.4916667
21 2013-07-15 2.8266667
22 2013-08-15 2.9216667
23 2013-09-15 2.7991667
24 2013-10-15 2.5875000
25 2013-11-15 3.4583333
26 2014-03-15 4.6200000
27 2014-04-15 3.9941667
28 2014-05-15 3.4116667
29 2014-06-15 2.8661111
30 2014-07-15 2.2483333
31 2014-08-15 2.0416667
32 2014-09-15 2.1316667
33 2014-10-15 2.6465000
34 2014-11-15 3.2133333
35 2015-03-15 4.1460000
36 2015-04-15 4.1650000
37 2015-05-15 3.6616667
38 2015-06-15 3.0483333
39 2015-07-15 2.8133333
40 2015-08-15 2.4466667
41 2015-09-15 2.1983333
42 2015-10-15 3.0233333
43 2015-11-15 3.3060000

Nan_Up_ts <- with(Nan_Up_Ag, ts(TN, start = c(2011,3), end = c(2015,11), frequency = 12))


plot.ts(Nan_Up_ts)

River 2 Time Series
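
One thing worth checking in the second series: `ts()` does not look at the `Date` column, and `Nan_Up_Ag` has 43 rows while March 2011 to November 2015 spans 57 months. When `start` and `end` imply more observations than are supplied, `ts()` recycles the data, so the values may no longer sit at their own dates. A possible way to line them up is to merge onto a complete monthly grid first, sketched here with a small made-up data frame `obs` (not the river data):

```r
# Made-up observations with missing months (stand-in for Nan_Up_Ag)
obs <- data.frame(Date = as.Date(c("2011-04-15", "2011-05-15", "2011-08-15")),
                  TN   = c(3.4, 2.9, 1.6))

# Complete monthly grid covering the window of interest
grid <- data.frame(Date = seq(as.Date("2011-03-15"), as.Date("2011-09-15"),
                              by = "month"))

# Left join on Date: months without an observation get NA
aligned <- merge(grid, obs, by = "Date", all.x = TRUE)

obs_ts <- ts(aligned$TN, start = c(2011, 3), frequency = 12)
```

This mirrors what the `rbind()` with NA rows does for the first series, so both series end up on the same 57-month grid.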

RiverComp
  • Why don't you post the two time series, and I will try to help you (and others) who have similar problems. – IrishStat Aug 25 '17 at 21:34
  • Choosing p, d, and q is essentially an optimization problem. Have you tried [auto.arima](https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/auto.arima)? – Nate Diamond Aug 26 '17 at 00:34
  • Choosing an ARIMA order has been asked dozens if not hundreds of times here. Your question must be a duplicate, you just have to choose which of the older threads you prefer (because there have been other duplicates before). See e.g. [these questions](https://stats.stackexchange.com/questions/tagged/arima+model-selection). Besides, time series textbooks covering ARIMA usually have a section on order selection. Have you tried those? What did you not understand there? – Richard Hardy Aug 26 '17 at 08:04
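
To make the suggestions in these comments concrete, here is a rough sketch of order selection as a small search, using only base R's `arima()` and `AIC()`. The simulated series `z`, the search range 0 to 2, and the fixed `d = 0` are all assumptions for illustration; `auto.arima` in the forecast package automates a more careful version of this, including the choice of `d`.

```r
set.seed(42)
# Simulated stationary AR(1) series (illustrative, not the river data)
z <- arima.sim(model = list(ar = 0.6), n = 200)

# Fit every ARMA(p, q) with p, q in 0..2 and record its AIC
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- apply(grid, 1, function(r) {
  fit <- tryCatch(arima(z, order = c(r[["p"]], 0, r[["q"]]), method = "ML"),
                  error = function(e) NULL)   # skip orders that fail to fit
  if (is.null(fit)) NA_real_ else AIC(fit)
})

best <- grid[which.min(grid$aic), ]   # lowest AIC among the candidates
print(best)
```

AIC values are comparable only between models fitted to the same differenced data, which is why `d` is held fixed here rather than searched over.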

0 Answers