I'm having issues with the residuals of my ARIMA models in R for two time series. When I run the Ljung-Box test on the residuals, the null hypothesis of no autocorrelation is rejected (i.e. my residuals still appear to be correlated), and I don't know what I should do next. My end goal is to show that the steel time series can be used to predict car production.
The steel and cars time series data was extracted from these sources: steel and cars.
The following is my code:
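library(forecast)  # auto.arima()
library(imputeTS)  # na.interpolation(); newer imputeTS versions name it na_interpolation()
# Read the two monthly series and name the value columns
steel <- read.csv("~/stat248/monthly-production-of-raw-steel-.csv")
cars <- read.csv("~/stat248/australia-monthly-production-of-.csv")
colnames(cars)[2] <- "cars"
colnames(steel)[2] <- "steel"
# Convert to monthly ts objects covering Jan 1956 to Nov 1993
cars <- ts(cars$cars, start = c(1956, 1), end = c(1993, 11), frequency = 12)
steel <- ts(steel$steel, start = c(1956, 1), end = c(1993, 11), frequency = 12)
plot(cbind(cars, steel), main = "Production of Cars and Steel in Australia")
# Fill missing values in cars by interpolation, then log-transform both series
cars <- na.interpolation(cars)
logcars <- log(cars)
logsteel <- log(steel)
# STL decomposition with a fixed (periodic) seasonal component
logcars_stl <- stl(logcars, s.window = "periodic")
logsteel_stl <- stl(logsteel, s.window = "periodic")
# Fit ARIMA models to the STL remainder of each series
logsteel_arima <- auto.arima(logsteel_stl$time.series[, "remainder"], approximation = FALSE, trace = FALSE)
logcars_arima <- auto.arima(logcars_stl$time.series[, "remainder"], approximation = FALSE, trace = FALSE)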
> Box.test(logcars_arima$residuals,lag=20,type="Ljung-Box")
Box-Ljung test
data: logcars_arima$residuals
X-squared = 61.454, df = 20, p-value = 4.231e-06
> Box.test(logsteel_arima$residuals,lag=20,type="Ljung-Box")
Box-Ljung test
data: logsteel_arima$residuals
X-squared = 56.109, df = 20, p-value = 2.799e-05
Here I get tiny $p$-values even after using auto.arima. The standard ARIMA method of comparing AICs didn't fare any better. Any advice?
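For concreteness, the manual AIC comparison mentioned above looks roughly like the sketch below. It is not my exact code: it uses simulated AR(2) data in place of the STL remainders, base R `arima()` instead of `auto.arima`, and an arbitrary grid of orders from 0 to 3; the names `sim`, `grid`, `best`, `best_fit` and `lb` are placeholders. It also passes `fitdf` to `Box.test` so that the degrees of freedom account for the fitted ARMA coefficients, which my calls above do not do.

```r
# Sketch only: simulated AR(2) data stands in for the STL remainder series.
set.seed(1)
sim <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 300)

# Candidate (p, q) orders to compare; the 0..3 grid is an arbitrary choice.
grid <- expand.grid(p = 0:3, q = 0:3)

# Fit each ARMA(p, q) by maximum likelihood and record its AIC.
# Fits that fail to converge are recorded as NA instead of stopping the loop.
grid$aic <- mapply(function(p, q) {
  fit <- tryCatch(arima(sim, order = c(p, 0, q), method = "ML"),
                  error = function(e) NULL)
  if (is.null(fit)) NA_real_ else AIC(fit)
}, grid$p, grid$q)

# Pick the order with the smallest AIC and refit it.
best <- grid[which.min(grid$aic), ]
best_fit <- arima(sim, order = c(best$p, 0, best$q), method = "ML")

# Ljung-Box test on the residuals; fitdf subtracts the number of estimated
# ARMA coefficients, so the test uses df = lag - (p + q).
lb <- Box.test(residuals(best_fit), lag = 20, type = "Ljung-Box",
               fitdf = best$p + best$q)

print(best)
print(lb)
```

On my real data the same steps would be applied to `logcars_stl$time.series[, "remainder"]` and `logsteel_stl$time.series[, "remainder"]` in place of `sim`.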