Based on the discussion "ARIMAX vs. Regression With ARIMA Errors" and the blog post https://robjhyndman.com/hyndsight/arimax/, I have tried the following:
```r
library(forecast)
set.seed(42)
ap<-AirPassengers

#generate correlated time series with a little noise and use as xreg
apcor<-ap+rnorm(length(ap), mean=0, sd=0.1*sd(ap))
aax<-auto.arima(ap, xreg=ts(apcor, frequency=12))
print(paste("Correlation: ", cor(apcor,ap),", AIC: ", aax$aic, ", RMSE: ", accuracy(aax)[1,"RMSE"]))
print(arimaorder(aax))
#> "Correlation: 0.993574951541277 , AIC: 1163.36365755527 , RMSE: 13.5532954201522"
#> p d q
#> 0 0 0

#random noise as xreg
noise<-rnorm(length(ap))
aanoise<-auto.arima(ap, xreg=ts(noise, frequency=12))
print(paste("Correlation: ", cor(noise,ap),", AIC: ", aanoise$aic, ", RMSE: ", accuracy(aanoise)[1,"RMSE"]))
print(arimaorder(aanoise))
#> "Correlation: -0.041423778067551 , AIC: 1019.60985643167 , RMSE: 10.8369785754467"
#> p d q P D Q Frequency
#> 2 1 1 0 1 0 12

#plain univariate model
aa<-auto.arima(ts(ap, frequency=12))
print(paste("AIC: ", aa$aic, ", RMSE: ", accuracy(aa)[1,"RMSE"]))
print(arimaorder(aa))
#> "AIC: 1017.84770512239 , RMSE: 10.8461871176961"
#> p d q P D Q Frequency
#> 2 1 1 0 1 0 12
```
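To see the three fits side by side, including the differencing orders `d` and `D` that each one ended up with, a small comparison table can help. This is a sketch that is not part of the original experiment: `fits` and `summary_table` are hypothetical names, and it relies on the `$arma` component that `stats::arima` / `forecast::Arima` objects carry, i.e. `c(p, q, P, Q, period, d, D)`. Exact numbers may differ slightly from the output above depending on the package version.

```r
library(forecast)

set.seed(42)
ap <- AirPassengers
apcor <- ap + rnorm(length(ap), mean = 0, sd = 0.1 * sd(ap))
noise <- rnorm(length(ap))

# Refit the three models from the experiment (hypothetical list name)
fits <- list(
  correlated_xreg = auto.arima(ap, xreg = ts(apcor, frequency = 12)),
  noise_xreg      = auto.arima(ap, xreg = ts(noise, frequency = 12)),
  univariate      = auto.arima(ap)
)

# $arma is c(p, q, P, Q, period, d, D); pull out the orders next to AIC and RMSE
summary_table <- do.call(rbind, lapply(names(fits), function(nm) {
  f <- fits[[nm]]
  data.frame(model = nm,
             p = f$arma[1], d = f$arma[6], q = f$arma[2],
             P = f$arma[3], D = f$arma[7], Q = f$arma[4],
             AIC = f$aic,
             RMSE = accuracy(f)[1, "RMSE"])
}))
print(summary_table)
```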
So it appears to me that in the first model, with the correlated regressor, the external regressor prevents the ARIMA part from learning anything (the selected order is (0,0,0)), even though this gives much worse in-sample performance than ignoring the external regressor (RMSE 13.55 vs. 10.85).
If the fitting first regressed on the external regressor and only afterwards fitted an ARIMA model to the residuals (as Arthur asked), that would somewhat explain this.
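For reference, the two-step procedure described here (regress first, then model only the residuals) can be written out explicitly. This is a hedged sketch of that hypothetical procedure, not a description of what `auto.arima` does internally; `ols`, `res` and `two_step` are illustrative names. It regenerates `apcor` with the same seed so it matches the experiment above.

```r
library(forecast)

set.seed(42)
ap <- AirPassengers
apcor <- ap + rnorm(length(ap), mean = 0, sd = 0.1 * sd(ap))

# Step 1: ordinary least-squares regression of ap on the external regressor
ols <- lm(as.numeric(ap) ~ as.numeric(apcor))

# Step 2: fit an ARIMA model only to the OLS residuals, keeping the monthly frequency
res <- ts(residuals(ols), start = start(ap), frequency = frequency(ap))
two_step <- auto.arima(res)

print(coef(ols))
print(arimaorder(two_step))
```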
But Rob says the "coefficients are estimated simultaneously". What does this mean, and why is it apparently so hard for auto.arima as a whole to automatically pick the "better" model that puts less weight on the external regressor?
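In a regression with ARIMA errors, y_t = b * x_t + n_t, where n_t follows an ARIMA process. For a given order, `forecast::Arima()` (via `stats::arima`) maximizes one likelihood over b and the ARMA coefficients together, rather than fixing b by OLS first. The sketch below illustrates this with the order fixed, as an assumption purely for comparison, to the one chosen for the plain univariate model, (2,1,1)(0,1,0)[12]; `fit_joint` is a hypothetical name.

```r
library(forecast)

set.seed(42)
ap <- AirPassengers
apcor <- ap + rnorm(length(ap), mean = 0, sd = 0.1 * sd(ap))

# Regression with ARIMA(2,1,1)(0,1,0)[12] errors: the regression coefficient on
# apcor and the ar1, ar2, ma1 coefficients are estimated jointly by maximum likelihood.
# With differencing in the error model, no intercept is included.
fit_joint <- Arima(ap, order = c(2, 1, 1), seasonal = c(0, 1, 0), xreg = apcor)

print(coef(fit_joint))
print(accuracy(fit_joint)[1, "RMSE"])
```

Comparing `coef(fit_joint)` and its RMSE with `aax` separates the effect of the joint estimation from the effect of the order search that `auto.arima` performs.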