Time series prediction: Neural Network (nnetar) vs. exponential smoothing (ets)

Question

When I make a forecast for the univariate time series $x_1=1, x_2=2, \dots, x_{14} = 14$, why does the nnetar() function in R (which uses a neural network) not calculate correct results, whereas ets() (which uses exponential smoothing) does?

library(forecast)
df <- 1:14
fit <- nnetar(df)

fc <- forecast(fit, h = 10)

Result: 14.75729410 15.37348413 15.85274344 16.21147877 16.47188350 16.65653700 16.78524210 16.87385438 16.93433975 16.97538111

fit2 <- ets(df)
fc <- forecast(fit2, h = 10)

Result: 15 16 17 18 19 20 21 22 23 24

Which is the appropriate neural network / function for time series prediction? Please note that the above example is just a simplified data example.

Ferdi · Accepted Answer · 2017-11-15T20:47:45.817

14

Which is the appropriate neural network / function for time series prediction? Please note that the above example is just a simplified data example.

Well, this totally depends on your data. In your example data you have

a small univariate time series (only 14 observations)
a linear trend
no white noise
no seasonality
no cycle
no nonlinearity

nnetar()

Neural networks are generally very data hungry: you need a lot of data to produce an accurate forecast. 14 observations are definitely not enough; you would rather need tens or hundreds of thousands. In general, I do not recommend using neural networks for forecasting univariate time series. One benefit of neural networks is that they can capture nonlinearities, but your data does not exhibit any nonlinearity. Note that nnetar() uses a feed-forward neural network; in recent time series forecasting work, many researchers use recurrent neural networks instead.

You can also read this discussion. As far as I know nnetar() is based on the discussion here

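If you print fit you will see the model. It is an average of 20 different neural networks, each fitted from random starting weights, and therefore not deterministic: repeated runs give slightly different forecasts unless you fix the random seed.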
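As a minimal sketch of that point, assuming the forecast package is installed: resetting the seed before each fit should reproduce the forecasts exactly, while a fit without a reset will generally differ a little. The seed value 123 and the names fc_a, fc_b and fc_c are arbitrary choices for illustration.

```r
library(forecast)

df <- 1:14

# Same seed before each fit: the random starting weights are identical,
# so the averaged forecasts should match.
set.seed(123)
fc_a <- forecast(nnetar(df), h = 10)$mean
set.seed(123)
fc_b <- forecast(nnetar(df), h = 10)$mean
same_seed_identical <- isTRUE(all.equal(as.numeric(fc_a), as.numeric(fc_b)))

# No seed reset: the networks start from different random weights,
# so these forecasts will usually differ slightly from fc_a.
fc_c <- forecast(nnetar(df), h = 10)$mean

same_seed_identical
```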

Series: df 
Model:  NNAR(1,1) 
Call:   nnetar(y = df)

Average of 20 networks, each of which is
a 1-1-1 network with 4 weights
options were - linear output units 

sigma^2 estimated as 0.003636
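To get a feel for why a network of this shape flattens out instead of following the trend, here is a rough base-R sketch of a 1-1-1 network with one logistic hidden unit and a linear output unit. The four weights are made up for illustration and are not the ones nnetar() estimated; the sketch also ignores the input scaling that nnetar() applies by default.

```r
# A 1-1-1 network: one lagged input, one logistic hidden unit, linear output.
# Hypothetical weights, chosen only so that the picture resembles the question.
sigmoid <- function(z) 1 / (1 + exp(-z))
net <- function(x, b_h = -4.64, w_h = 0.44, b_o = 0, w_o = 18) {
  b_o + w_o * sigmoid(b_h + w_h * x)
}

# Iterate one-step forecasts, feeding each forecast back in as the next input.
iterate_forecast <- function(last_value, h) {
  out <- numeric(h)
  x <- last_value
  for (i in seq_len(h)) {
    x <- net(x)
    out[i] <- x
  }
  out
}

fc_net <- iterate_forecast(14, h = 20)
# The logistic unit is bounded in (0, 1), so the output can never exceed
# b_o + w_o = 18, no matter how far ahead we forecast.
round(fc_net, 2)
```

Because the hidden unit saturates, the iterated forecasts rise for a few steps and then level off, which is qualitatively the same behaviour as the nnetar() result in the question.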

ets()

This function uses exponential smoothing. Exponential smoothing models require fewer parameters. Therefore they perform better on your tiny dataset.

It might help to have a closer look at the equations of simple exponential smoothing:

$s_0 = x_0$

$s_t = \alpha x_t + (1- \alpha) s_{t-1}$

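In your case the initial level estimated by ets() is 0 and the initial trend is 1, as the printed model below shows. Note that ets() selected a model with an additive trend, ETS(A,A,N), which extends simple exponential smoothing with a slope term.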
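The two equations above can be sketched in a few lines of base R. This is only an illustration of simple exponential smoothing, not what ets() does internally; the function name ses and the value alpha = 0.5 are arbitrary choices.

```r
# Simple exponential smoothing, following s_0 = x_0 and
# s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
# R vectors are 1-indexed, so x[1] plays the role of x_0.
ses <- function(x, alpha) {
  stopifnot(alpha > 0, alpha <= 1)
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}

x <- 1:14
s <- ses(x, alpha = 0.5)  # alpha = 0.5 is an arbitrary illustrative choice

# The forecast of simple exponential smoothing is flat: every horizon gets
# the last smoothed value.
ses_forecast <- rep(s[length(s)], 10)
```

On a steadily rising series the smoothed values trail the data and the forecast is a flat line, so simple exponential smoothing on its own would not continue the trend; a trend component is needed for that.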

If you print fit2 you can see that the information criteria are all equal to minus infinity. The model reproduces the data exactly (sigma is 0), so no other model can fit better than the one that was selected.

ETS(A,A,N) 

Call:
 ets(y = df) 

  Smoothing parameters:
    alpha = 0.5445 
    beta  = 0.1009 

  Initial states:
    l = 0 
    b = 1 

  sigma:  0

 AIC AICc  BIC 
-Inf -Inf -Inf
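The printed parameters can be checked by hand. Below is a minimal base-R sketch of the additive-trend recursions (Holt's linear method in error-correction form), using the values from the output above. The function name holt_filter is made up for this illustration, and the sketch is not the code that ets() runs.

```r
# Additive level and trend, no seasonality: ETS(A,A,N).
holt_filter <- function(y, alpha, beta, l0, b0) {
  l <- l0
  b <- b0
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    e[t] <- y[t] - (l + b)      # one-step-ahead forecast error
    l <- l + b + alpha * e[t]   # level update (uses the previous trend)
    b <- b + beta * e[t]        # trend update
  }
  list(level = l, trend = b, errors = e)
}

y <- 1:14
state <- holt_filter(y, alpha = 0.5445, beta = 0.1009, l0 = 0, b0 = 1)

holt_fc <- state$level + (1:10) * state$trend  # 15, 16, ..., 24
sse <- sum(state$errors^2)                     # 0: every error is zero
log(sse)                                       # -Inf
```

With an initial level of 0 and an initial trend of 1, every one-step error is exactly zero, so the smoothing parameters never come into play and the forecasts simply continue the line. A likelihood term that involves the logarithm of the squared errors then becomes minus infinity, which is presumably why AIC, AICc and BIC are all reported as -Inf.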

edited Nov 15 '17 at 20:47

answered Nov 15 '17 at 18:02

Ferdi


what is s? s=y? – flobrr Nov 15 '17 at 18:19
s=y. In other words s is the output of the exponential smoothing algorithm. – Ferdi Nov 15 '17 at 18:25
Why do I get "Average of 20 networks, each of which is a 8-4-1 network with 41 weights (Model: NNAR(8,4))"? And thanks for everything; I know that saying "thank you" is not appreciated by the admins. They always delete it, which is why I usually don't write it. – flobrr Nov 15 '17 at 18:28
do you get an 8-4-1 network on your real data or on the above shown simplified data? – Ferdi Nov 15 '17 at 18:31
Sorry, my fault. You are right, it's NNAR(1,1). – flobrr Nov 15 '17 at 18:32
What is meant by "...therefore not deterministic"? – flobrr Nov 15 '17 at 18:34
There is a Bayesian approach of averaging many different networks. I refer to AdamO's great answer to your question. – Ferdi Nov 15 '17 at 18:36
Please note that if your data is different from the simple example data, another model might fit better. – Ferdi Nov 15 '17 at 18:37
@flobrr "not deterministic": the model assumptions behind the nnetar do not model the process, so the differences between predictions and observed values are taken (incorrectly) to be noise in the model. – AdamO Nov 15 '17 at 18:43
OK, based on this https://robjhyndman.com/hyndsight/nnetar-prediction-intervals/ I thought nnetar is also appropriate for linear univariate time series data. – flobrr Nov 15 '17 at 18:49
In my experience you will be better off with ets, arima or tbats. I am not stating this based on any research, but on my experience as a practitioner. – Ferdi Nov 15 '17 at 18:53
+1, nice answer! When you write that NNs are "data savvy", do you mean "data hungry" or "greedy"? – Stephan Kolassa Nov 15 '17 at 18:56
Yes Stephan. I just edited my answer to make it clearer. – Ferdi Nov 15 '17 at 20:49

score 3 · Answer 2 · answered Nov 15 '17 at 18:09

That's because the data generating process is a deterministic model. This is a special case of an ARIMA(0,1,1) process, also known as exponential smoothing. Therefore the exponential smoothing model generates forecasts that match your expectation. The autoregressive neural network does not model this type of process.
