
I have time-series data that I want to model using machine learning models such as Lasso, Ridge, and Elastic Net regression.

To make the series stationary, I difference the output variable, which introduces negative values into the differenced data.

However, I think that differencing is hurting the performance of my models:

With Differencing: (Yes, I realize the MAPE is above 100, but I think this is also due to the negative values in the differenced data. I know MAPE has many pitfalls, and I would appreciate help with this issue as well.)

Best Validation Scores:
R^2: 0.062
Adj R^2: -1.904
RMSE: 34.442
MSE: 1215.642
MAE: 26.633
MAPE: 231.714

Testing Scores:
R^2: 0.313
Adj R^2: -0.473
RMSE: 33.659
MSE: 1132.937
MAE: 23.688
MAPE: 215.160

Without Differencing:

Best Validation Scores:
R^2: 1.000
Adj R^2: 1.000
RMSE: 0.004
MSE: 0.000
MAE: 0.003
MAPE: 0.002

Testing Scores:
R^2: 1.000
Adj R^2: 1.000
RMSE: 0.004
MSE: 0.000
MAE: 0.003
MAPE: 0.002

I just want to know:

  • Is this normal?
  • How should I deal with differenced data?
  • Is a very low RMSE (around 0) and a very high R^2 (around 1, and in some cases exactly 1) a sign of overfitting?

Note: My dataset is small (154 samples). I tried gathering more data to enlarge it, but the same thing happens.
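One reason the two sets of scores may not be directly comparable is that R^2 and MAPE mean different things on levels and on differences. The sketch below is only an illustration under stated assumptions: it uses a synthetic random walk with drift (not the dataset from the question) and a naive "no change" forecast, then scores that same forecast on both scales. The names `level`, `r2`, and `mape` are illustrative.

```python
import random


def r2(actual, predicted):
    """Coefficient of determination."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error in percent (undefined if an actual is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


random.seed(0)

# Synthetic series (assumption): random walk with drift, 154 points.
level = [1000.0]
for _ in range(153):
    level.append(level[-1] + 5.0 + random.gauss(0.0, 10.0))

# Naive forecast on levels: "the next value equals the current value".
actual_level = level[1:]
pred_level = level[:-1]

# The same forecast expressed on differences: "the change is zero".
actual_diff = [b - a for a, b in zip(level[:-1], level[1:])]
pred_diff = [0.0] * len(actual_diff)

r2_level, mape_level = r2(actual_level, pred_level), mape(actual_level, pred_level)
r2_diff, mape_diff = r2(actual_diff, pred_diff), mape(actual_diff, pred_diff)

print(f"levels:      R^2 = {r2_level:.3f}, MAPE = {mape_level:.2f}%")
print(f"differences: R^2 = {r2_diff:.3f}, MAPE = {mape_diff:.2f}%")
```

In this setup the no-change forecast, which uses no model at all, should score an R^2 close to 1 on levels and a non-positive R^2 on differences. Its MAPE on differences is 100% by construction, and any forecast that overshoots a small change pushes MAPE past 100%. Comparing a fitted model against such a naive baseline on the same scale is one way to judge whether a near-perfect score on levels reflects real skill.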

Perl

1 Answer


Differencing is a form of filtering. Unwarranted filtering, such as differencing or unneeded power transformations, can have a deleterious effect.

The original series does not have to be stationary. The residuals from a useful model must be stationary.
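A minimal sketch of this point, assuming a synthetic series with a linear trend plus noise (not the asker's data): the raw series has a drifting mean, while the residuals from a model that captures the trend do not. Comparing the means of the two halves is only a crude check; a formal test such as the augmented Dickey-Fuller test (for example `adfuller` in statsmodels) would be the usual choice.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


random.seed(1)

# Synthetic trending series (assumption): y = 50 + 3*t + noise, 154 points.
n = 154
t = list(range(n))
y = [50.0 + 3.0 * ti + random.gauss(0.0, 5.0) for ti in t]

# Ordinary least squares fit of y on t, written out by hand.
t_bar, y_bar = mean(t), mean(y)
slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
    (ti - t_bar) ** 2 for ti in t
)
intercept = y_bar - slope * t_bar

# Residuals: what is left after the model has explained the trend.
resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

# Crude stationarity check: does the mean shift between the two halves?
half = n // 2
raw_shift = mean(y[half:]) - mean(y[:half])
resid_shift = mean(resid[half:]) - mean(resid[:half])

print(f"mean shift in raw series: {raw_shift:.1f}")
print(f"mean shift in residuals:  {resid_shift:.3f}")
```

Here the target was never differenced, yet the residuals show no comparable drift in the mean, because the regressor `t` already accounts for the trend.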

IrishStat
  • Thank you @IrishStat. However, I am worried about the huge drop in the error metrics when I don't difference (I mean, an R^2 of 1 and an RMSE of nearly zero!) – Perl Sep 21 '19 at 17:44
  • To be of further help, I would need to look at the data. Please post it in CSV format and I will try to look at it. If the data is proprietary, simply code it by adding constant1 and dividing by constant2; this preserves the underlying model. – IrishStat Sep 21 '19 at 17:50
  • I have no knowledge whatsoever of the assumptions behind machine learning models like Lasso Regression, Ridge, elastic net, etc. If you review my most recent response comparing some assumed models https://stats.stackexchange.com/questions/427996/forecasting-linear-vs-exponential-vs-arima/428066#428066 with EDA, you will better understand what I can do to help. – IrishStat Sep 21 '19 at 17:53
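
The coding suggested in the comments (add one constant, then divide by another) is an affine transformation, so it changes the offset and scale of the series but not its shape. A small sketch with made-up values; the helper names `mask`, `unmask`, and `lag1_autocorr` are hypothetical:

```python
def mask(values, constant1, constant2):
    """Code the data as (value + constant1) / constant2."""
    return [(v + constant1) / constant2 for v in values]


def unmask(coded, constant1, constant2):
    """Invert the coding to recover the original values."""
    return [c * constant2 - constant1 for c in coded]


def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs[:-1], xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


# Made-up series and constants, for illustration only.
series = [20.5, 23.0, 21.5, 26.0, 28.5, 27.0, 31.5, 34.0]
constant1, constant2 = 37.0, 4.0

coded = mask(series, constant1, constant2)
restored = unmask(coded, constant1, constant2)

print("coded values:", coded)
print("lag-1 autocorrelation, original:", lag1_autocorr(series))
print("lag-1 autocorrelation, coded:   ", lag1_autocorr(coded))
```

The autocorrelation is the same before and after coding, and whoever holds the two constants can recover the original values exactly.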