
I'm working on a balanced binary classification problem on a financial time-series dataset. I am using a K-fold cross-validation scheme adapted for time series, so that I never use future data to predict past data.
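To make the idea of a time-ordered split concrete, here is a minimal sketch of an expanding-window (walk-forward) scheme. The function name `walk_forward_splits` and the equal fold sizing are illustrative assumptions, not necessarily the exact scheme used in this question.

```python
def walk_forward_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs in which training always precedes testing.

    Hypothetical helper: the data is cut into n_splits + 1 contiguous blocks;
    fold k trains on the first k blocks and tests on the next block.
    """
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples at the end of the series.
        test_end = train_end + fold_size if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=10, n_splits=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every test index is strictly later than every training index in the same fold, which is the property that prevents look-ahead across folds.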

I have tried many algorithms, such as SVM, random forest, and k-nearest neighbors. While all of them achieve good results in cross-validation, none of them has generalized well to the test set.

I use the cross-validation to run a grid search over feature selection and hyperparameter tuning simultaneously to find the best combination, but again, I have not achieved any generalization to the test set.
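The joint search described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny k-NN classifier, the helper names (`walk_forward_splits`, `knn_predict`, `cv_accuracy`, `grid_search`), and the synthetic data are all hypothetical stand-ins, not the actual models or dataset from this question.

```python
import itertools
import random


def walk_forward_splits(n_samples, n_splits):
    # Expanding-window folds: train on the past, test on the next block.
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        test_end = train_end + fold_size if k < n_splits else n_samples
        yield range(train_end), range(train_end, test_end)


def knn_predict(X_train, y_train, x, k, features):
    # Squared Euclidean distance restricted to the chosen feature columns.
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(X_train, y_train)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0  # majority vote (k is odd here)


def cv_accuracy(X, y, k, features, n_splits=4):
    # Mean accuracy over the time-ordered folds for one candidate.
    scores = []
    for train_idx, test_idx in walk_forward_splits(len(X), n_splits):
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        hits = sum(knn_predict(X_tr, y_tr, X[i], k, features) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(X, y, k_values, n_features):
    # Every (hyperparameter, feature subset) pair is one candidate.
    candidates = [
        (k, feats)
        for k in k_values
        for r in range(1, n_features + 1)
        for feats in itertools.combinations(range(n_features), r)
    ]
    scored = [(cv_accuracy(X, y, k, feats), k, feats) for k, feats in candidates]
    return max(scored, key=lambda t: t[0]), len(candidates)


# Synthetic, i.i.d. data purely for illustration (not a real financial series).
rng = random.Random(0)
n, n_features = 100, 3
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
y = [1 if row[0] > 0 else 0 for row in X]  # labels driven by feature 0 only

(best_score, best_k, best_feats), n_candidates = grid_search(X, y, [1, 3, 5], n_features)
print(n_candidates, best_k, best_feats, round(best_score, 3))
```

Note that the reported best score is the maximum over all candidates evaluated on the same folds, so it is generally not an unbiased estimate of performance on a separate test set.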

Do you have any ideas as to why this might be? Any general advice for dealing with this kind of scenario?

  • Financial data tend to change their behaviour from time to time, even within a single dataset. Are you sure that your test data behave similarly to your training data? Moreover, financial data seem to be random in some sense (i.e. previous values do not contain enough information about the current one) – Georgy Firsov Jan 29 '21 at 22:11
  • @GeorgyFirsov While you might be correct that there is dataset shift, I doubt that this is the issue. I have tried 3 different timeframes and have encountered the problem every time (a year of daily data, a month of hourly data, and a different month of hourly data). All three times, I achieved reasonable (if not strong) results in cross-validation, but poor performance on the test set. I think it's unlikely that dataset shift occurred all three times, so something else must be the problem. What do you think? – Vladimir Belik Jan 30 '21 at 20:28
