
For time series modeling with neural networks on 500,000 samples, I am looking for tips on the recommended percentages for splitting the samples into train/validation/test sets, and on the intervals at which to split them. I am definitely not looking for cross-validation answers, because my model takes a very long time to train.

percentage

For datasets of length 10,000, 100,000 and 500,000 (my case), what are the recommended percentages for the train/test split, specifically for time series (e.g. 30/70 or 98/2)?
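To see what these percentages mean in absolute numbers, here is a minimal Python sketch. The function names `split_sizes` and `chronological_split` are hypothetical, and the fractions are only the examples from the question, not recommendations. The splits are kept contiguous and in time order, so the model is never trained on data that comes after the data it is evaluated on.

```python
def split_sizes(n_samples, fractions):
    """Return the number of samples in each split, in chronological order.

    Later splits (validation/test) are rounded to the nearest integer and the
    training split takes the remainder, so the counts always sum to n_samples.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    later = [round(n_samples * f) for f in fractions[1:]]
    return [n_samples - sum(later)] + later


def chronological_split(n_samples, fractions):
    """Return 0-based, half-open (start, stop) index ranges for each split."""
    bounds = []
    start = 0
    for size in split_sizes(n_samples, fractions):
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    # Absolute split sizes for the dataset lengths mentioned in the question.
    for n in (10_000, 100_000, 500_000):
        for fracs in ((0.7, 0.3), (0.98, 0.02)):
            print(n, fracs, split_sizes(n, fracs))
```

For example, a 98/2 split of 500,000 samples still leaves 10,000 test samples, whereas the same split of 10,000 samples leaves only 200.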

interval splitting

Assume we have chosen a 70/20/10 split. Is it better to split the samples contiguously, i.e. train = 1-70, val = 71-90, test = 91-100, or to choose an interval (e.g. 20 samples) and split within each interval? For a 100-sample example that would be train = 1-14, 21-34, ..., 81-94; val = 15-18, 35-38, ..., 95-98; test = 19-20, 39-40, ..., 99-100. Is there any recommendation for this kind of interval splitting?
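The interval scheme described above can be sketched in Python as follows. This is a minimal illustration, not an established method; the function name `blocked_split` and its parameters are hypothetical. It takes the per-block lengths directly (14/4/2 for a 20-sample block) to avoid floating-point rounding, and returns 0-based indices.

```python
def blocked_split(n_samples, train_len, val_len, test_len):
    """Split indices 0..n_samples-1 into interleaved train/val/test blocks.

    Each block of (train_len + val_len + test_len) consecutive samples is
    divided in time order: train first, then validation, then test.
    A shorter final block is filled in the same order as far as it goes.
    """
    block_size = train_len + val_len + test_len
    train, val, test = [], [], []
    for start in range(0, n_samples, block_size):
        stop = min(start + block_size, n_samples)
        for i in range(start, stop):
            offset = i - start  # position of sample i inside its block
            if offset < train_len:
                train.append(i)
            elif offset < train_len + val_len:
                val.append(i)
            else:
                test.append(i)
    return train, val, test


if __name__ == "__main__":
    train, val, test = blocked_split(100, 14, 4, 2)
    # Add 1 to convert to the 1-based numbering used in the question.
    print([i + 1 for i in test])  # 19, 20, 39, 40, ..., 99, 100
```

One point to check with the interleaved scheme: if each sample is built from a window of past observations, samples just after a block boundary share raw observations with samples just before it, so validation/test samples can overlap with neighbouring training samples. The contiguous scheme has such overlap only at its two boundaries.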

  • This question was closed for the wrong reasons: splitting data into train/test sets is not cross-validation. The answers to these posts also imply that cross-validation uses multiple models, but I want to use a single model. Some answers exist, but they are not complete, because they still talk about cross-validation. – Farhang Amaji Sep 08 '21 at 08:36
  • If you think an answer already exists, at least refer me to better ones. – Farhang Amaji Sep 09 '21 at 07:44
  • To get your question reopened, please edit it in a way that distinguishes it from the apparent duplicates. – whuber Sep 09 '21 at 16:22
  • @whuber I just re-edited it. – Farhang Amaji Sep 09 '21 at 16:39
  • There are many similar posts, for instance https://stats.stackexchange.com/questions/453386/working-with-time-series-data-splitting-the-dataset-and-putting-the-model-into and this stored Google search: site:https://stats.stackexchange.com train-test splitting for time series data? – kjetil b halvorsen Sep 09 '21 at 16:47
  • @kjetilbhalvorsen Hi again, I have the same problem. The reason I didn't ask to reopen it before is that I hate the irritating arguments here, so please reconsider reopening my question. All the suggested duplicates are either about cross-validation, which I clearly stated I am not looking for, or about model predictions that do not use neural nets, mostly with small samples. But now I have to deal with this problem again, so I decided to ask for reopening. – Farhang Amaji Dec 06 '21 at 20:23
  • So I read https://stats.stackexchange.com/search?page=2&tab=Relevance&q=split%20train%20validation%20time%20series%20-cross, but apart from 4 posts that did not answer my question, none of the others were relevant. – Farhang Amaji Dec 06 '21 at 20:24
  • To be clear, I read all the results for the search `split train validation time series -cross`, and now I am just asking for the question to be reopened. – Farhang Amaji Dec 06 '21 at 22:52
  • The edited question is dissimilar from the initial set of duplicates, but is duplicated by other questions. This Meta thread gives some tips on how to search CV: https://stats.meta.stackexchange.com/questions/5549/faq-best-practices-for-searching-cv – Sycorax Dec 07 '21 at 15:45

0 Answers