When training a model, it is increasingly common to augment the data. Posts on the subject indicate that only the training set should be augmented. On the other hand, it is common to split the dataset using ratios such as 70% (train), 15% (validation), 15% (test).
My question is:
- When using augmentation techniques, should this ratio still be respected after augmentation (meaning that the number of items in the validation and test sets determines the augmentation ratio)?
- Or should the dataset be split before the augmentation process (meaning that the resulting dataset ratios are unbalanced)?
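To make the second option concrete, here is a minimal sketch in plain Python, assuming a 70/15/15 split on the original items and an augmentation that adds a fixed number of copies per training item. The names `split_then_augment` and `toy_augment`, the `copies` parameter, and the 100-item toy dataset are all hypothetical and only serve to show how the ratios shift.

```python
import random


def split_then_augment(items, augment, copies=2, seed=0):
    """Split 70/15/15 on the original items, then augment the training part only."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding surprises
    n_val = n * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set

    # Augmented samples are derived from training items only, so no variant
    # of a validation or test item ever ends up in the training set.
    augmented_train = list(train)
    for item in train:
        for k in range(copies):
            augmented_train.append(augment(item, k))
    return augmented_train, val, test


def toy_augment(item, k):
    # Hypothetical stand-in for a real transform (flip, crop, noise, ...).
    return (item, f"aug{k}")


originals = list(range(100))
train, val, test = split_then_augment(originals, toy_augment, copies=2)
total = len(train) + len(val) + len(test)
print("sizes:", len(train), len(val), len(test))
print("ratios:", [len(s) / total for s in (train, val, test)])
```

With these assumed numbers, the training set grows from 70 to 210 items while validation and test stay at 15 each, so the post-augmentation ratios are no longer 70/15/15. That is the "unbalanced" situation described in the second bullet.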
Are there any publications on this topic?