I have labelled data for a set of experiments: 100 experiments (shots) were conducted, and each experiment is associated with 1000 distinct data points that I have labelled. I am using classification methods but am a little uncertain about how best to partition the dataset. For example, I've observed that a Random Forest classifier's accuracy differs depending on whether it is trained only on experiments 1-80 and tested on 81-100, or whether I mix the data from all shots and randomly choose 80% for the training set and the remaining 20% for the test set.
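For concreteness, here is a minimal sketch of the two splits I mean, assuming scikit-learn and placeholder arrays `X`, `y` and `groups` (the experiment ID of each row); none of these names come from my actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GroupShuffleSplit

# Placeholder data: 100 experiments x 1000 points each, 10 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100 * 1000, 10))
y = rng.integers(0, 2, size=100 * 1000)
groups = np.repeat(np.arange(100), 1000)   # experiment (shot) ID for every row

# Option 1: random 80/20 split that mixes points from all shots.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Option 2: 80/20 split on whole experiments, so no shot appears in both sets.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=groups))

clf = RandomForestClassifier(random_state=0)
clf.fit(X[train_idx], y[train_idx])
print("experiment-wise test accuracy:", clf.score(X[test_idx], y[test_idx]))
```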
In the above scenario, is there a general consensus on which approach to take? My worry is that the latter option will lead the classifier to overfit, since data points within a single shot are very similar to one another while points from separate shots typically differ much more, so the classifier may perform poorly on future shots. Of course this is where a validation set becomes essential, which I am using, but I am simply wondering whether there are any general guidelines on how to best structure the training and test sets.
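For the validation step, this is roughly what I have in mind: group-aware cross-validation with scikit-learn's `GroupKFold`, again on placeholder data, so that every validation fold consists of whole held-out shots rather than individual points:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Placeholder data as before: 100 shots x 1000 points, with a shot ID per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(100 * 1000, 10))
y = rng.integers(0, 2, size=100 * 1000)
groups = np.repeat(np.arange(100), 1000)

# Each fold holds out ~20 complete shots, mimicking evaluation on future shots.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X, y, groups=groups, cv=cv)
print("per-fold accuracy on held-out shots:", scores)
```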