I split the full data into training and test sets in an 80:20 ratio. Then, within the training set, I randomly carved out 10% and called it the dev (development) set. On the dev set, I select features and run 5-fold cross-validation to find the optimal hyperparameters for each ML algorithm. After these steps, I apply the selected features and the tuned hyperparameters to train the models on the full training data (which still contains the dev set). The trained models are then used to predict on the test set, where they are evaluated.
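To make the setup concrete, the splitting can be sketched roughly as below. This is only an illustration using Python's standard library, not my actual code: the function name `split_indices`, the seed, and the sample count of 1000 are placeholders I made up for the example.

```python
import random


def split_indices(n_samples, test_frac=0.2, dev_frac=0.1, seed=0):
    """Return (train, dev, test) index lists.

    The dev set is sampled from the training indices, so it is a
    subset of train rather than a separate partition.
    """
    rng = random.Random(seed)          # hypothetical fixed seed for repeatability
    indices = list(range(n_samples))
    rng.shuffle(indices)

    n_test = round(n_samples * test_frac)
    test = indices[:n_test]            # 20% held out for final evaluation
    train = indices[n_test:]           # 80% full training set

    n_dev = round(len(train) * dev_frac)
    dev = rng.sample(train, n_dev)     # 10% of train, still contained in train
    return train, dev, test


train, dev, test = split_indices(1000)  # 1000 is an arbitrary example size
print(len(train), len(dev), len(test))
print(set(dev) <= set(train))           # dev overlaps with train
print(set(dev).isdisjoint(test))        # dev never touches test
```

So in this scheme the test set is disjoint from everything else, but the rows used for feature selection and tuning are reused when the final models are fit.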
Is it appropriate to derive the dev set from the training set? Or do the dev, training, and test sets have to be mutually exclusive?
Update: The suggested link, "Should final (production ready) model be trained on complete data or just on training set?", discusses a completely different matter, so my original question is not a duplicate.