I am doing the following, but I am not sure if this is right or what behavior I should expect:
- A union B union C is the full dataset
- Training set: A union B
- Test set: C
- Validation set: B (so it is a subset of the training set)
I am using these datasets with a classifier to evaluate the quality of the training set. The training data is generated using two different methods, so I want to compare the methods according to a metric computed over the classifier's results.
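To illustrate what happens with this split, here is a minimal, self-contained sketch. The toy 2-D data and the hand-rolled 1-nearest-neighbour classifier are hypothetical stand-ins for the actual datasets and classifier: a model that memorises its training points will score perfectly on B (because every point of B is also a training point), while the disjoint set C gives an honest estimate.

```python
import random

random.seed(0)

def make_points(n, cx, cy, label):
    # Toy 2-D points scattered around a class centre (hypothetical data).
    return [((cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)), label)
            for _ in range(n)]

A = make_points(50, 0, 0, "neg") + make_points(50, 3, 3, "pos")
B = make_points(20, 0, 0, "neg") + make_points(20, 3, 3, "pos")
C = make_points(20, 0, 0, "neg") + make_points(20, 3, 3, "pos")

train = A + B       # training set is A union B
validation = B      # validation set is a SUBSET of the training set
test = C            # test set is disjoint from the training set

def predict(x, train_data):
    # 1-NN: return the label of the closest training point.
    return min(train_data,
               key=lambda p: (p[0][0] - x[0])**2 + (p[0][1] - x[1])**2)[1]

def accuracy(data, train_data):
    return sum(predict(x, train_data) == y for x, y in data) / len(data)

# Every point of B is in the training set, so 1-NN finds the point itself
# (distance 0) and reproduces its label exactly:
print(accuracy(validation, train))  # 1.0 for a memorising model like 1-NN
print(accuracy(test, train))        # honest estimate on unseen data
```

With a classifier that memorises less aggressively than 1-NN the validation score on B would not be exactly 1.0, but it would still be optimistically biased, because the model was fit on those same samples.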
So, my questions are:
What is the effect of using B as the validation set? What result should I expect for the metric on B? I think it should be close to perfect classification, since the model has already seen those samples during training. Am I right?
Sorry if this is a silly question; I'm quite lost. Thanks!