I am working on a classification problem with a small amount of labelled data (~200 instances) and a larger sample of unlabelled data (~500 instances).
To increase the size of the training data, I intend to use an oversampling technique (e.g. SMOTE). Is there a way to use the unlabelled data to improve the oversampling? This matters particularly because I think the unlabelled data is more representative of the underlying population: certain factors influenced which samples were chosen for testing, and therefore which samples were labelled.
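
For concreteness, the kind of oversampling I have in mind works roughly like the sketch below. It is only a simplified pure-Python illustration of the SMOTE idea (interpolating between a sample and one of its nearest same-class neighbours), not the reference implementation. The function name `smote_like`, its parameters, and the toy points are all made up for the example.

```python
import math
import random


def smote_like(samples, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a sample
    and one of its k nearest neighbours from the same class.

    Hypothetical helper for illustration; needs at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of `base` among the other samples
        others = [p for p in samples if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Made-up 2-D points from a single class, for illustration only
labelled_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
new_points = smote_like(labelled_class, n_new=10, k=3)
```

As written, the synthetic points depend only on the labelled samples of one class, so the unlabelled data plays no role in where new points are placed.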