I have a small labeled training set of about 300 examples with 50 features each. I also have a large unlabeled dataset of about 30,000 examples, again with 50 features each. What is the best way to find the labels for the second dataset?
My current approach is:
- Train a linear classifier on the labeled data as well as possible.
- Run KNN on the unlabeled data, 50 examples at a time; the ones closest to the training examples are added to the labeled set.
- Retrain the linear classifier on the enlarged training set.
- Repeat the last two steps.
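
To make the loop concrete, here is a rough sketch of it, assuming NumPy and scikit-learn. Several details are guesses rather than part of the description above: `self_train` is a hypothetical helper name, `LogisticRegression` stands in for "a linear classifier", the new labels come from a k-nearest-neighbour vote over the current labeled set, and `n_neighbors=5` and `max_rounds` are arbitrary illustration parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


def self_train(X_lab, y_lab, X_unl, batch_size=50, n_neighbors=5, max_rounds=None):
    """Grow the labeled set batch_size points at a time (hypothetical helper)."""
    X_lab = np.asarray(X_lab, dtype=float)
    y_lab = np.asarray(y_lab)
    X_unl = np.asarray(X_unl, dtype=float)
    remaining = np.arange(len(X_unl))  # indices of still-unlabeled rows

    # Step 1: train the linear classifier on the labeled data.
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

    rounds = 0
    while len(remaining) > 0 and (max_rounds is None or rounds < max_rounds):
        # Step 2: find the unlabeled points closest to the labeled set.
        k = min(n_neighbors, len(X_lab))
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_lab, y_lab)
        dist, _ = knn.kneighbors(X_unl[remaining], n_neighbors=1)
        order = np.argsort(dist[:, 0])[:batch_size]
        picked = remaining[order]

        # Assumption: the picked points take the KNN majority-vote label.
        new_y = knn.predict(X_unl[picked])
        X_lab = np.vstack([X_lab, X_unl[picked]])
        y_lab = np.concatenate([y_lab, new_y])
        remaining = np.delete(remaining, order)

        # Step 3: retrain the linear classifier on the enlarged set.
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        rounds += 1

    return clf, X_lab, y_lab, remaining


if __name__ == "__main__":
    # Synthetic stand-in data (two Gaussian blobs), not the real dataset.
    rng = np.random.default_rng(0)
    X_l = np.vstack([rng.normal(-1, 1, (150, 50)), rng.normal(1, 1, (150, 50))])
    y_l = np.array([0] * 150 + [1] * 150)
    X_u = np.vstack([rng.normal(-1, 1, (1000, 50)), rng.normal(1, 1, (1000, 50))])
    clf, X_grown, y_grown, left = self_train(X_l, y_l, X_u, max_rounds=5)
    print(X_grown.shape, len(left))
```

With `max_rounds=None` the loop runs until every unlabeled example has been absorbed. Any wrong pseudo-label added early is then treated as ground truth in later rounds, which is the main risk of this scheme.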