I have two questions regarding KNN.
Q1. If my training data has 7 classes, should I consider a higher k (for example, starting with k=14)?
Q2. Since half of my training data is not labeled, I will be using self-training.
The self-training algorithm works as follows:
•Let L be the set of labeled data, U be the set of unlabeled data.
•Repeat
– Train a classifier h with training data L
– Classify data in U with h
– Find the subset U’ of U whose predictions have the highest confidence scores.
– L + U’ -> L
– U – U’ -> U
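A minimal sketch of the loop above in plain Python, under some assumptions that are not part of the algorithm as stated: the classifier h is an unweighted KNN, the confidence of a prediction is the fraction of the k neighbors that voted for the winning class, and U’ is every point whose confidence reaches a fixed threshold. The names `knn_predict`, `self_train`, `threshold` and `max_rounds` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(labeled, x, k):
    """Return (label, vote_fraction) for x from its k nearest labeled points."""
    neighbors = sorted(labeled, key=lambda item: dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbors)


def self_train(labeled, unlabeled, k=3, threshold=0.8, max_rounds=10):
    """Repeatedly move confidently classified points from U to L.

    labeled   -- list of (point, label) pairs, points are tuples of floats
    unlabeled -- list of points
    threshold -- assumed cut-off that defines "most confident"
    """
    L = list(labeled)
    U = list(unlabeled)
    for _ in range(max_rounds):
        if not U:
            break
        # KNN is lazy, so "training h on L" just means using the current L.
        scored = [(x, *knn_predict(L, x, k)) for x in U]
        # U' = points whose prediction is confident enough.
        confident = [(x, label) for x, label, conf in scored if conf >= threshold]
        if not confident:
            break  # no prediction passes the threshold, so stop early
        L.extend(confident)                       # L + U' -> L
        chosen = {x for x, _ in confident}
        U = [x for x in U if x not in chosen]     # U - U' -> U
    return L, U
```

Picking the top n most confident points per round instead of using a threshold would be an equally valid reading of "most confident"; the threshold version is used here only because it gives a natural stopping condition.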
What counts as a confidence score here? For each data point classified in U, is it the fraction of that point's k nearest neighbors that belong to the predicted class, compared with the other classes?
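To make the question concrete, the vote-fraction score can be written as follows. This assumes unweighted majority voting; the symbols $N_k(x)$ (the k nearest labeled neighbors of x), $y_i$ (the label of neighbor i) and $\mathbb{1}[\cdot]$ (the indicator function) are notation introduced here, not taken from any particular source.

```latex
\hat{y}(x) = \arg\max_{c} \sum_{i \in N_k(x)} \mathbb{1}[y_i = c],
\qquad
\mathrm{conf}(x) = \frac{1}{k} \sum_{i \in N_k(x)} \mathbb{1}[y_i = \hat{y}(x)]
```

For example, with k = 14, if 9 of the 14 neighbors carry the winning class, the score would be 9/14 ≈ 0.64.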