
If I have binary-classification data and a Euclidean metric, and I know the best number of nearest neighbors $K$, then I can draw circles on my training data based on my $K$-value that tell me, via voting, which regions belong to class A and which to class B.

How does KNN make predictions when the new data points are statistical outliers with respect to the training data, located far away from it? What if a test point has no nearby training neighbors, and its only neighbors are too far away to infer its class membership?
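For concreteness, here is a minimal sketch of the kind of setup I mean (the two clusters and the far-away point are toy data of my own, not my real dataset):

```r
# Toy setup: class A clustered near (0, 0), class B near (3, 3),
# Euclidean metric, and one test point far outside both clusters
set.seed(1)
X_train <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),  # 25 class-A points
                 matrix(rnorm(50, mean = 3), ncol = 2))  # 25 class-B points
y_train <- factor(rep(c("A", "B"), each = 25))

x_far <- c(100, 100)  # an extreme outlier relative to the training data
```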

  • I did a "rollback" of your edit of the title, as the updated title did not match the question you asked and that I answered. From the edit, however, it seems you have another question about what to do when you wind up with test points that are way far away from the training points, and I think that warrants its own post (and likely a +1 from me). – Dave Oct 20 '21 at 21:29
  • Failing includes not performing well on test accuracy – Germania Oct 20 '21 at 21:32

2 Answers


When it comes time to predict, you take your new point $x$, find the $K$ training points closest to $x$, and use those $K$ points to make your prediction. There will always be $K$ training points within some finite distance of $x$, so the situation you ask about, where a point has no nearest neighbors, cannot actually occur.

If the training points are far away from $x$, you follow exactly the same process.
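For illustration, a sketch with made-up data (the clusters, the class package, and $K = 5$ are choices of mine, not from the question): even for a point far outside the training cloud, the $K$ nearest neighbors exist at some finite distance, and the vote proceeds as usual.

```r
# Sketch: K-NN still classifies a test point far outside the training
# data, because the K nearest neighbours always exist at finite distance
library(class)

set.seed(1)
X_train <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),  # class A near (0, 0)
                 matrix(rnorm(50, mean = 3), ncol = 2))  # class B near (3, 3)
y_train <- factor(rep(c("A", "B"), each = 25))
x_far   <- matrix(c(100, 100), ncol = 2)  # extreme outlier

# Distances from x_far to its 5 nearest training points: large but finite
d <- sqrt((X_train[, 1] - 100)^2 + (X_train[, 2] - 100)^2)
sort(d)[1:5]

# The majority vote proceeds exactly as for any other query point
knn(train = X_train, test = x_far, cl = y_train, k = 5)
```

Here the call returns "B", simply because the class-B cluster is the marginally closer of the two to $(100, 100)$; in unweighted KNN, the magnitude of the distances plays no role beyond the ranking.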

Dave

At the risk of sounding obvious: you could test whether this is actually a problem in your dataset by plotting the nearest-neighbor distance against the absolute value of the prediction residuals. If there is an upward trend there, you could attach a confidence to each test-point prediction based on the inverse of its distance to the training points, as in the sketch below. I'm not aware of any preexisting packages that do this; I'm about to roll my own solution in R.
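A rough sketch of what I have in mind (the toy data, the FNN package, and the $1/(1 + \text{distance})$ scale are all illustrative choices of mine):

```r
# Diagnostic: does prediction error grow with distance from the
# training data? Uses FNN for the nearest-neighbour lookups.
library(FNN)

set.seed(1)
X_train <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),  # class 0
                 matrix(rnorm(100, mean = 3), ncol = 2))  # class 1
y_train <- rep(c(0, 1), each = 50)

# Test points drawn from a wider distribution, so some land far
# from the training data
X_test <- matrix(rnorm(100, mean = 1.5, sd = 4), ncol = 2)
y_test <- as.numeric(X_test[, 1] + X_test[, 2] > 3)  # true boundary

# knn.reg on 0/1 labels returns the fraction of class-1 votes
p_hat <- knn.reg(train = X_train, test = X_test, y = y_train, k = 5)$pred

# Distance from each test point to its single nearest training point
nn_dist <- get.knnx(data = X_train, query = X_test, k = 1)$nn.dist[, 1]

# The plot to inspect: an upward trend means far-away predictions
# are less trustworthy
plot(nn_dist, abs(y_test - p_hat),
     xlab = "distance to nearest training point",
     ylab = "|prediction residual|")

# Crude confidence score in (0, 1] that shrinks as a test point
# drifts away from the training data; the scale is arbitrary
confidence <- 1 / (1 + nn_dist)
```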