
If I have binary-classification data and a Euclidean metric, and I know the best number of nearest neighbors $K$, then I can draw circles on my training data based on my $K$-value that tell me, via voting, which regions belong to class A and which to class B.

How does KNN make predictions when the new data points are statistical outliers with respect to the training data, located far away from it? What if a test point has no nearby training neighbors, and its only neighbors are too far away to infer its class membership?
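For concreteness, here is a minimal sketch of the kind of setup I mean (the two clusters and the far-away point are toy data of my own, not my real dataset):

```r
# Toy setup: class A clustered near (0, 0), class B near (3, 3),
# Euclidean metric, and one test point far outside both clusters
set.seed(1)
X_train <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),  # 25 class-A points
                 matrix(rnorm(50, mean = 3), ncol = 2))  # 25 class-B points
y_train <- factor(rep(c("A", "B"), each = 25))

x_far <- c(100, 100)  # an extreme outlier relative to the training data
```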

  • I did a "rollback" of your edit of the title, as the updated title did not match the question you asked and that I answered. From the edit, however, it seems you have another question about what to do when you wind up with test points that are way far away from the training points, and I think that warrants its own post (and likely a +1 from me). – Dave Oct 20 '21 at 21:29
  • Failing includes not performing well on test accuracy – Germania Oct 20 '21 at 21:32

2 Answers


When it comes time to predict, you take your new point $x$, find the $K$ training points closest to $x$, and use those $K$ points to make your prediction. There will always be $K$ training points within some finite distance of $x$, so the situation you ask about, where a point has no nearest neighbors, cannot actually occur.

If the training points are far away from $x$, you follow exactly the same process.
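For illustration, a sketch with made-up data (the clusters, the class package, and $K = 5$ are choices of mine, not from the question): even for a point far outside the training cloud, the $K$ nearest neighbors exist at some finite distance, and the vote proceeds as usual.

```r
# Sketch: K-NN still classifies a test point far outside the training
# data, because the K nearest neighbours always exist at finite distance
library(class)

set.seed(1)
X_train <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),  # class A near (0, 0)
                 matrix(rnorm(50, mean = 3), ncol = 2))  # class B near (3, 3)
y_train <- factor(rep(c("A", "B"), each = 25))
x_far   <- matrix(c(100, 100), ncol = 2)  # extreme outlier

# Distances from x_far to its 5 nearest training points: large but finite
d <- sqrt((X_train[, 1] - 100)^2 + (X_train[, 2] - 100)^2)
sort(d)[1:5]

# The majority vote proceeds exactly as for any other query point
knn(train = X_train, test = x_far, cl = y_train, k = 5)
```

Here the call returns "B", simply because the class-B cluster is the marginally closer of the two to $(100, 100)$; in unweighted KNN, the magnitude of the distances plays no role beyond the ranking.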

Dave

At the risk of sounding obvious: you could test whether this is actually a problem in your dataset by plotting the nearest-neighbor distance against the absolute value of the prediction residuals. If there is an upward trend there, you could attach a confidence to each test-point prediction based on the inverse of its distance to the training points, as in the sketch below. I'm not aware of any preexisting packages that do this; I'm about to roll my own solution in R.
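A rough sketch of what I have in mind (the toy data, the FNN package, and the $1/(1 + \text{distance})$ scale are all illustrative choices of mine):

```r
# Diagnostic: does prediction error grow with distance from the
# training data? Uses FNN for the nearest-neighbour lookups.
library(FNN)

set.seed(1)
X_train <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),  # class 0
                 matrix(rnorm(100, mean = 3), ncol = 2))  # class 1
y_train <- rep(c(0, 1), each = 50)

# Test points drawn from a wider distribution, so some land far
# from the training data
X_test <- matrix(rnorm(100, mean = 1.5, sd = 4), ncol = 2)
y_test <- as.numeric(X_test[, 1] + X_test[, 2] > 3)  # true boundary

# knn.reg on 0/1 labels returns the fraction of class-1 votes
p_hat <- knn.reg(train = X_train, test = X_test, y = y_train, k = 5)$pred

# Distance from each test point to its single nearest training point
nn_dist <- get.knnx(data = X_train, query = X_test, k = 1)$nn.dist[, 1]

# The plot to inspect: an upward trend means far-away predictions
# are less trustworthy
plot(nn_dist, abs(y_test - p_hat),
     xlab = "distance to nearest training point",
     ylab = "|prediction residual|")

# Crude confidence score in (0, 1] that shrinks as a test point
# drifts away from the training data; the scale is arbitrary
confidence <- 1 / (1 + nn_dist)
```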