When doing kNN you need to keep one thing in mind, namely that it's not a strictly mathematically derived algorithm, but rather a simple classifier / regressor based on one intuition - the underlying function doesn't change much when the arguments don't change much. Or in other words, the underlying function is locally near-constant. With this assumption, you can estimate the value of the underlying function at any given point by a (possibly weighted) mean of the values of the k nearest points.
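As a concrete illustration, here is a minimal numpy sketch of that idea (the function and variable names are my own, not from any particular library):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Plain kNN regression: average the targets of the k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    return y_train[nearest].mean()                   # unweighted mean of their target values
```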
Keeping this in mind, you can see that there is no clear imperative on what to do when there is no clear winner in the majority voting. You can either always use an odd k (which avoids ties only in binary classification), or use some injective weighting.
In the case of neighbours 3 to 5 being at the same distance from the point of interest, you can either use only the first 2, or use all 5. Again, keep in mind that kNN is not some algorithm derived from complex mathematical analysis, but just a simple intuition. It's up to you how you want to deal with those special cases.
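One way to make both special cases explicit in code (the tie-breaking rules below are just examples of possible choices, not the "correct" ones):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3, include_distance_ties=True):
    """kNN classification with one possible treatment of the special cases above."""
    distances = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(distances)
    if include_distance_ties:
        # keep every neighbour that is as close as the k-th one (may be more than k points)
        neighbours = order[distances[order] <= distances[order[k - 1]]]
    else:
        neighbours = order[:k]
    votes = Counter(y_train[neighbours]).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        # a draw in the voting: here I simply fall back to the single nearest neighbour,
        # but an odd k, weighted votes or a random choice would be just as legitimate
        return y_train[order[0]]
    return votes[0][0]
```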
When it comes to weighting, remember that you base your algorithm on the intuition that the function doesn't change much when the arguments don't change much. So you want to give bigger weights to points that are closer to the point of interest. A good weighting would be, for example, $\frac{1}{||x-y||^2}$, or any other function that is relatively big when the distance is small and relatively small when the distance between the points is big (so probably the inverse of some continuous metric function).
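For example, a weighted version of the earlier sketch using exactly that $\frac{1}{||x-y||^2}$ weighting (the small eps is only there to avoid a division by zero when a training point coincides with the query):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Distance-weighted kNN regression with 1 / ||x - y||^2 weights."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] ** 2 + eps)  # closer points get bigger weights
    return np.dot(weights, y_train[nearest]) / weights.sum()
```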
There has also been a nice paper by Samory Kpotufe and Abdeslam Boularias at NIPS this year touching on the issue of finding the right weighting. Their general intuition is that the underlying function varies differently in different directions (i.e., its different partial derivatives are of different magnitude), hence it would be wise to change the metric / weighting accordingly. They claim this trick generally improves the performance of kNN and kernel regression, and I think they even have some theoretical results to back up this claim (although I'm not sure what those theoretical results actually claim, since I didn't have time to go through the whole paper yet). The paper can be downloaded for free from their sites, or by Googling "Gradient Weights help Nonparametric Regressors". Their research is directed towards regression, but I guess it applies to classification to some extent as well.
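To make the intuition concrete, here is a very rough sketch of how per-coordinate gradient weights could enter the distance computation. This is only my reading of the general idea, not the authors' actual estimator; in particular, how to estimate the gradient weights properly is exactly what the paper is about:

```python
import numpy as np

def gradient_weighted_distances(X_train, x, grad_weights):
    """Distances in a rescaled space: coordinates along which the function varies more
    (larger estimated |partial derivative|) contribute more to the distance."""
    scaled_diff = (X_train - x) * grad_weights  # per-coordinate rescaling
    return np.linalg.norm(scaled_diff, axis=1)

# grad_weights would be estimates of the magnitudes of the partial derivatives of the
# underlying function, e.g. obtained from finite differences on a pilot kNN estimate.
```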
Now, you will probably want to know how you can find the right k, metric, weighting, action to perform when there are draws, and so on. The sad thing is that it's basically hard to arrive at the right hyperparameters by deep thinking alone; you will probably need to test different bunches of hyperparameters and see which ones work well on some validation set. If you have some computational resources and want to arrive at a good set of hyperparameters automatically, there is a recent idea (that I like very much) to use Gaussian processes for derivative-free optimization in that setting.
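The simplest version of that is a brute-force search over a small grid, judged on a held-out validation set. In the sketch below, predict(...) stands for a kNN predictor like the ones above (with a weighting argument added), and X_val / y_val are the validation inputs and targets:

```python
import itertools
import numpy as np

# predict(...) is assumed to be a kNN predictor like the sketches above;
# X_train, y_train, X_val, y_val are the training and validation data
ks = [1, 3, 5, 7, 9]
weightings = ["uniform", "inverse_squared"]

best = None
for k, weighting in itertools.product(ks, weightings):
    preds = np.array([predict(X_train, y_train, x, k=k, weighting=weighting)
                      for x in X_val])
    error = np.mean((preds - y_val) ** 2)  # validation mean squared error
    if best is None or error < best[0]:
        best = (error, k, weighting)

print("best validation error %.4f with k=%d, weighting=%s" % best)
```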
Let me elaborate - finding the set of hyperparameters that minimizes the error on the validation data can be viewed as an optimization problem. Unfortunately, in this setting we can't get the gradient of the function we try to optimize (which is what we usually want, so that we can perform gradient descent or some more advanced method). Gaussian processes can be used in this setting to find sets of hyperparameters that have a big chance of performing better than the best ones we have found up to that point. Hence you can iteratively run the algorithm with some set of hyperparameters, then ask the Gaussian process which ones would be best to try next, try those, and so on.
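Here is a very stripped-down sketch of that loop for a single hyperparameter (k), using scikit-learn's GaussianProcessRegressor and an expected-improvement criterion. validation_error(k) is a hypothetical function that evaluates kNN with the given k on the validation set (e.g., the grid loop above wrapped into a function of k alone); real implementations handle several hyperparameters, noise and the acquisition optimization much more carefully:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# validation_error(k) is assumed to return the validation error of kNN with that k
tried_k = [1.0, 15.0, 30.0]                          # a few initial evaluations
errors = [validation_error(int(round(k))) for k in tried_k]

candidates = np.linspace(1, 30, 200).reshape(-1, 1)  # candidate values of k
for _ in range(10):
    gp = GaussianProcessRegressor().fit(np.array(tried_k).reshape(-1, 1), errors)
    mean, std = gp.predict(candidates, return_std=True)
    best_so_far = min(errors)
    # expected improvement: how far below the current best each candidate is likely to land
    z = (best_so_far - mean) / (std + 1e-9)
    ei = (best_so_far - mean) * norm.cdf(z) + std * norm.pdf(z)
    k_next = float(candidates[np.argmax(ei)])
    tried_k.append(k_next)
    errors.append(validation_error(int(round(k_next))))

print("best k found:", int(round(tried_k[int(np.argmin(errors))])))
```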
For details, look for the paper "Practical Bayesian Optimization of Machine Learning Algorithms" by Jasper Snoek, Hugo Larochelle and Ryan P. Adams (also to be found on their websites or via Google).