
I am working on the breast cancer dataset in Weka, where the class variable is the target of classification. I ran the nearest neighbour classifier (lazy/IBk) with Euclidean distance, varying k (the number of nearest neighbours) from 1 to 10, with all the other features selected, and I got the following results:

[Screenshot: IBk results table, Euclidean distance, k = 1 to 10]

Then I ran the same classifier with the same selected features, but this time with Manhattan distance, again with k from 1 to 10, and I got exactly the same results as before:

[Screenshot: IBk results table, Manhattan distance, k = 1 to 10]

Why are the two result tables exactly the same?

Coder

1 Answer


Either or both of these conditions could exist:

  1. They're the same because $k$-NN ultimately depends only on which observations are nearest. In other words, each observation has the same ranked distances to the other observations under both metrics, for these choices of $k$. (The ranks of the $(k+1)$-th nearest observation, and of all more distant observations, have no effect on the $k$-NN classification.)
  2. The ranks could differ, but the class (positive/negative) of the relevant nearest neighbours does not change. So even if changing the metric reorders the ranked distances, the effect is suppressed if the nearest neighbours under the Manhattan metric have the same classes as the nearest neighbours under the Euclidean metric, for all cases and for all $k \in \{1,2,3,\dots,10\}$.
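Condition 1 is easy to demonstrate on a small example. The sketch below uses hypothetical 2-D toy data (not the Weka breast cancer dataset) with two well-separated clusters, so both metrics rank the same neighbours nearest and the $k$-NN predictions agree for every $k$:

```python
# Minimal k-NN sketch with two distance metrics on hypothetical toy data.
# Because the clusters are well separated, the neighbour rankings coincide
# under both metrics, so predictions are identical for every k.
from collections import Counter

train = [((0.0, 0.0), "neg"), ((0.2, 0.1), "neg"), ((0.1, 0.3), "neg"),
         ((5.0, 5.0), "pos"), ((5.2, 4.9), "pos"), ((4.8, 5.1), "pos")]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(point, k, metric):
    # Sort training points by distance, take the k nearest, majority-vote.
    neighbours = sorted(train, key=lambda t: metric(point, t[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

tests = [(0.1, 0.1), (4.9, 5.0), (1.0, 1.0), (4.0, 4.0)]
for k in range(1, 6):
    eu = [knn_predict(p, k, euclidean) for p in tests]
    ma = [knn_predict(p, k, manhattan) for p in tests]
    assert eu == ma  # identical predictions despite different metrics
```

With data like the breast cancer set, the same thing happens whenever swapping the metric never changes which class dominates among the $k$ nearest neighbours.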

(Or there's a coding mistake somewhere. Coding mistakes sometimes happen.)

Sycorax
  • Thanks for your answer. Besides what you have explained, can the dimensionality of the data be a reason as well? The dataset I am working on is not a big one; its shape is (286, 10). – Coder Dec 06 '20 at 20:14
  • 1
    If you've found my answer helpful, please consider upvoting or accepting it. It seems that you have an outstanding question about how dimension and distance measurements are related. A large number of dimensions can mean that points tend to become uniformly far apart in a certain sense (see: https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions/99191#99191), but that doesn't necessarily entail that you would observe the phenomenon that you outline in your question. If you wish, you can ask another question about this in particular. – Sycorax Dec 06 '20 at 23:24
  • As you are knowledgeable, could you please answer my other [question](https://stats.stackexchange.com/questions/499780/a-dataset-with-exactly-the-same-number-of-occurrences-of-each-attribute-value) as well? Thanks. – Coder Dec 07 '20 at 22:50
  • @Coder If you wish to draw attention to a question, you can place a bounty on it. – Sycorax Dec 08 '20 at 05:11