
I am working on the breast cancer dataset in Weka, where the class variable is the target of classification. I ran the nearest neighbour classifier (lazy/IBk) with Euclidean distance, varying k (the number of nearest neighbours) from 1 to 10, with all the other features selected, and I got the following results:

[Screenshot: IBk results table, Euclidean distance, k = 1 to 10]

Then I ran the same classifier with the same selected features, but this time with Manhattan distance, again with k from 1 to 10, and I got exactly the same results as before:

[Screenshot: IBk results table, Manhattan distance, k = 1 to 10]

Why are the two result tables exactly the same?

Coder

1 Answer


Either or both of these conditions could exist:

  1. They're the same because $k$-NN ultimately depends only on which observations are nearest. In other words, each observation has the same ranked distances to the other observations under both metrics, for these choices of $k$. (The ranks of the $(k+1)$-th nearest observation, and of all more distant observations, have no effect on the $k$-NN classification.)
  2. The ranks could differ, but the class (positive/negative) of the relevant nearest neighbours does not change. So even if changing the metric reorders the ranked distances, the effect is suppressed if the nearest neighbours under the Manhattan metric have the same classes as the nearest neighbours under the Euclidean metric, for all cases and for all $k \in \{1,2,3,\dots,10\}$.
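Condition 1 is easy to demonstrate on a small example. The sketch below uses hypothetical 2-D toy data (not the Weka breast cancer dataset) with two well-separated clusters, so both metrics rank the same neighbours nearest and the $k$-NN predictions agree for every $k$:

```python
# Minimal k-NN sketch with two distance metrics on hypothetical toy data.
# Because the clusters are well separated, the neighbour rankings coincide
# under both metrics, so predictions are identical for every k.
from collections import Counter

train = [((0.0, 0.0), "neg"), ((0.2, 0.1), "neg"), ((0.1, 0.3), "neg"),
         ((5.0, 5.0), "pos"), ((5.2, 4.9), "pos"), ((4.8, 5.1), "pos")]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(point, k, metric):
    # Sort training points by distance, take the k nearest, majority-vote.
    neighbours = sorted(train, key=lambda t: metric(point, t[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

tests = [(0.1, 0.1), (4.9, 5.0), (1.0, 1.0), (4.0, 4.0)]
for k in range(1, 6):
    eu = [knn_predict(p, k, euclidean) for p in tests]
    ma = [knn_predict(p, k, manhattan) for p in tests]
    assert eu == ma  # identical predictions despite different metrics
```

With data like the breast cancer set, the same thing happens whenever swapping the metric never changes which class dominates among the $k$ nearest neighbours.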

(Or there's a coding mistake somewhere. Coding mistakes sometimes happen.)

Sycorax
  • Thanks for your answer. Besides what you have explained, can the dimensionality of the data be a reason as well? The dataset I am working on is not a big one; its shape is (286, 10). – Coder Dec 06 '20 at 20:14
  • 1
    If you've found my answer helpful, please consider upvoting or accepting it. It seems that you have an outstanding question about how dimension and distance measurements are related. A large number of dimensions can mean that points tend to become uniformly far apart in a certain sense (see: https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions/99191#99191), but that doesn't necessarily entail that you would observe the phenomenon that you outline in your question. If you wish, you can ask another question about this in particular. – Sycorax Dec 06 '20 at 23:24
  • As you are knowledgeable, could you please answer my other [question](https://stats.stackexchange.com/questions/499780/a-dataset-with-exactly-the-same-number-of-occurrences-of-each-attribute-value) as well? Thanks. – Coder Dec 07 '20 at 22:50
  • @Coder If you wish to draw attention to a question, you can place a bounty on it. – Sycorax Dec 08 '20 at 05:11