
What is the time complexity of the k-NN algorithm with a naive search approach (no k-d tree or similar structure)?

I am interested in its time complexity, taking into account the hyperparameter $k$ as well. I have found contradictory answers:

  1. $O(nd + kn)$, where $n$ is the cardinality of the training set and $d$ the dimension of each sample. [1]

  2. $O(ndk)$, where again $n$ is the cardinality of the training set and $d$ the dimension of each sample. [2]

[1] http://www.csd.uwo.ca/courses/CS9840a/Lecture2_knn.pdf (page 18 of 20)

[2] http://www.cs.haifa.ac.il/~rita/ml_course/lectures/KNN.pdf (page 18 of 31)


1 Answer


Assuming $k$ is fixed (as both of the linked lectures do), then your algorithmic choices will determine whether your computation takes $O(nd+kn)$ runtime or $O(ndk)$ runtime.

First, let's consider an $O(nd+kn)$ runtime algorithm:

  • Initialize $selected_i = 0$ for all observations $i$ in the training set
  • For each training set observation $i$, compute $dist_i$, the distance from the new observation to training set observation $i$
  • For $j=1$ to $k$: Loop through all training set observations, selecting the index $i$ with the smallest $dist_i$ value and for which $selected_i=0$. Select this observation by setting $selected_i=1$.
  • Return the $k$ selected indices

Each distance computation requires $O(d)$ runtime, so the second step requires $O(nd)$ runtime. In each iteration of the third step, we perform $O(n)$ work by looping through the training set observations, so the step overall requires $O(nk)$ work. The first and fourth steps only require $O(n)$ work, so we get an $O(nd+kn)$ runtime.
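
As a rough illustration, here is a minimal Python/NumPy sketch of this first algorithm; the function name and the choice of Euclidean distance are my own and not taken from the linked lectures:

```python
import numpy as np

def knn_indices_precomputed(X_train, x_new, k):
    """Indices of the k nearest training observations to x_new.

    The distances are precomputed once in O(nd); the selection loop then
    makes k linear scans of O(n) each, for O(nd + kn) runtime overall.
    """
    n = X_train.shape[0]
    dist = np.linalg.norm(X_train - x_new, axis=1)  # O(nd): all n distances
    selected = np.zeros(n, dtype=bool)              # selected_i = 0 for all i
    neighbors = []
    for _ in range(k):                              # k iterations
        best = -1
        for i in range(n):                          # O(n) scan per iteration
            if not selected[i] and (best < 0 or dist[i] < dist[best]):
                best = i
        selected[best] = True                       # selected_i = 1
        neighbors.append(best)
    return neighbors
```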

Now, let's consider an $O(ndk)$ runtime algorithm:

  • Initialize $selected_i = 0$ for all observations $i$ in the training set
  • For $j=1$ to $k$: Loop through all training set observations, computing the distance $dist_i$ between training set observation $i$ and the new observation. Find the index $i$ with the smallest $dist_i$ value for which $selected_i=0$, and select this observation by setting $selected_i=1$.
  • Return the $k$ selected indices

In each iteration of the second step, we compute the distance between the new observation and every training set observation, requiring $O(nd)$ work per iteration and therefore $O(ndk)$ work overall.
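
A corresponding sketch of this second algorithm, under the same assumptions (hypothetical function name, Euclidean distance), differs only in that the distances are recomputed on every pass:

```python
import numpy as np

def knn_indices_recomputed(X_train, x_new, k):
    """Same output as above, but each of the k passes recomputes all n
    distances, giving O(ndk) runtime and no stored distance vector."""
    n = X_train.shape[0]
    selected = np.zeros(n, dtype=bool)               # selected_i = 0 for all i
    neighbors = []
    for _ in range(k):                               # k iterations
        best, best_dist = -1, np.inf
        for i in range(n):                           # n observations per pass
            d_i = np.linalg.norm(X_train[i] - x_new) # O(d) distance, recomputed
            if not selected[i] and d_i < best_dist:
                best, best_dist = i, d_i
        selected[best] = True                        # selected_i = 1
        neighbors.append(best)
    return neighbors
```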

The difference between the two algorithms is that the first precomputes and stores the distances (requiring $O(n)$ extra memory), while the second does not. However, given that we already store the entire training set, requiring $O(nd)$ memory, as well as the $selected$ vector, requiring $O(n)$ storage, the storage of the two algorithms is asymptotically the same. As a result, the better asymptotic runtime for $k > 1$ makes the first algorithm more attractive.

It's worth noting that it is possible to obtain an $O(nd)$ runtime using an algorithmic improvement:

  • For each training set observation $i$, compute $dist_i$, the distance from the new observation to training set observation $i$
  • Run the quickselect algorithm to compute the $k^{th}$ smallest distance in $O(n)$ runtime
  • Return all indices $i$ whose distance $dist_i$ is no larger than the computed $k^{th}$ smallest distance

This approach takes advantage of the fact that efficient approaches exist to find the $k^{th}$ smallest value in an unsorted array.
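
In practice you would not need to hand-roll quickselect; for example, NumPy's `argpartition` performs a partition-based selection (introselect) and gives essentially this algorithm in two lines. A minimal sketch, assuming the same inputs as above:

```python
import numpy as np

def knn_indices_selection(X_train, x_new, k):
    """Compute all n distances in O(nd), then select the k smallest with a
    partition-based selection in O(n) (average case), for O(nd) overall."""
    dist = np.linalg.norm(X_train - x_new, axis=1)  # O(nd)
    return np.argpartition(dist, k - 1)[:k]         # indices of the k smallest distances
```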

josliber
  • Great answer and I especially like the advice towards the use of `quickselect`. – usεr11852 Jun 19 '16 at 20:06
  • One more question: for the third option I believe that the time complexity should be O(nd+k), as you still have to compute the most common label among the k-nearest neighbors to emit a prediction, right? – Daniel López Jun 19 '16 at 21:05
  • @Daniel Since $k \leq n$, $O(nd+k)$ is the same as $O(nd)$. – josliber Jun 19 '16 at 21:07
  • Last time I bother you: trying to determine the computational complexity of a modified version of *k*-NN I am working on, I get the following: *O(nd+nd/p)* Where by definition *n*, *d* and *p* are integers greater than zero. Can I simplify that to *O(nd)*? – Daniel López Jun 19 '16 at 21:23
  • @Daniel Yes, in that case $O(nd)$ works. – josliber Jun 19 '16 at 22:50
  • `quickselect` has a worst-case complexity $O(n^2)$ and an average-case complexity $O(n)$. So shouldn't we use $O(n^2)$ in calculating the KNN complexity? – Lei Huang Dec 08 '19 at 05:08
  • Is it true that in the CS context people tend to use worst-case complexity while in the machine learning context people tend to use average-case complexity? – Lei Huang Dec 08 '19 at 05:15
  • @LeiHuang quickselect has worst case $O(n)$ complexity -- see https://en.wikipedia.org/wiki/Median_of_medians – josliber Dec 08 '19 at 15:35
  • It says $O(n^2)$ here: https://en.m.wikipedia.org/wiki/Quickselect – Lei Huang Dec 08 '19 at 15:57
  • @LeiHuang I'm sure it depends on the version of quickselect used, but for instance http://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf presents a version that is worst-case $O(n)$, since from the abstract we read that at most $5.43n$ comparisons are ever required. – josliber Dec 09 '19 at 01:57
  • Your answer begins with "Assuming $k$ is fixed (as both of the linked lectures do), then your algorithmic choices will determine whether your computation takes $O(nd+kn)$ runtime or $O(ndk)$ runtime.". But if $k$ is a constant, then: $O(nd+kn) = O(ndk) = O(nd)$. So I'm not sure what you mean by "Assuming $k$ is fixed". – Stef Dec 17 '20 at 12:27
  • 1
  • @Stef I guess I just meant that it's not being estimated from data, e.g. via a scree plot. – josliber Dec 17 '20 at 13:43