I want to know which is the computational cost of the $\texttt{knnsearch}$ algorithm in Matlab. $\texttt{[~, distX] = knnsearch(X,X,'K',N,'Distance','chebychev');}$ where N is length of X.
1 Answers
Consider this post on SO: https://stackoverflow.com/questions/23175491/can-someone-tell-me-about-the-knn-search-algo-that-matlab-uses Where it states that: this algorithm is O(log(n)) rather than O(n^2) as THE OP in this post wrotes its own knn search algorithm which does not performed well. Unfortunately, the contents of the MEX file from matlab are not available for inspection to confirm that the power is O(log(n)). However this post, is a little bit old, but maybe there is now a newer information.
So it should be O(log(n)). But we dont know if this is in the best, worst or average case.
Update
We stick it together:
Update 2: It has to be O(n) * O(n²), but this doesnt change the O(n²*1, see later) as the length of the outer loop is fixed to length of 10 and is not a high n. So the end result is still correct.
1.) your for loop is indeed O(n) Imagine if the knn were at its worst O(n²) we had O(n) * O(n²) which would result in O(n * n²) where values of e.g. n = 10 for your outer for loop would result in O(10 * 1000000). The 10 is no longer of interest. Thus, your program is not so dependent of your for loop, but more on the knn algorithm itself, if you stick to higher values of n.
2.) In matlab it states [these parentheses show my own addition] that:
The default value [of the knn algorithm in matlab] is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type. Otherwise, the default value is 'exhaustive'.
As your X is determined by N, you would use an exhaustive search.
Exhaustive search equals Brute Force: https://ijarcce.com/wp-content/uploads/2012/03/IJARCCE9A__a__Arshu__comparison.pdf
The computational cost for brute force is:
Training time complexity: O(1)
Training space complexity: O(1)
Prediction time complexity: O(k * n * d)
Prediction space complexity: O(1)
From: https://towardsdatascience.com/k-nearest-neighbors-computational-complexity-502d2c440d5
Where:
- n equals your X (as we see in the comments: X = randn(N(i),1)),
- k, as you state in your comments, equals n
- and d was 1 as you also stated in the comments. Thus, we got O(n * n * 1) your brute force knn equals O(n²) and with greater for loops the n in the loop won't matter anymore.
If I'm wrong about X, as you stated you have length of N, but this is only in your for loop... and you dont have n² (maybe with higher n to check), you could calculate the computational cost from kdtree instead if your X <=10. That would result in more n complexity.
Training time complexity: O(d * n * log(n))
Training space complexity: O(d * n)
Prediction time complexity: O(k * log(n))
Prediction space complexity: O(1)
Thus the only question! you should answer is, if you really have brute force or kd-type. The for loop has no essential role. If you can figure out your method in matlab or set it directly in your call, than you should know.

- 1,498
- 2
- 14
-
I tried some simulations and I see that the time (cost) increase not like a log(N). – Massimiliano Romana Mar 26 '21 at 16:37
-
What was your time? If it is O(n) than O(log(n)) may be the best case and average and worst may be O(n), also it may dpened on your task: – Patrick Bormann Mar 26 '21 at 17:06
-
also it may depend on your task:https://towardsdatascience.com/k-nearest-neighbors-computational-complexity-502d2c440d5. I dont believe there is so much difference between knn of different libs, you may also look at here: https://stats.stackexchange.com/questions/219655/k-nn-computational-complexity – Patrick Bormann Mar 26 '21 at 17:13
-
In my case, k=n and d=1, so in the two cases we have O(n^2) and O(n+n^2), but my runtime isn't comparable to O(n^2), if someone has Matlab, he/she can try and verify. – Massimiliano Romana Mar 26 '21 at 18:31
-
1The O(n+n²) can be seen as O(n²) if your n is large and in normal O notation we would curtail the (n+) the question is how large was you k which equals n. If it is large enough your influence of n+ will diminish and you should be near the O(n²) you want – Patrick Bormann Mar 27 '21 at 09:53
-
I show the code: N = [10,25,50,100,250,500,1000,2500,5000,10000]'; t = zeros(length(N),1); for i = 1:length(N) X = randn(N(i),1); tic; [~,distX] = knnsearch(X,X,'K',N(i),'Distance','chebychev'); t(i) = toc; end plot(N,t,N,N.*(N)); – Massimiliano Romana Mar 27 '21 at 19:39
-
I showed you in my updated post your computational time, as you have exhaustive search which equals brute force. – Patrick Bormann Mar 28 '21 at 10:19