KNN asymptotic complexity vs SVM

Question

I'm writing a short report about the complexity of KNN vs SVM and would like to know your opinions. I wrote this text from my own perspective, after searching papers, websites, slides, etc.:

The fact that the KNN classifier had the shorter execution time does not mean that it is computationally efficient. Investigations such as [91] indicate that large-scale KNN processing is computationally costly and requires a large amount of memory to be computed efficiently. However, in some cases KNN is effective [92]. According to [93], for n training examples of dimension d, computing the distance to one example takes O(d) time, computing the distance to all examples takes O(nd) time, and finding the k nearest examples takes O(nk) extra time, so the computational complexity of KNN is O(nk + nd) [93]. According to [92], KNN is a non-parametric, lazy classification method, and because of this, according to [90], KNN is very useful in practice, where most real-world data sets do not follow theoretical mathematical assumptions. On the other hand, according to [94], SVM has a computational complexity of O(n^3). The core of SVM is a quadratic programming (QP) problem, which separates the support vectors from the rest of the training data. Because the execution time of SVM is of cubic order, in most cases it will require a high execution time; however, SVM is a very efficient technique for classifying very high-dimensional data [95].
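To make the O(nd) and O(nk) terms concrete, here is a minimal sketch of a brute-force KNN query in plain Python. It is only an illustration of the counting argument, not how sklearn does it: `KNeighborsClassifier` may use tree-based search instead of brute force, and the function name `knn_predict` is hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Brute-force k-NN vote for a single query point (illustrative only)."""
    # O(n*d): one O(d) Euclidean distance for each of the n training examples.
    dists = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]

    # O(n*k): k passes over the distances, each extracting the current minimum.
    nearest = []
    for _ in range(min(k, len(dists))):
        i = min(range(len(dists)), key=lambda j: dists[j][0])
        nearest.append(dists.pop(i))

    # Majority vote among the labels of the k nearest examples.
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

Note that all of this work happens at prediction time. Because KNN is lazy, "training" is essentially just storing the data, so a timing that lumps training and testing together compares a cheap-to-train model with one that solves an optimization problem up front.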

The reason I did this is that I trained both models (KNN and SVM) using the sklearn library in Python. My dataset had about 750 features, 250 features per class (three classes), and I trained on only one feature dimension (1-D array). These were the results:

SVM

Training and testing combined (test_size = 0.20): 0.029801 s

KNN

Training and testing combined (test_size = 0.20): 0.0074096 s

As we can see, KNN had the shorter execution time: ≈ 7.4 milliseconds versus 29.801 milliseconds for SVM. It is easy to state this, but to give it a theoretical justification I wrote the text above.

I'd appreciate your opinions, thanks so much. I will probably need to add more information based on your feedback :D

Update: I have added the code. I'm working with real data; this text is not an assumption.

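import time

from pycm import ConfusionMatrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# data and labels are assumed to be loaded already
(x_train, x_test, y_train, y_test) = train_test_split(data,
                                                     labels,
                                                     test_size=0.20,
                                                     random_state=11)
# get start time (time.clock() was removed in Python 3.8)
start = time.perf_counter()
# build the model
svm = SVC(kernel='rbf', gamma=0.5)

# train the model
svm.fit(x_train, y_train)

# predict on the 20% hold-out test set
y_predicted = svm.predict(x_test)

# get end time
end = time.perf_counter()

runtime = end - start

# get confusion matrix using the PyCM library
cm = ConfusionMatrix(actual_vector=y_test, predict_vector=y_predicted)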

[90] A. Navlani, “KNN Classification using Scikit-learn (article) - DataCamp,” 2018. [Online]. Available: https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn. [Accessed: 05-Jul-2019].
[91] N. Chiluka, A.-M. Kermarrec, and J. Olivares, “The Out-of-core KNN Awakens: The light side of computation force on large datasets,” Int. Conf. Networked Syst. NETYS, p. 16, 2016.
[92] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “KNN Model-Based Approach in Classification,” pp. 986–996, 2010.
[93] S. Sayad, “K Nearest Neighbors classification.” Department of Computer Science Middlesex College, Ontario, Canada., p. 20, 2010.
[94] A. Abdiansah and R. Wardoyo, “Time Complexity Analysis of Support Vector Machines (SVM) in LibSVM,” Int. J. Comput. Appl., vol. 128, no. 3, pp. 28–34, 2015.
[95] L. Argerich, “What makes SVM good method when dealing with high-dimensional data? - Quora,” 2014. [Online]. Available: https://www.quora.com/What-makes-SVM-good-method-when-dealing-with-high-dimensional-data. [Accessed: 12-Jul-2019].
[96] J. D. Kelleher, B. Mac Namee, and A. D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics. MIT Press, 2015.

Hi shimao, sorry, but I would like to clear up my doubts on this last topic. Feel free to give an opinion; what do you think? — Freddy Daniel, Jul 13 '19 at 03:18

score 1 · Answer 1 · answered Jul 13 '19 at 02:27

Indeed, you cannot give a more accurate explanation without knowing the details of the complexity. Time complexity is asymptotic, so constant factors are neglected in the theoretical bound. Those factors can be large or small, and they affect the actual CPU running time of the algorithm.

Therefore, you cannot say anything more without knowing the more exact time complexity rather than the asymptotic one.
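Since the constant factors only show up in measurement, one option is to time fitting and prediction separately over several repeats and report summary statistics. The following is a rough sketch under stated assumptions: the helper name `time_fit_predict` is hypothetical, and it only assumes the model object exposes `fit` and `predict` methods, as sklearn estimators do.

```python
import statistics
import time


def time_fit_predict(make_model, x_train, y_train, x_test, repeats=20):
    """Time fit() and predict() separately over several repeats."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2 to compute a standard deviation")

    fit_times, predict_times = [], []
    for _ in range(repeats):
        model = make_model()  # fresh, untrained model for every repeat
        t0 = time.perf_counter()
        model.fit(x_train, y_train)
        t1 = time.perf_counter()
        model.predict(x_test)
        t2 = time.perf_counter()
        fit_times.append(t1 - t0)
        predict_times.append(t2 - t1)

    # Summary statistics in seconds.
    return {
        "fit_mean": statistics.mean(fit_times),
        "fit_stdev": statistics.stdev(fit_times),
        "predict_mean": statistics.mean(predict_times),
        "predict_stdev": statistics.stdev(predict_times),
    }
```

With sklearn installed, `make_model` could be something like `lambda: SVC(kernel='rbf', gamma=0.5)` or `lambda: KNeighborsClassifier()`. On a data set of a few hundred points, differences of a few milliseconds can be within run-to-run noise, which is why the spread is reported alongside the mean.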

OmG

So is my text enough to explain those previous results? What would you say if you were working for a data science company? – Freddy Daniel Jul 13 '19 at 03:01
@FreddyDaniel if you want a practical argument, you should run some experiments on the data and report statistics about the running time. These complexities alone will not satisfy a data scientist. – OmG Jul 13 '19 at 10:46
But I trained the SVM using the sklearn library and got 0.029801 s for training plus testing (inference), with 100% precision. sklearn uses libsvm as its solver; see https://scikit-learn.org/stable/modules/svm.html#complexity. What do you think? – Freddy Daniel Jul 13 '19 at 18:43
@FreddyDaniel maybe you are using the same data set for training and testing, which would explain the 100 percent precision : ) – OmG Jul 13 '19 at 18:57
Sorry, I forgot to include the Python code, but I used 80% for training and 20% for testing (hold-out test set) to prevent peeking [96]. Do you think that text could give a basic theoretical justification? My experiment was done using real data: I took the values of the a* and b* coordinates from the CIELAB color space. I'm working on fruit color grading. First I tried the color moment of the mean of the red channel in the RGB color space; however, when I used the CIELAB color space I got 100% accuracy/precision. – Freddy Daniel Jul 13 '19 at 21:48
