I'm doing a little report about the KNN complexity vs SVM.. I would like to know your opinions.. I built this text according to my perspective searching in papers, websites, ppts etc:
The reason why the KNN classification technique obtained less execution time does not mean that it is computationally efficient, investigations as in [91] indicate the large-scale KNN processing is computationally costly, and requires a large amount of memory for an efficient calculation of memory. However, in some cases KNN is effective [92]. The asymptotic complexity of KNN according to [93] is
O(d)
execution time to compute the distance of a point, isO(nd)
execution time and to compute the distance of all the points isO(nk)
extra time to find the k nearest examples, the computational complexity of the KNN technique isO(nk + nd)
[93]. According to [92] the KNN technique is characterized as a non-parametric and lazy classification method (lazy) and because of this according to [90] KNN is very useful in practice where most real-world data sets do not they follow mathematical theoretical assumptions. On the other hand, the SVM classification technique according to [94] presents a computational complexity ofO(n^3)
execution time. The core of SVM is a quadratic programming problem (QP), which separates the support vectors from the rest of the training data. In fact, the execution time of SVM is of cubic order, which means that in most cases it will require a high execution time, however SVM is a very efficient technique in classifying data of very high dimensions [95].
The reason why I did this, is beacause I have trained (knn and svm) using sklearn library in python.. My dataset was about 750 features, 250 features per class (three classes), I trained only one feature dimension (1-D array). This were the results:
- SVM
Between training process and testing process (0.20%) I got: 0.029801 sg
- KNN
Between training process and testing process (0.20%) - 0.0074096 sg
As we can see K-NN got a shorter execution time ≈ 7 milliseconds and SVM 29.801 milliseconds. Is easy say this but to try to give a teoric justification to it, I did the text of above.
I hope your opinions, thanks so much. Probably I will need to add more information according your opinions :D
**Update I left the code, I'm working with real data this text is not a assumption
(x_train, x_test, y_train, y_test) = train_test_split(data,
labels,
test_size=0.20,
random_state=11)
#get start time
start = time.clock()
#build the model
svm = SVC(kernel='rbf', gamma=0.5)
#train the model
svm.fit(x_train, y_train)
#make test with 20%
y_predicted = svm.predict(x_test)
#get end time
end = time.clock()
runtime = end - start
#get confusión matrix using PyCM library
cm = ConfusionMatrix(actual_vector=y_test, predict_vector=y_predicted)
- [90] A. Navlani, “KNN Classification using Scikit-learn (article) - DataCamp,” 2018. [Online]. Available: https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn. [Accessed: 05-Jul-2019].
- [91] N. Chiluka, A.-M. Kermarrec, and J. Olivares, “The Out-of-core KNN Awakens: The light side of computation force on large datasets.,” Int. Conf. Networked Syst. NETYS, p. 16, 2016.
- [92] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “KNN Model-Based Approach in Classification,” pp. 986–996, 2010.
- [93] S. Sayad, “K Nearest Neighbors classification.” Department of Computer Science Middlesex College, Ontario, Canada., p. 20, 2010.
- [94] A. Abdiansah and R. Wardoyo, “Time Complexity Analysis of Support Vector Machines (SVM) in LibSVM,” Int. J. Comput. Appl., vol. 128, no. 3, pp. 28–34, 2015.
- [95] L. Argerich, “What makes SVM good method when dealing with high-dimensional data? - Quora,” 2014. [Online]. Available: https://www.quora.com/What-makes-SVM-good-method-when-dealing-with-high-dimensional-data. [Accessed: 12-Jul-2019].
- [96] J. D. Keller, B. Mac Namee, and A. D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics. The Massachusetts Institute of Technology, 2015.