
To evaluate the performance of a new classifier algorithm, I'm trying to compare its accuracy and its complexity (big-O of training and classifying). From *Machine Learning: a review* I have a complete list of supervised classifiers, an accuracy table comparing the algorithms, and 44 test problems from the UCI data repository. However, I can't find a review, paper, or website with the big-O complexity of common classifiers such as:

  • C4.5
  • RIPPER (I think this might not be possible, but who knows)
  • ANN with backpropagation
  • Naive Bayesian
  • K-NN
  • SVM

If anyone has complexity expressions for these classifiers, it would be very useful. Thank you.

Dinl
  • You may be interested in the following article: https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms/ Full disclosure: it is my blog :) – RUser4512 Apr 13 '18 at 15:14
  • Care to trace back where the question's now-dead links pointed? – matt Apr 19 '18 at 08:43
  • @RUser4512 Really great blog write-up! Have you considered adding space complexity as well? – matt Apr 19 '18 at 08:46
  • 1
    @matt Thank you :) yes, but probably in another article, there are many things to say about this as well! – RUser4512 Apr 19 '18 at 09:13

1 Answer


Let $N$ = number of training examples, $d$ = dimensionality of the features and $c$ = number of classes.

Then the training complexities are:

  1. Naive Bayes is $\mathcal{O}(Nd)$: all it needs to do is compute the frequency of every feature value $d_i$ for each class (see the sketch after this list).
  2. $k$-NN is $\mathcal{O}(1)$ (some people even say the training step is non-existent), although the space complexity is $\mathcal{O}(Nd)$, since you need to store the data, which also takes time.
  3. A nonlinear, non-approximate SVM is $\mathcal{O}(N^2)$ or $\mathcal{O}(N^3)$ depending on the kernel. With some tricks you can get an $\mathcal{O}(N^3)$ down to $\mathcal{O}(N^{2.3})$.
  4. An approximate SVM is $\mathcal{O}(NR)$, where $R$ is the number of iterations.
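
To make item 1 concrete, here is a minimal sketch of the $\mathcal{O}(Nd)$ Naive Bayes counting pass (plain Python; the function and variable names are my own, not from any particular library):

```python
from collections import defaultdict

def train_naive_bayes(X, y):
    """Count feature-value frequencies per class in one O(N*d) pass.

    X: list of N examples, each a list of d discrete feature values
    y: list of N class labels
    """
    class_counts = defaultdict(int)
    # feature_counts[(c, j, v)] = how often feature j takes value v in class c
    feature_counts = defaultdict(int)
    for xi, c in zip(X, y):            # N iterations ...
        class_counts[c] += 1
        for j, v in enumerate(xi):     # ... of d steps each
            feature_counts[(c, j, v)] += 1
    return class_counts, feature_counts
```

Each training example is touched once and each of its $d$ features once, which is where the $\mathcal{O}(Nd)$ comes from; turning the counts into probabilities afterwards costs only $\mathcal{O}(cd)$ more.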

Testing complexities:

  1. Naive Bayes is in $\mathcal{O}(cd)$ since you have to retrieve $d$ feature values for each of the $c$ classes.
  2. $k$-NN is in $\mathcal{O}(Nd)$, since you have to compare the test point against every data point in your database (see the brute-force sketch below).
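
For item 2, a brute-force sketch of a single $k$-NN prediction (again plain Python with hypothetical names; real implementations speed this up with vectorized distances or index structures like k-d trees):

```python
def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the label of x_test by majority vote among its k nearest
    training points. The distance loop is O(N*d) for one query."""
    # Squared Euclidean distance to every training point: O(N*d)
    dists = [
        (sum((a - b) ** 2 for a, b in zip(xi, x_test)), label)
        for xi, label in zip(X_train, y_train)
    ]
    dists.sort(key=lambda t: t[0])   # O(N log N), usually dominated by O(N*d)
    top_k = [label for _, label in dists[:k]]
    return max(set(top_k), key=top_k.count)   # majority vote
```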

Source: "Core Vector Machines: Fast SVM Training on Very Large Data Sets" - http://machinelearning.wustl.edu/mlpapers/paper_files/TsangKC05.pdf

Sorry I don't know about the others.
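
If you want to check the SVM numbers empirically, a rough timing sketch (assuming scikit-learn is available; neither the answer nor the paper prescribes a library) is to fit a kernel SVM at a few training-set sizes and watch how the fit time grows:

```python
import time
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 20
for n in [500, 1000, 2000, 4000]:
    X = rng.standard_normal((n, d))
    y = (X[:, 0] > 0).astype(int)    # a simple synthetic labeling
    t0 = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)      # nonlinear (RBF) kernel SVM
    print(f"N={n:5d}  fit time: {time.perf_counter() - t0:.3f}s")
```

Doubling $N$ should roughly quadruple the fit time (or worse) if training really is $\mathcal{O}(N^2)$ to $\mathcal{O}(N^3)$.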

samthebest
  • Not entirely right: the training complexity of kernel SVM is between $O(n^2)$ and $O(n^3)$ *depending on C*; see e.g. [Support Vector Machine Solvers by Bottou and Lin](http://140.112.30.28/~cjlin/papers/bottou_lin.pdf) (section 4.2). – Marc Claesen May 10 '14 at 08:17
  • @MarcClaesen The link doesn't work anymore and it's not on the wayback machine. Is this the same paper: https://leon.bottou.org/publications/pdf/lin-2006.pdf ? – wordsforthewise Feb 04 '19 at 19:11