6

I am trying to write my own ML library. For speed reasons I started out writing things in C using BLAS, but then I learned that NumPy and Theano also use BLAS. I am wondering if there are huge speed differences between implementations of ML algorithms in C/Python/Matlab/Octave.

Does anybody have experience with this, or data for such a comparison? If there is no really good reason to write in pure C, I would rather not.

gui11aume
sjm.majewski
  • See also [julia](http://julialang.org/) and [does-julia-have-any-hope-of-sticking-in-the-statistical-community](http://stats.stackexchange.com/questions/25672/does-julia-have-any-hope-of-sticking-in-the-statistical-community). (IMHO, visualization, tutorials, software components, and users are more important than raw speed, but that is a different topic.) – denis Sep 10 '13 at 12:58

1 Answer

8

It depends heavily on the algorithm.

Several things gain nothing from being written in C: matrix operations (dot products, element-wise multiplication, element-wise application of functions such as sin, matrix inversion, QR decomposition, ...), because NumPy already hands them to BLAS, LAPACK, or its own compiled loops. That covers a lot of algorithms, which are then easy to implement.
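To make the point about matrix operations concrete, here is a minimal sketch that times a naive triple loop in pure Python against `np.dot`, which is typically backed by BLAS. The helper names `matmul_pure_python` and `time_call` and the 60x60 size are my own choices for illustration; the timings depend entirely on your machine and on which BLAS NumPy is linked against, so treat this as a way to measure for yourself rather than as a benchmark result.

```python
import time
import numpy as np


def matmul_pure_python(a, b):
    """Naive triple-loop matrix product on lists of lists (illustrative helper)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j] = s
    return out


def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call (illustrative helper)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Small random matrices; the size is an arbitrary assumption.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 60))
B = rng.standard_normal((60, 60))

slow, t_loop = time_call(matmul_pure_python, A.tolist(), B.tolist())
fast, t_blas = time_call(np.dot, A, B)

print(f"pure Python loops: {t_loop:.6f} s")
print(f"np.dot:            {t_blas:.6f} s")
```

Both calls compute the same product; only the place where the inner loops run differs. A hand-written C loop would compete with the second number, not the first.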

Matching C's performance is hard, though, when the algorithm works on trees or huge graphs, as is the case for decision trees, KNN, or sophisticated graphical models with lots of structure.
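As a rough illustration of why tree-shaped work is harder to speed up, the sketch below walks a tiny decision tree for one sample at a time. The `Node` class, its fields, and `predict_one` are hypothetical names made up for this example, not taken from any library. The point is only that every step is an interpreted comparison plus a pointer hop, so there is no single array call to hand the loop to.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical binary decision-tree node (all names are illustrative)."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None  # set only on leaves


def predict_one(node: Node, x: Sequence[float]) -> float:
    # Each iteration is one interpreted branch and one attribute lookup;
    # the path taken depends on the data, so it cannot be expressed as
    # one dense matrix operation.
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# A hand-built tree with three leaves.
tree = Node(feature=0, threshold=0.5,
            left=Node(value=-1.0),
            right=Node(feature=1, threshold=2.0,
                       left=Node(value=0.0),
                       right=Node(value=1.0)))

predictions = [predict_one(tree, x) for x in [(0.2, 9.0), (0.9, 1.0), (0.9, 3.0)]]
print(predictions)  # [-1.0, 0.0, 1.0]
```

In C the same loop compiles to a handful of instructions per node, which is where the gap for this kind of algorithm comes from.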

Some random thoughts:

  • Machine learning algorithms are notoriously hard to debug without a reference implementation, and C is much harder to debug than Python.
  • In some cases Python gets you to 90% of C's performance, but if you really need speed, you will have to stick with C.
  • Python is growing quite a big machine learning ecosystem with Theano and sklearn; it's a good time to join.
bayerj
  • Thank you, that was helpful. I would still love to see some benchmarks though. – sjm.majewski Jul 29 '12 at 10:59
  • This [page](http://www.scipy.org/PerformancePython/#head-c4d7537c426e2a95556a4628ffbe84d4b2a9de4c) discusses some approaches to speeding up Python and also provides benchmarks. It's not ML, but it should give you a rough idea of what's possible. – alto Jul 29 '12 at 13:53
  • See also [scipy-lectures on optimizing](http://scipy-lectures.github.io/advanced/optimizing) – denis Sep 10 '13 at 10:10