Julia: Taking stock of how it has been doing

Question

I came across a 2012 question that had a very good discussion about Julia as an alternative to R / Python for various types of Statistical Work.

Here lies the original Question from 2012 about Julia's promise

Unfortunately Julia was very new back then & the toolkits needed for statistical work were somewhat primitive. Bugs were being ironed out. Distributions were difficult to install. Et cetera.

Someone had a very apt comment on that question:

This said, it'll be 5 years before this question could possibly be answered in hindsight. As of right now, Julia lacks the following critical aspects of a statistical programming system that could compete with R for day-to-day users:

That was in 2012. Now that it's 2015 and three years have passed, I was wondering how people think Julia has done?

Is there a richer body of experience with the language itself & the overall Julia ecosystem? I would love to know.

Specifically:

Would you advise any new users of statistical tools to learn Julia over R?
What sort of Statistics use-cases would you advise someone to use Julia in?
If R is slow at a certain task does it make sense to switch to Julia or Python?

Note: First posted June 14 2015.

I took a look recently and was unimpressed with the depth of their statistics packages. If I'm not mistaken, Python is also interpreted, so will have similar limitations as R. The attraction of Julia as I understood it was the promise of extra speed and better access to parallelization. — DWin, Jun 14 '15 at 06:29
@DWin: Thanks! Native Python is indeed interpreted & slow but my assumption was that the people who actually care about using Python in situations where raw speed matters use different Python optimizations that bring its speed closer to something like Julia which has a JIT compiler. — curious_cat, Jun 14 '15 at 06:34
I think the problem with Julia is that SciPy keeps getting better, and now we also have Torch in the mix. Nobody wants to learn a third (or fourth or fifth) scientific computing language, even if it's fast and has cool function overloading features. — shadowtalker, Jun 14 '15 at 12:41
Julia is a well-designed, nice language, but in my opinion it arrived too little too late. The single-node matrix computation train has long passed. Julia is essentially Fortran 2.0, with several nice features, but as we transition increasingly into cloud computing it has very little to offer over functional languages like Scala, Clojure and even Python to some extent. Had Julia been in its current state 10 years ago, it could have been an enormous success. — Marc Claesen, Aug 27 '15 at 06:58
Python and Rcpp are really dynamically developing, R gains more and more attention (R Consortium, Microsoft etc.) so it seems to be tough for Julia to catch up... — Tim, Aug 27 '15 at 07:28
@curious_cat regarding Python, the speed of the language itself is usually irrelevant. The key thing to realize is that performance issues are usually very localized and make up a very tiny piece of a program (often a single loop). Popular Python libraries effectively delegate all performance critical functionality to C/C++ back-ends (e.g. scikit-learn, numpy). In most applications, Python is just syntactically elegant glue. — Marc Claesen, Aug 27 '15 at 09:12
I didn't see the business case for Julia, and still don't. It seemed like a redundant attempt by programmers to re-build something that already exists. — Aksakal, Aug 27 '15 at 14:50

score 16 · Answer 1 · answered Jan 28 '16 at 08:21

I have switched to Julia, and here are my pragmatic reasons:

It does glue code really well. I have a lot of legacy code in MATLAB, and MATLAB.jl took 5 minutes to install, works perfectly, and has a succinct syntax that makes it natural to use MATLAB functions. Julia also has the same for R, Python, C, Fortran, and many other languages.
Julia does parallelism really well. I'm not just talking about multiple processor (shared memory) parallelism, but also multi-node parallelism. I have access to some HPC nodes that aren't used too often because each is pretty slow, so I decided to give Julia a try. I added @parallel to a loop, started it by telling it the machine file, and bam it used all 5 nodes. Try doing that in R/Python. In MPI that would take a while to get working (and that's with knowing what you're doing), not a few minutes the first time you try it!
Julia's vectorization is fast (in many cases faster than any other higher level language), and its devectorized code is almost C fast. So if you write scientific algorithms, usually you first write it in MATLAB and then re-write it in C. Julia lets you write it once, then give it compiler codes and 5 minutes later it's fast. Even if you don't, this means you just write the code whatever way feels natural and it will run well. In R/Python, you sometimes have to think pretty hard to get a good vectorized version (that can be tough to understand later).
The metaprogramming is great. Think of the number of times you've been like "I wish I could ______ in the language". Write a macro for it. Usually someone already has.
Everything is on GitHub. The source code. The packages. Super easy to read the code, report issues to the developers, talk to them to find out how to do something, or even improve packages yourself.
They have some really good libraries. For statistics, you'd probably be interested in their optimization packages (JuliaOpt is a group which manages them). The numeric packages are already top notch and only improving.
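
To make the vectorized versus devectorized distinction concrete, here is a minimal sketch in Python using only the standard library. The function names `dot_loop` and `dot_builtin` are made up for illustration, and the comments about where the loop runs assume CPython. It shows the two styles side by side and makes no timing claim.

```python
from operator import mul


def dot_loop(xs, ys):
    """Devectorized style: an explicit index loop run step by step by the interpreter."""
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_builtin(xs, ys):
    """Vectorized style: sum() and map() iterate inside CPython's C code."""
    return sum(map(mul, xs, ys))


if __name__ == "__main__":
    a = [1, 2, 3]
    b = [4, 5, 6]
    print(dot_loop(a, b), dot_builtin(a, b))
```

The second form is shorter, but for anything more involved than a dot product the loop-free version is the one you have to think hard about.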

That said, I still really love RStudio, but the new Juno on Atom is really nice. When it's no longer in heavy development and is stable, I can see it as better than RStudio because of the ease of plugins (example: it has a good plugin for adapting to hidpi screens). So I think Julia is a good language to learn now. It has worked out well for me so far. YMMV.

Do you mind updating this answer since more than 3 years have passed? — Bayequentist, Jun 15 '19 at 05:27
I gave an updated response here: https://scicomp.stackexchange.com/questions/10922/how-mature-is-the-julia-scientific-computing-language-project/32696#32696 . Maybe that should get copied over. — Chris Rackauckas, Jun 15 '19 at 05:29

ffriend · Answer 2 · 2018-01-02T22:03:19.620

I think "learn X over Y" isn't the right way to formulate the question. In fact, you can learn (at least the basics of) both and decide on the right tool depending on the concrete task at hand. And since Julia inherited most of its syntax and concepts from other languages, it should be really easy to grasp (as is Python, though I'm not sure the same may be said about R).

So which language is better suited for what task? Based on my experience with these tools I would rate them as follows:

For pure statistical research that can be done with a REPL and a couple of scripts, R seems to be the perfect choice. It is specifically designed for statistics, has the longest history of tools and probably the largest set of statistical libraries.
If you want to integrate statistics (or, for example, machine learning) into a production system, Python seems like a much better alternative: as a general-purpose programming language it has an awesome web stack, bindings to most APIs, and libraries for literally everything, from scraping the web to creating 3D games.
High-performance algorithms are much easier to write in Julia. If you only need to use or combine existing libraries like SciKit Learn or e1071 backed by C/C++, you will be fine with Python and R. But when it comes to writing the fast backend itself, Julia becomes a real time-saver: it's much faster than Python or R and doesn't require additional knowledge of C/C++. As an example, Mocha.jl reimplements in pure Julia the deep learning framework Caffe, originally written in C++ with a wrapper in Python.
Also don't forget that some libraries are available only in some languages. E.g. only Python has a mature ecosystem for computer vision, some shape-matching and transformation algorithms are implemented only in Julia, and I've heard of some unique packages for statistics in medicine in R.

I would say that most people should try to choose one and stay mostly with that---for me at least, using multiple languages I end up mixing them, losing a lot of time that way ... — kjetil b halvorsen, Aug 27 '15 at 17:45
A paradoxical issue with writing high-performance algorithms is that even though they can be easier to write in a higher-level language like R or Julia, by the time you're actually writing high-performance algorithms, you probably like using something like C++ anyways. Or maybe that's just me. — Cliff AB, Aug 27 '15 at 20:15

score 3 · Answer 3 · edited Jun 11 '20 at 14:32

(b) What sort of Statistics use-cases would you advise someone to use Julia in?

(c) If R is slow at a certain task does it make sense to switch to Julia or Python?

High dimensional and compute intensive problems.

Multiprocessing. Julia's single node parallel capabilities (@spawnat) are much more convenient than those in python. E.g. in python you cannot use a map reduce multiprocessing pool on the REPL and every function you wish to parallelise requires lots of boilerplate.
Cluster computing. Julia's ClusterManagers package lets you use a compute cluster almost as you would a single machine with several cores. [I've been playing with making this feel more like scripting in ClusterUtils]
Shared Memory. Julia's SharedArray objects are superior to the equivalent shared memory objects in python.
Speed. My Julia implementation is (single-machine) faster than my R implementation at random number generation, and at linear algebra (supports multithreaded BLAS).
Interoperability. Julia's PyCall module gives you access to the python ecosystem without wrappers - e.g. I use this for pylab. There's something similar for R, but I've not tried it. There is also ccall for C/Fortran libraries.
GPU. Julia's CUDA wrappers are far more developed than those in python (R's were nearly non-existent when I checked). I suspect this will continue to be the case because of how much easier it is to call external libraries in Julia than in python.
Ecosystem. The Pkg module uses GitHub as a backend. I believe this will have a big impact on the long-run maintainability of Julia modules as it makes it much more straightforward to offer patches or for owners to pass on responsibility.
$\sigma$ is a valid variable name ;)
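
For comparison with the interoperability point, this is roughly what calling a C function looks like from python with the standard `ctypes` module. It is only a sketch, assuming a POSIX system (Linux/macOS) where `ctypes.CDLL(None)` exposes the C library's symbols; the wrapper name `c_strlen` is hypothetical.

```python
import ctypes

# Assumption: POSIX. CDLL(None) opens the symbols already loaded into the
# running process, which include the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature by hand: size_t strlen(const char *s).
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"julia"))
```

Having to declare the argument and return types before the call is part of what makes this more verbose than a single call expression.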

Writing fast code for large problems will increasingly be dependent on parallel computing. Python is inherently parallel unfriendly (GIL), and native multiprocessing in R is nonexistent AFAIK. Julia doesn't require you to drop down to C to write performant code, while retaining much of the feel of python/R/Matlab.
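
A small sketch of the GIL point, using only the python standard library (the helper names `split`, `sum_of_squares` and `threaded_sum_of_squares` are made up for illustration). The work is split across threads and the answer is correct, but on a standard CPython build with the GIL enabled the threads take turns executing bytecode, so this is not expected to beat a plain serial loop for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(bounds):
    """CPU-bound work on one chunk: sum i*i for lo <= i < hi."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def split(n, parts):
    """Cut range(n) into at most `parts` contiguous (lo, hi) chunks."""
    step = max(1, -(-n // parts))  # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def threaded_sum_of_squares(n, workers=4):
    """Map the chunks over a thread pool, then reduce the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, split(n, workers)))


if __name__ == "__main__":
    print(threaded_sum_of_squares(1_000_000))
```

Getting real parallelism for this kind of loop usually means switching to separate processes, with the extra setup mentioned under Multiprocessing.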

The main downside to Julia coming from python/R is lack of documentation outside of the core functionality. python is very mature, and what you can't find in the docs is usually on stackoverflow. R's documentation system is pretty good in comparison.

(a) Would you advise any new users of statistical tools to learn Julia over R?

Yes, if you fit the use cases in part (b). If your use case involves lots of heterogeneous work
