167

I recently read a post from R-Bloggers that linked to this blog post from John Myles White about a new language called Julia. Julia takes advantage of a just-in-time compiler that gives it wicked fast run times and puts it on the same order of magnitude of speed as C/C++ (the same order, not equally fast). Furthermore, it uses the orthodox looping mechanisms that those of us who started programming on traditional languages are familiar with, instead of R's apply statements and vector operations.

R is not going away by any means, even with such awesome timings from Julia. It has extensive support in industry, and numerous wonderful packages to do just about anything.

My interests are Bayesian in nature, where vectorizing is often not possible. Certain serial tasks must be done using loops and involve heavy computation at each iteration. R can be very slow at these serial looping tasks, and C/C++ is not a walk in the park to write. Julia seems like a great alternative to writing in C/C++, but it's in its infancy, and lacks a lot of the functionality I love about R. It would only make sense to learn Julia as a computational statistics workbench if it garners enough support from the statistics community and people start writing useful packages for it.
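
As a rough illustration of why such loops resist vectorization, here is a minimal sketch of a random-walk Metropolis sampler in plain Python. The target density (a standard normal), the step size, and the names `log_target` and `metropolis` are assumptions chosen for the example. They do not come from any of the linked posts.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to a constant (assumed target)."""
    return -0.5 * x * x


def metropolis(n_steps, step_size=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: every proposal is centred on the previous draw."""
    rng = random.Random(seed)
    x = x0
    draws = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        draws.append(x)  # the next iteration starts from this value
    return draws


draws = metropolis(20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Each iteration needs the value produced by the one before it, so the loop over `n_steps` cannot be replaced by a single vector operation. Only the work inside one iteration (here trivial, in a real model often expensive) can be vectorized.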

My questions follow:

  1. What features does Julia need to have in order to have the allure that made R the de facto language of statistics?

  2. What are the advantages and disadvantages of learning Julia to do computationally-heavy tasks, versus learning a low-level language like C/C++?

Bayequentist
Christopher Aden
  • 7
    How is Julia better than Incanter (http://incanter.org/) and other similar projects? – Wayne Apr 02 '12 at 14:39
  • 25
    Re procedural constructs (e.g. looping): that sounds like a giant step backwards. We are at the cusp of a change from single and small-CPU platforms to massively parallel platforms. As this evolution occurs over the next decade or so, the easily and automatically parallelizable functional style of coding will reap huge advantages over procedural code. Many other considerations intervene in one's choice of a statistical platform, of course, but this one is worth bearing in mind as a long term strategy. – whuber Apr 03 '12 at 16:14
  • @whuber: I've trimmed off the first "question". I'm interested in informed speculation, meaning thoughtful responses like Harlan's and NeilG's. My familiarity with Julia is less than a week old, so their responses are able to shed some light on the pros/cons of both Julia and R. I'm interested in further discussion on the topic, so if you could recommend ways to make it more congruent with the CW format, I'd be grateful. – Christopher Aden Apr 03 '12 at 18:29
  • 12
    Christopher, a good approach is to frame questions in a manner designed to solicit reasons and evidence. E.g., instead of "Does Julia have the necessary allure...," try something like "What elements of *Julia* might give it a chance of gaining traction and why"; instead of "Is it worth learning," ask "Why might Julia be worth learning now? What are its potential advantages?" You could further refine that question by specifying what kinds of uses of *Julia* you may be interested in, such as software development, solving one-off problems, biostatistics, data mining, etc. – whuber Apr 03 '12 at 18:41
  • 1
    @Whuber: I appreciate the suggestions and have implemented them. Thank you! – Christopher Aden Apr 03 '12 at 19:02
  • 1
    Christopher, well done! (And +1 for the question.) And now I think @naught could make a good case for this *not* to be CW, but I still agree with chl that multiple good answers are likely to emerge--that has become abundantly obvious--and so prefer to maintain the CW status of this thread. – whuber Apr 03 '12 at 19:16
  • 1
    @whuber: the question seems to have changed substantially. I don't have a very good case against CW anymore :) – naught101 Apr 03 '12 at 21:56
  • My fears about CW giving me low-quality answers are unfounded. I'm quite satisfied with the responses I've received so far. – Christopher Aden Apr 04 '12 at 00:24
  • 1
    The biggest hurdle a new design faces is the old design somehow sustaining (more or less efficiently) another adaptation away from its original intent. The biggest competitor to Julia ain't R. It's more about high-performance-computing adaptations to base R. Here I'm thinking of byte-code initiatives such as the "compiler" package. compiler got off to a slow start, but eventually it'll catch up with Javascript's speed for non-vectorized operations, and do that way before Julia has a code depot as deep as R's. The result will probably push R further away from elegance.... – user603 Apr 04 '12 at 15:28
  • 1
    @whuber, could you point me to more information regarding the "easily and automatically parallelizable functional style of coding"? As someone who has struggled with CUDA and MPI, that sounds very interesting! – trolle3000 Apr 19 '13 at 19:16
  • @trolle3000 Good, practical references include books on `R` (such as many of those written by one of its founding fathers, John Chambers) and *Mathematica* (which supports several programming paradigms but favors functional programming and offers automatic parallelization in a number of ways through its `Parallel*` commands and, more recently, CUDA support). See http://mathematica.stackexchange.com/questions/1883 for example. – whuber Apr 19 '13 at 20:12
  • @trolle3000: Automatically parallelizable is one of the nice features that comes out of some languages. Matlab, for one, has a lot of their functions parallelized in the Statistics Toolbox. While a lot of parallelization won't be as difficult (explicit) as CUDA, it'll still require some effort. See: http://cran.r-project.org/web/packages/doMC/index.html and http://docs.julialang.org/en/latest/manual/parallel-computing/ – Christopher Aden Apr 19 '13 at 20:50
  • 1
    @ChristopherAden, @whuber; parallelization doesn't just automagically fall out of 'some languages'...! The MATLAB `parfor` command, for example, will only parallelize an already embarrassingly parallel problem, just like the `#omp pragma parfor` will do in C. My real question is, what's the inherent advantage in functional programming with respect to parallelization? – trolle3000 Apr 21 '13 at 03:24
  • That's obviously `#pragma omp parallel for` – trolle3000 Apr 21 '13 at 03:31
  • 2
    @trolle3000 I don't think anybody is claiming that parallelization is so automatic. However, when (if) you have written a functional version of a program, you have already undertaken much of the effort needed to parallelize it, which is why applications like *Mathematica* can automate the parallelization, often quite effectively. If instead you have coded an algorithm in a procedural manner, it will usually be much more difficult to parallelize it. – whuber Apr 21 '13 at 20:06
  • 1
    I am astonished that the current discussion regarding 'functional programming' seems to omit *real* functional languages like Haskell, Clojure and Scala. Also note that imperative languages are *not* necessarily worse than FP at concurrency (take Go for instance). – Marc Claesen Jun 22 '14 at 13:20
  • A great new MCMC package just came out for Julia! You guys should check it out :) https://github.com/brian-j-smith/Mamba.jl – bdeonovic Jun 22 '14 at 13:43
  • 1
    "the orthodox looping mechanisms that those of us who started programming on traditional languages are familiar with, instead of R's apply statements and vector operations." - I have a problem with this statement. You probably mean programmers who started and got stuck in imperative programming. What are *traditional programming languages* anyway? R's apply statements are in line with very traditional functional language paradigm and with generic programming style like in ML. Both approaches are at least 40 years old. I first saw C++ STL in mid 1990s. "apply" is as traditonal as it gets in IT – Aksakal Oct 12 '15 at 17:31
  • 1
    @whuber I just thought you might be interested to know that the assertion that Julia code needs to be de-vectorized to be fast is a myth (albeit a strangely popular one). I answered a StackOverflow question about exactly this point [here](https://stackoverflow.com/questions/34780503/why-devectorization-in-julia-is-encouraged/34782346#34782346). The short summary is: Julia is fast and supportive of parallel constructs, whether you use a vectorized or de-vectorized style of coding. – Colin T Bowers Jan 16 '18 at 05:11
  • 1
    Thank you, @Colin. I believe you mischaracterize my comments: I don't think I said or implied vectorization is required for code to be "fast." Vectorization is helpful for scalability and extreme speed and therefore is an important consideration for code that will be applied to large problems. What is interesting is your claim that Julia vectorizes code that is not explicitly written in a vectorized form. If that's the case, we certainly can expect many algorithms implemented in Julia to be fast and scalable. – whuber Jan 16 '18 at 15:55
  • @whuber Apologies, poor language choice by me. I conflated "fast" and "scalability". However, I should clarify what I am claiming. I am not claiming that Julia *always* vectorizes non-vectorized code - I'm not aware of any compiler smart enough to do that. What I am claiming is difficult to summarize in a comment, but here goes: For non-vectorized Julia code, performance is close to C or Fortran. For vectorized Julia code, performance is close to R/Matlab [and better in some ways](https://julialang.org/blog/2017/01/moredots), and support for parallelism is similarly close to or better. – Colin T Bowers Jan 16 '18 at 22:54
  • 1
    @Colin Thanks. That is exactly in the sense I understood your comment--I wouldn't expect an interpreter or compiler to recognize all vectorization opportunities. – whuber Jan 16 '18 at 23:35
  • Would somebody like to add an explanation of how 'apply' is that much different from 'foreach', apart from mostly the syntax? – Sextus Empiricus Jun 25 '19 at 09:55
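
The point debated in the comments above, that code written as a map over independent inputs is already most of the way to being parallel, can be sketched in Python as follows. The function `simulate` and the seed values are hypothetical stand-ins for any side-effect-free computation. A thread pool is used here only to keep the example self-contained.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Pure function: the result depends only on its argument."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000)) / 1000


seeds = range(8)

# Functional form: no shared state, so map can be swapped for a parallel map.
serial = list(map(simulate, seeds))

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(simulate, seeds))  # results come back in input order
```

Swapping `map` for `pool.map` is the whole change, which is the sense in which the functional version has "already undertaken much of the effort". As trolle3000 notes, this only works because the problem is embarrassingly parallel to begin with, and in CPython a thread pool will not speed up CPU-bound work. A process pool or a compiled language would be needed for that.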

21 Answers

99

I think the key will be whether or not libraries start being developed for Julia. It's all well and good to see toy examples (even if they are complicated toys) showing that Julia blows R out of the water at tasks R is bad at.

But poorly done loops and hand coded algorithms are not why many of the people I know who use R use R. They use it because for nearly any statistical task under the sun, someone has written R code for it. R is both a programming language and a statistics package - at present Julia is only the former.

I think it's possible to get there, but there are much more established languages (Python) that still struggle with being usable statistical toolkits.

Fomite
  • Have you actually looked at the benchmark code (or the benchmarks) to know that the R methods are poorly written? I am trying to find it myself to see how the various languages were used... – Josh Hemann Apr 04 '12 at 14:01
  • 11
    @JoshHemann I've looked at enough to know that across the board R is "slow-ish". It doesn't necessarily lose every time, and it does on occasion blow Python out of the water, but in all of those cases the "who wins" ribbon seems to go to which Python or R programmer actually wrote most of their stuff in C. – Fomite Apr 04 '12 at 17:02
  • 5
    The benchmark code is *terrible*. 2000x speed gains are possible for their R examples. See http://stackoverflow.com/questions/9968578/speeding-up-julias-poorly-written-r-examples , especially the comments. – Ari B. Friedman May 15 '12 at 13:44
  • 12
    You're right, @gsk. E.g., `pisum` (at https://github.com/JuliaLang/julia/blob/master/test/perf/perf.R) takes 7.76 seconds while a simple rewrite using idiomatic R (`replicate(500, sum((1 / (10000:1))^2))[500]`) takes 0.137 seconds, more than a fifty-fold speedup. – whuber May 15 '12 at 14:21
  • @whuber Exactly. And compiling probably buys you another "x" or two. – Ari B. Friedman May 15 '12 at 17:52
  • 2
    One reason why R took off was its compatibility with S-PLUS. People were able to use a lot of old code. Old heavily used code has fewer bugs. With new things like Julia, which are not compatible with old code, you need a "killer app" situation: something that justifies all the trouble of moving to a new platform. It's similar to Google's new language Go - nice try, but why would I learn it? – Aksakal Oct 12 '15 at 17:37
  • Julia has RCall and PyCall to call R and Python libraries without syntactic fuss. In fact, these are so robust that people have written wrappers around many of the common libraries that people are used to, so the whole R core library is wrapped for use via Rmath.jl (and was the standard RNG during early Julia), and many people use PyPlot which uses matplotlib. There are `ccall` functions so you can call C, and the same for Fortran. Packages are wrapped for these as well. So Julia solved the library problem by making it easy to use just about any language's libraries! – Chris Rackauckas Jul 22 '16 at 05:21
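
The `pisum` comparison in the comments above contrasts an explicit double loop with a one-line idiomatic rewrite. The sketch below mirrors that contrast in Python, assuming the benchmark computes the partial sum of 1/k² for k = 1..10000 and repeats it 500 times (this is what the R rewrite in the comment computes). The function names are chosen for this example, and the sketch makes no attempt to reproduce the quoted timings.

```python
import math


def pisum_loop(repeats=500, terms=10000):
    """Explicit nested loops: recompute the same partial sum `repeats` times."""
    total = 0.0
    for _ in range(repeats):
        total = 0.0
        for k in range(1, terms + 1):
            total += 1.0 / (k * k)
    return total


def pisum_builtin(terms=10000):
    """Same partial sum via the built-in sum, smallest terms first as in 10000:1."""
    return sum(1.0 / (k * k) for k in range(terms, 0, -1))


loop_value = pisum_loop(repeats=2)
builtin_value = pisum_builtin()
limit = math.pi ** 2 / 6  # the series converges to pi^2 / 6
```

Both functions return the same number up to floating-point rounding. How large the speed difference is depends on the language and on how much of the work is pushed into compiled built-ins, which is the point of contention in the thread.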
58

I agree with a lot of the other comments. "Hope"? Sure. I think Julia has learned a lot from what R and Python/NumPy/Pandas and other systems have done right and wrong over the years. If I were smarter than I am, and wanted to write a new programming language that would be the substrate for a statistical development environment in the future, it would look very much like Julia.

This said, it'll be 5 years before this question could possibly be answered in hindsight. As of right now, Julia lacks the following critical aspects of a statistical programming system that could compete with R for day-to-day users:

(list updated over time...)

  • optionally-ordered factor types
  • most statistical tests and statistical models
  • literate programming/reproduce-able analysis support
  • R-class, or even Matlab-class plotting

To compete with R, Julia and add-on stats packages will need to be clean enough and complete enough that smart non-programmers, say grad students in the social sciences, could reasonably use it. There's a heck of a lot of work to get there. Maybe it'll happen, maybe it'll fizzle, maybe something else (R 3.0?) will supersede it.

Update:

Julia now supports DataFrames with missing data/NAs, modules/namespaces, formula types and model.matrix infrastructure, plotting (sorta), database support (but not to DataFrames yet), and passing arguments by keywords. There is also now an IDE (Julia Studio), Windows support, some statistical tests, and some date/time support.

Harlan
  • `literate programming/reproduce-able analysis support` -> see [IJulia](https://github.com/JuliaLang/IJulia.jl). – Piotr Migdal Jan 27 '15 at 22:32
  • 1
    Add iJulia kernel for the iPython/Jupyter notebook ecosystem. – thecity2 May 15 '15 at 20:07
  • 2
    Julia Studio is being phased out, and Juno is now the IDE – Antony Jun 29 '15 at 12:47
  • 4
    2.5 years after this answer was first posted, two-thirds of the items on the list of "must haves" are now implemented. I think that's the best evidence you could find that Julia has real promise. – senderle Dec 11 '15 at 12:43
  • 5 years must have passed. Are we there yet, @Harlan? – StasK Nov 16 '17 at 03:31
  • @stask Hah, good question! Plotting, statistical packages, etc., are in pretty good shape. The weakest place is actually DataFrames themselves, which have been undergoing rapid change to support a missing-data approach that's type-stable and fast. Sounds like that'll be finalized in the next few weeks, and Julia 0.7 (aka 1.0 Beta) should be released in the next couple months. I'd definitely re-visit Julia after the 0.7 release and the DataFrames updates, if you haven't used it in a while! – Harlan Nov 16 '17 at 15:00
  • Also, for reproducible statistical analysis, Weave.jl is worth a look. :) – Dr. Mike Jan 10 '18 at 23:11
36

For me, one very important thing for a data analysis language is to have query/relational algebra functionality with reasonable defaults and interactively-oriented design, and ideally this should be a built-in of the language. IMO, no FOSS language that I've used does this effectively, not even R.

data.frame is very clunky to work with interactively: for example, it prints the whole data structure on invocation, the $ syntax is hard to work with programmatically, querying requires redundant self-reference (i.e., DF[DF$x < 10, ]), and joins and aggregation are awkward. data.table solves most of these annoyances, but as it is not part of the core implementation, most R code does not make use of its facilities.

Pandas in Python suffers from the same faults.

These gripes may seem nitpicky, but these faults accumulate and in the end are significant in aggregate as they end up costing a lot of time.

I believe if Julia is to succeed as a data analysis environment, effort must be devoted to implementing SQL type operators (without the baggage of SQL syntax) on a user friendly table data type.
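
To give a feel for what SQL-type operators without SQL syntax might look like, here is a small sketch in plain Python. The `Table` class and its methods `where`, `select`, `group_sum`, and `join` are hypothetical names invented for this illustration. They are not the API of any existing library in R, Python, or Julia.

```python
from collections import defaultdict


class Table:
    """Hypothetical minimal table: a list of row dicts with chainable operators."""

    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, predicate):
        """Keep rows for which predicate(row) is true (like SQL WHERE)."""
        return Table(r for r in self.rows if predicate(r))

    def select(self, *columns):
        """Keep only the named columns (like SQL SELECT)."""
        return Table({c: r[c] for c in columns} for r in self.rows)

    def group_sum(self, key, value):
        """Sum `value` within each level of `key` (like GROUP BY with SUM)."""
        totals = defaultdict(float)
        for r in self.rows:
            totals[r[key]] += r[value]
        return Table({key: k, value: v} for k, v in totals.items())

    def join(self, other, on):
        """Inner join on a shared column (like SQL JOIN ... USING)."""
        index = defaultdict(list)
        for r in other.rows:
            index[r[on]].append(r)
        return Table(
            {**left, **right}
            for left in self.rows
            for right in index.get(left[on], [])
        )


trials = Table([
    {"subject": "a", "x": 3, "score": 1.5},
    {"subject": "a", "x": 12, "score": 2.0},
    {"subject": "b", "x": 7, "score": 4.0},
    {"subject": "b", "x": 9, "score": 0.5},
])
groups = Table([{"subject": "a", "arm": "control"}, {"subject": "b", "arm": "treated"}])

result = trials.where(lambda r: r["x"] < 10).group_sum("subject", "score")
joined = result.join(groups, on="subject")
```

The chained calls read in the order the operations happen, and the table is named only once. The lambda still has to name the row, though. Dropping that last bit of self-reference would need support from the language itself, which is presumably why the answer asks for these operators to be built in.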

Yike Lu
  • 1
    +1--An interesting point, thoughtfully explained. Welcome to our community! – whuber Jun 06 '12 at 13:03
  • 4
    To be nit-picky, large Pandas DataFrames actually don't print out all of their contents when invoked, as happens in R. They switch to displaying column headers along with a count of null/non-null values. Also, while I agree the syntax isn't ideal, scoping issues make it hard to eliminate the self-reference for comprehension-style filtering. It is wordier, but it's also resistant to namespace collisions if a DataFrame has extra columns at runtime you didn't expect. – goodside Oct 13 '12 at 17:57
30

I can sign under what Dirk and EpiGrad said; yet there is one more thing that makes R a unique language in its niche: its data-oriented type system.

R was designed specifically for handling data, which is why it is vector-centered and has things like data.frames, factors, NAs and attributes.
Julia's types, on the other hand, are oriented toward numerical performance, so we have scalars, well-defined storage modes, unions and structs.

This may look benign, but everyone who has ever tried to do stats with MATLAB knows that it really hurts.

So, at least for me, Julia can't offer anything which I cannot fix with a few-line C chunk and kills a lot of really useful expressiveness.

  • 4
    (+1) Good point. Some further thoughts: The lack of `data.frame`-like facilities in Python has long bothered me, but now [Pandas](http://pandas.pydata.org/) seems to have resolved this issue. Formulas are among the planned extensions of [statsmodels](https://github.com/statsmodels/statsmodels) (well, we know that sometimes it's better to avoid the formula interface in R). There's a [data.frame proposal](http://groups.google.com/group/julia-dev/browse_thread/thread/acefe005647e5ac6) for Julia (pretty quick compared to Python!), (...) – chl Apr 02 '12 at 16:55
  • (Con't) and [Doug Bates](http://dmbates.blogspot.com/) has started playing with Julia--as well as [Shane](http://www.statalgo.com/2012/03/24/statistics-with-julia/), [John Myles White](http://www.johnmyleswhite.com/notebook/2012/03/31/julia-i-love-you/), or [Vince Buffalo](http://vincebuffalo.org/2012/03/07/thoughts-on-julia.html)--which certainly reflects the interest of the statistical, ML and bioinformatics communities. So, let's wait and see, as @Dirk said. – chl Apr 02 '12 at 16:55
  • 5
    I think @mbq also has a point about C. If I need speed on the same order of magnitude as C/C++...I can use C/C++ with R. – Fomite Apr 02 '12 at 19:14
  • 4
    @EpiGrad, yes, you can write C/C++ and interface cleanly with R. But that's a weakness, not a strength of the language. With Julia, end users will never need to write C to get speed. – Harlan Apr 03 '12 at 14:07
  • One of the interesting things about Julia is that ALL of the type system, except for hardware-level blocks of bits and floating-points, is implemented in Julia itself. That means that you could write a parallel type system for "Data" that looks much more R-like, including NA support. – Harlan Apr 03 '12 at 14:08
  • 2
    @Harlan It's only a weakness if you already know both Julia and C. I'd assert time spent in C < time spent learning a new language *and* reimplementing everything from scratch. – Fomite Apr 03 '12 at 16:52
  • 2
    @EpiGrad, right. There are probably millions of scientists and analysts who know zero C and shouldn't need to learn it to do their jobs quickly. If they need to learn to program in one language, probably poorly, there's a lot to be said for a language that's designed for _their use cases_ rather than for system-level development. – Harlan Apr 03 '12 at 17:10
  • 10
    @Harlan And to be blunt, those people aren't going to be rewriting their stuff in Julia. R as a statistics package, not a programming language *is their use case*. – Fomite Apr 03 '12 at 17:22
27

I can see Julia replacing Matlab, which would be a huge service for humanity.

To replace R, you'd need to consider all of the things that Neil G, Harlan, and others have mentioned, plus one big factor that I don't believe has been addressed: easy installation of the application and its libraries.

Right now, you can download a binary of R for Mac, Windows, or Linux. It works out of the box with a large selection of statistical methods. If you want to download a package, it's a simple command or mouse click. It just works.

I went to download Julia and it's not simple. Even if you download the binary, you have to have gfortran installed in order to get the proper libraries. I downloaded the source and tried to make and it failed with no really useful message. I have an undergraduate and a graduate degree in computer science, so I could poke around and get it to work if I was so inclined. (I'm not.) Will Joe Statistician do that?

R not only has a huge selection of packages, it has a fairly sophisticated system that makes binaries of the application and almost all packages, automatically. If, for some reason, you need to compile a package from source, that's not really any more difficult (as long as you have an appropriate compiler, etc, installed on your system). You can't ignore this infrastructure, do everything via github, and expect wide adoption.

EDIT: I wanted to fool around with Julia -- it looks exciting. Two problems:

1) When I tried installing additional packages (forget what they're called in Julia), it failed with obscure errors. Evidently my Mac doesn't have a make-like tool that they expected. Not only does it fail, but it leaves stuff lying around that I have to manually delete or other installs will fail.

2) They force certain spacing in a line of code. I don't have the details in front of me, but it has to do with macros and not having a space between the macro and the parenthesis opening its arguments. That kind of restriction really bugs me, since I've developed my code formatting over many years and languages and I do actually put a space between a function/macro name and the opening parenthesis. Some code formatting restrictions I understand, but whitespace within a line?

Wayne
  • 5
    Julia's still VERY much in its infancy. I'm no historian, but I'd bet that clean binaries of R didn't come out in the first few months, either. Your point about the distribution system is something I haven't seen mentioned much thus far. Then again, I would also wager that CRAN did not sprout up the same time as R. A "CJAN" would definitely be nice for large-scale adoption. – Christopher Aden Apr 03 '12 at 18:57
  • 7
    You might be interested then to know, @Christopher, that R is really an independently developed clone of a package (S, then S-Plus) that had been a (mild) commercial success and was under development ten years previously. That gave it a significant head start that *Julia* (and most other such efforts) never have. – whuber Apr 03 '12 at 19:19
  • 3
    @ChristopherAden: I agree that Julia is yet young. But I would strenuously disagree that "a 'CJAN' would definitely be nice for large-scale adoption": it's an absolute necessity. The only tools I can think of that don't have a CRAN-like infrastructure are highly specialized -- like JAGS. But Julia, like R, is general purpose. – Wayne Apr 04 '12 at 00:46
  • 10
    The day an open-source language replaces MATLAB will be the best day for the engineering world. – Royi Aug 02 '13 at 13:07
  • 9
    "I can see Julia replacing Matlab, which would be a huge service for humanity." I couldn't agree more. – davidav Oct 03 '13 at 11:16
24

The Julia language is pretty new; its time in the spotlight can be measured in weeks (even though its development time can of course be measured in years). Now those weeks in the spotlight were very exciting weeks---see for example the recent talk at Stanford where "it had just started"---but what you ask for in terms of broader infrastructure and package support will take much longer to materialize.

So I'd keep using R, and be mindful of the developing alternatives. Last year a lot of people went gaga over Clojure; this year Julia is the reigning new flavour. We'll see if it sticks.

Dirk Eddelbuettel
  • Thank you for the insight. I figured it might be a little early to tell on Julia. You're probably a bit biased, but do you see Rcpp as a valid alternative in the interim, or is there too much programming knowledge needed to make it worth the hassle? – Christopher Aden Apr 02 '12 at 00:19
  • 16
    Because of what I have seen via Rcpp, I am even more impressed by Julia---about 50, 60, 70 fold increases for simple looping as in MCMC, and several hundred fold for "degenerate" examples like fibonacci are essentially the same as Rcpp got! But I also know that with Rcpp I still get access to the 3700 CRAN packages---as well as countless C++ libraries---whereas Julia right now has almost nothing. That said, the promise of Julia is huge. But maybe there is a "then" as well as a "now". Time will tell. – Dirk Eddelbuettel Apr 02 '12 at 00:25
  • 2
    And don't forget Incanter, which is supposed to become a statistical environment based on Clojure. How is Julia superior to that? – Wayne Apr 03 '12 at 15:35
  • 2
    @Wayne, let's not muddy the waters here. Open a new question for that (perhaps one that asks for comparison between multiple languages) – naught101 Apr 03 '12 at 22:01
  • 2
    @naught101: I'm simply echoing Dirk's point that Clojure was flavor of the month, then specifically Incanter, now Julia. I don't think that Julia or Incanter (or Clojure) stand a chance of being generalized statistical platforms. – Wayne Apr 04 '12 at 00:40
  • Julia can't be compared to things like Incanter or Clojure. Julia is a really new paradigm, based on JIT compiling to machine code! It's a completely different game. – kjetil b halvorsen Jun 22 '14 at 14:24
  • @DirkEddelbuettel: Has it stuck? If anything, you as a lead R developer should be able to make a more educated guess now than the rest of us, after almost 3 years of checking its *stickiness*! – usεr11852 Mar 21 '15 at 23:50
  • 2
    I have no idea, but I gladly update the R side: As of today over 6400 packages on CRAN, and now over 350 of those using Rcpp. Still works for me. Julia folks seem active, and happy---and having a choice is a good thing. There is no one language for all problems: [sorry, Python](https://thescienceweb.wordpress.com/2015/03/19/all-other-languages-tired-of-pythons-shit/). – Dirk Eddelbuettel Mar 21 '15 at 23:53
19

Bruce Tate here, author of Seven Languages in Seven Weeks. Here are a few thoughts. I am working on Julia for the followup book. The following is just my opinion after a few weeks of play.

There are two fundamental forces at play. First, all languages have a lifespan. R will be replaced some day. We don't know when. Second, new languages have an extremely difficult time evolving. When a new language does evolve, it usually solves some overwhelming pain point.

These two things are related. To me, we're starting to see a theme taking shape around languages like R. It's not fast enough, and it's harder than it needs to be. Those who can live within a certain performance envelope and stay within established libraries are fine. Those who can't need more, and they're starting to look for more.

The thing is, computer architectures are changing, and to take advantage of them, the language and its constructs need to be constructed in a certain way. Julia's take on concurrency is interesting. It optimizes the right thing for such a language: transparent distribution and the efficient movement of data between processes. When I use Julia for typical tasks, maps and transforms and the like, I am just calling functions. I don't have to worry about the plumbing.

To me, the fact that Julia is faster on one processor is interesting, but not overly damning for R. The thing that is interesting to me is that as processors depend more and more on multicore for performance, technical computing problems are just about ideally positioned to take the best possible advantage, given the right language.

The other feature that will help that happen is indeed macros. The pace of the language is just intense right now. Macros let you build with bigger, cleaner building blocks. Looking at libraries is interesting but doesn't tell the whole picture. You need to look at the growth of libraries. Julia's trajectory is pretty much spot on here.

Clojure is interesting to some because there's no technical language that does what R can, so some look to a general purpose language to fill that void. I am actually a huge fan. But Clojure is a pretty serious brain warp. Clojure will be there for programmers who need to do technical computing. It won't be for engineers and scientists. There's just too much to learn.

So to me, Julia or something like it will absolutely replace R some day. It's a matter of time.

Nick Stauner
user1295658
  • There aren't many new languages that provide both templated types and a first class lisp-derived macro ecosystem - Julia does. This capability along with its concurrency features and speed (that will likely improve in future versions) give it a strong competitive position against other languages, in my view. I rarely use R but frequently use C++ (w/templates) and Lisp (w/macros). Julia can do both, cleanly and efficiently in a single, clear language. I am convinced that Julia will prove to be a major language in the future. – AsymLabs Jan 10 '18 at 21:51
15

Every time I see a new language, I ask myself why an existing language can't be improved instead.

Python's big advantages are

  • a rich set of modules (not just statistics, but plotting libraries, output to pdf, etc.)
  • language constructs that you end up needing in the long run (object-oriented constructs you need in a big project; decorators, closures, etc. that simplify development)
  • many tutorials and a large support community
  • access to mapreduce, if you have a lot of data to process and don't mind paying a few pennies to run it on a cluster.

In order to overtake R, Julia, etc., Python could use

  • development of just-in-time compilation for restricted Python to give you more speed on a single machine (but mapreduce is still better if you can stand the latency)
  • a richer statistical library
Neil G
  • 3
    This may be true, but for a very-casual user, Python's language design may be a little harder to use than something like Matlab, or Julia, which has an even more math-like syntax. You can say `y = 3x+2` in Julia and it works! – Harlan Apr 03 '12 at 16:20
  • @Harlan: How many statisticians are "very-casual users"? – Neil G Apr 03 '12 at 16:28
  • 6
    That's funny: when I first saw Python some 10+ years ago I had exactly the same reaction (why is this needed? Why not just improve what's out there already? Why learn a whole new set of bizarre syntactic quirks, names of classes, methods, and procedures, and all the rest?). :-) – whuber Apr 03 '12 at 16:43
  • @NeilG Not professional statisticians so much as non-programmer researchers, especially in the sciences. Python's great for programmers, but if all you want to do is load your psychology data and fit some models (quickly), a very simple math-like syntax might be preferable to Python's elegant object-based design. – Harlan Apr 03 '12 at 17:05
  • @Harlan: I doubt these users would need Julia, because R fits that bill perfectly. – Owe Jessen Apr 03 '12 at 19:53
  • @NeilG Keep in mind part of the success of R is that it's not just used by statisticians. It's used by people who *do statistics*. And social scientists, clinicians and first-year science graduate students are absolutely very casual users. – Fomite Apr 04 '12 at 17:07
  • @OweJessen Per my answer, they actually tend to follow where the statisticians go, because that means a preexisting code base. If the stats people all move to Julia, unless they start doing joint development with R, the users will move to where the people who make the code they need are. – Fomite Apr 04 '12 at 17:08
  • @EpiGrad: You are certainly right there, but the existing code is already in R, so Python and Julia have to either replicate the gigantic amount of work done for R (reinvent the wheel), or produce large benefits in places where R is lacking, which to me sounds like niche applications with regard to the casual useR (really casual users will continue to suffer Excel). Are there any ideas about how Julia ties in with Ross Ihaka's sentiment to just start over (http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/)? – Owe Jessen Apr 04 '12 at 18:19
  • @OweJessen: Well, Python and R are open source, so the code can be ported without reinventing anything. There are significant advantages to Python as soon as you are doing anything complicated, which is partially driving the development of numpy, pandas, scipy, etc. – Neil G Apr 04 '12 at 18:29
  • I think (CrossValidated member) John D Cook's blog post is spot on: I'd much rather program math in a general purpose language than try to code math and systems problems in a math language. If the Julia community can keep this in mind, there is a good chance the language will stick for analytic programming in general (stats being only one part of that). See http://www.johndcook.com/blog/2012/04/02/why-scipy/ – Josh Hemann Apr 06 '12 at 04:35
9

Julia will not overtake R any time soon. Check out Microsoft R Open.

https://mran.revolutionanalytics.com/open/

This is an enhanced version of R that automatically uses all the cores of your computer. It is the same R: same language, same packages. When you install it, RStudio will also use it in the console. MRO is even faster than Julia. I do a lot of heavy-duty computing and used Julia for more than a year. I switched to R recently because R has better support and RStudio is an awesome editor. Julia is still at an early stage and will probably not catch up with Python or R very soon.

Milton Mai
8

I am a Julia newbie, and am R competent. The reasons I find Julia interesting so far are performance and compatibility oriented.

GPU tools. I'd like to use CUSPARSE for a statistical application. CRAN results indicate there's not much out there. Julia has bindings available which seem to work smoothly so far.

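using CUSPARSE
N = 1000
M = 1000
hA = sprand(N, M, .01)          # random sparse matrix on the host, density 0.01
hA = hA' * hA                   # form A'A so the matrix is symmetric
dA = CudaSparseMatrixCSR(hA)    # copy to the GPU in CSR format
dC = CUSPARSE.csric02(dA, 'O')  # incomplete Cholesky decomposition
hC = CUSPARSE.to_host(dC)       # copy the result back to the host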

HPC tools. One can use a cluster interactively with multiple compute nodes.

using ClusterManagers   # provides SlurmManager

nnodes = 2
ncores = 12    # ask for all cores on the nodes we control
procs = addprocs(SlurmManager(nnodes*ncores), partition="tesla", nodes=nnodes)
for worker in procs
    # run `hostname` on each worker and print the result
    println(remotecall_fetch(readall, worker, `hostname`))
end

Python compatibility. There's access to the Python ecosystem. For example, it was straightforward to find out how to read brain imaging data:

using PyCall
@pyimport nibabel

fp = "foo_BOLD.nii.gz"
res = nibabel.load(fp)
data = res[:get_data]();

C compatibility. The following generates a random integer using the C standard library.

ccall( (:rand, "libc"), Int32, ())
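
For comparison, Python's standard library offers a similar route through `ctypes`. The sketch below is not from the original answer and assumes a POSIX system (Linux or macOS), where passing `None` to `ctypes.CDLL` exposes the symbols already loaded into the process, including the C library's `rand`.

```python
import ctypes

# Assumption: POSIX system. CDLL(None) opens the running process itself,
# whose symbol table includes the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature: int rand(void)
libc.rand.restype = ctypes.c_int
libc.rand.argtypes = []

value = libc.rand()  # pseudo-random integer between 0 and RAND_MAX
print(value)
```

Unlike the Julia one-liner, the return and argument types are attached to the function object in separate steps, which is part of why the `ccall` form feels lighter.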

Speed. I thought I would see how the Distributions.jl package performed against R's rnorm, which I assume is optimised.

julia> using Distributions

julia> F = Normal(3,1)
Distributions.Normal(μ=3.0, σ=1.0)

julia> @elapsed rand(F, 1000000)
0.03422067

In R:

> system.time(rnorm(1000000, mean=3, sd=1))
   user  system elapsed 
  0.262   0.003   0.266 
conjectures
  • @NickCox, as there are more than a dozen answers already, I thought it may be interesting to highlight an alternate angle. Also, I posted an early draft accidentally :) – conjectures Oct 12 '15 at 17:05
  • The question was why Julia might stick in the statistical community, my answer centres on apparently good support for hpc + gpu, which many people with compute intensive work may find interesting. – conjectures Oct 12 '15 at 17:28
8

The following probably does not deserve to be an answer, but it is too important to be buried as a comment to someone else's response...

I have not heard much said about memory consumption, just speed. R's entire semantics being pass-by-value can be painful, and this has been one criticism of the language (which is a separate issue from how many great packages already exist). Good memory management is important, as is having ways of dealing with out-of-core processing (e.g. numpy's memory mapped arrays or pytables, or Revolution Analytics' xdf format). While PyPy's JIT compiler allows for some striking Python benchmarks, memory consumption can be quite high. So, does anyone have experience with Julia and memory usage yet? Sounds like there are memory leaks on the Windows "alpha" version that will no doubt be addressed, and I am still waiting on access to a Linux box to play with the language myself.
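
To make the out-of-core idea concrete, here is a minimal sketch using only Python's standard `mmap` module rather than numpy or PyTables. The file layout (raw little-endian float64 records) and the helper names `write_doubles` and `mapped_sum` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<d")  # one little-endian float64 per record


def write_doubles(path, values):
    """Write values to disk as raw float64 records."""
    with open(path, "wb") as fh:
        for v in values:
            fh.write(ITEM.pack(v))


def mapped_sum(path):
    """Sum a file of float64 records through a read-only memory map.

    The OS pages data in on demand, so the whole file never has to be
    copied into the Python heap at once.
    """
    total = 0.0
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), ITEM.size):
                (v,) = ITEM.unpack_from(mm, offset)
                total += v
    return total


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    write_doubles(path, [0.5 * i for i in range(1000)])
    print(mapped_sum(path))
```

The same access pattern is what numpy's memory-mapped arrays wrap; the point is only that resident memory stays small while the data stays on disk.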

Josh Hemann
8

I think it's unlikely that Julia will ever replace R, for a lot of the reasons previously mentioned. Julia is a Matlab replacement, not an R replacement; they have different goals. Even after Julia has a fully fleshed-out statistics library, no one would ever teach an Intro to Statistics class in it.

However, an area in which it could be incredible is as a speed-optimized programming language that's less painful than C/C++. If it were seamlessly linked to R (in the style of Rcpp), then it would see a ton of use in writing speed-critical segments of code. Unfortunately no such link exists currently:

https://stackoverflow.com/questions/9965747/linking-r-and-julia

Ari B. Friedman
8

Julia 1.0 has just come out with a very usable IDE (Juno). It arrived a bit late to the party, as Python already dominates machine learning while R continues to dominate every other kind of statistical analysis. That being said, Julia is already rising to prominence in finance and trading algorithms, where fast development AND fast execution are a must. In my opinion, unless another language comes along that is distinctly better, Julia's rise to prominence will probably look something like this:

(1) It starts to eat MATLAB's lunch. MATLAB users like the MATLAB syntax but hate pretty much everything else: the slowness, the expensive licenses, the very limited ways to deal with complex data structures that are not matrices. I remember one quote that said, "If Julia replaces MATLAB, it will be a huge service to humanity". MATLAB users can become proficient in Julia very quickly and will be impressed by how easy it is to write quality code that does so much more than MATLAB can (structs that are fast, that you can put in arrays and quickly iterate over?). Not only this, researchers can make serious toolboxes in Julia (a small team of Ph.D. students wrote a world-class differential equations package) that would have been impossible with MATLAB.

(2) It starts taking over research in numerical methods and simulation. MIT is throwing its weight behind Julia, and the research community listens to MIT. Numerical simulations and new numerical methods are ill-defined problems that have no libraries. This is where Julia as a language shines; if there are no libraries available, it is much easier to write fast, quality code in Julia than in any other language. It will be a numerical/simulation language that is written by mathematicians for mathematicians (sound similar to R yet?)

(3) Another breakthrough in Machine Learning happens that gives Julia the edge. This is a bit of a wildcard which might not happen. TensorFlow is great, but it is extremely hard to hack. Python has already started showing cracks and TensorFlow has started adopting Swift (with Julia getting an honorable mention). If another machine learning breakthrough happens, it will be much easier to implement and hack in a Julia package like Flux.jl.

(4) Julia starts slowly catching up to R, which will take a while. Doing stats in MATLAB is painful, but Julia is already way ahead of MATLAB with Distributions.jl. The fact is, R workflows can be easily translated to Julia. The only real advantage R has is that so many of its packages are written by statisticians for statisticians. That process, however, is also easy to do in Julia. The difference is that Julia is fast all the way down and you don't have to use another language for performance (the more "serious" R packages are written in languages like C). The problem with R is that packages written in R are too slow to handle large sets of data. The only alternative is to translate the packages into another language, making development in R a slower process than in Julia. If too many R packages need translating to handle larger datasets, R may start playing catch-up with Julia in these areas.

Deduction
  • The quote about replacing Matlab that you remember is [from this thread](https://stats.stackexchange.com/a/25781/9964). :) – Danica Jan 17 '19 at 16:21
5

I am interested in the promise of better speed and easy parallelisation across different architectures. For that reason I will certainly watch Julia's development, but I am unlikely to use it until it can handle generalised linear mixed models and has a good generic bootstrap package, a simple model language for building design matrices, plotting capability equivalent to ggplot2, and a wide range of machine learning algorithms.

No statistician can afford to have a fundamentalist attitude to the choice of tools. We will use whatever enables us to get the job done most efficiently. My guess is I will be sticking with R for a few years yet, but it would be nice to be pleasantly surprised.

  • Hi Mervyn, and welcome to Stats.SE! Julia has made some substantial improvements in the time since I created this post (almost a year ago!). Douglas Bates ported some of his GLM (maybe GLMM?) code to Julia (http://dmbates.blogspot.com/2012/04/r-programmer-looks-at-julia.html), and the main Github page has seen many updates in the past year. My take on Julia thus far (I've used it on and off since last year) has been that it's a nice tool for speed, which I use for some crude MCMC, but it hasn't replaced R in my toolchain yet. Can't wait for either R to get faster, or Julia to be more widespread! – Christopher Aden Apr 15 '13 at 22:05
  • Doug hasn't ported GLMMs yet. If someone wants to help with that I'm sure he would be happy ... – Ben Bolker Jan 25 '14 at 17:24
4

The luxury of NA's in R does not come without performance penalties. If Julia supports NA's with a smaller performance penalty then it becomes interesting to a segment of the stats community, but NA's also impose considerable extra work when using compiled code with R.

Many of the packages in R rely on routines written in legacy languages (C, Fortran, or C++). In some cases the compiled routines were developed outside R and later used as the basis for R library packages. In others the routines were first implemented in R and then critical segments translated to a compiled language when performance was found lacking. Julia will be attractive if it can be used to implement equivalent routines. There is an opportunity to design low-level support for NA's in a way that simplifies NA handling over what we have now when using R with compiled code.
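
The extra work that NA handling imposes on low-level code can be sketched in a few lines. The example is written in Python purely for illustration: R marks a missing integer with the smallest 32-bit value, so every inner loop has to test for that sentinel, whereas a separate validity mask (shown for contrast, and not a description of Julia's actual design) keeps the data itself untouched. Function names are hypothetical.

```python
import math

NA_INT = -2**31  # sentinel: R stores a missing integer as INT_MIN


def mean_with_sentinel(values):
    """Mean of an integer column where NA_INT marks a missing entry."""
    total, count = 0, 0
    for v in values:
        if v == NA_INT:  # every inner loop has to pay for this check
            continue
        total += v
        count += 1
    return total / count if count else math.nan


def mean_with_mask(values, is_na):
    """Same computation with missingness held in a parallel boolean mask."""
    total, count = 0, 0
    for v, missing in zip(values, is_na):
        if missing:
            continue
        total += v
        count += 1
    return total / count if count else math.nan


print(mean_with_sentinel([1, NA_INT, 3]))
print(mean_with_mask([1, 99, 3], [False, True, False]))
```

Either way, the missing-value test lands inside the hot loop, which is the cost the paragraph above refers to; the design question is only where the flag lives.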

The massive number of R libraries represents the efforts of a great many users. This was possible because R provided capabilities that weren't otherwise available/affordable. If Julia is to become widely used, it needs a group of users who find it does what they need so much better than the alternatives that it is worth the effort needed to supply very basic things (e.g., graphics, date classes, NA's, etc.) available from existing languages.

4

I will be up front: I have no experience with R, but I work with plenty of people who think it is an excellent tool for statistical analysis. My background is in data warehousing, and because Julia offers an easily distributed but more standard programming model, I think it could be a very interesting substitute for the transform portion of traditional ETL tools, which generally do the job very poorly. Most have no way of easily creating a standardized transform or of re-using the results of a transform already performed on a prior data set. The support for tightly defined and typed tuples stands out. If I want to build an OLAP cube that needs to build more detailed tuples (fact tables) out of tuples already calculated, today's ETL tools have no 'building blocks' to speak of that can help. The industry has worked around this issue through various means in the past, but there are trade-offs. Traditional programming languages can help by providing centrally defined transformations, and Julia could potentially simplify the non-standard aggregations and distributions common in more complex data warehouse systems.

Preston
3

You can also use Julia and R together. There is a Julia-to-R interface. With these packages you can play with Julia while calling R whenever it has a library that you need.

vasili111
2

Julia has without doubt every chance of becoming a statistics power user's dream come true. Take SAS, for example: its power lies in the numerous procs written in C. What Julia can do is give you the procs with the source code, with matrices as a built-in data type, dispensing with SAS/IML. I have no doubt that statisticians will flock to Julia once they get a handle on just what this puppy can do.

Jimbo He
  • Welcome to Stats.SE, Jimbo. I disagree with your assertion. I think we've seen what Julia is able to do, but the problem at this point is that there aren't nearly as many domain-specific packages for it as there are in R. R will continue to reign supreme in open source statistics as long as researchers see more benefit to using the numerous packages in the R universe. That's my take, at least. – Christopher Aden May 06 '13 at 16:07
1

Oh yes, Julia will overtake R quite quickly. The primary reasons will be macros, the fact that 95% of the language is implemented in Julia itself, and its noise-free, parsimonious syntax. If you don't have experience with Lisp-type languages you might not understand it yet, but you will see pretty quickly how R's formula interface will become an obsolete and ugly mechanism, replaced by specialized modeling micro-languages akin to CL's loop macro. Access to low-level references of an object is also a big plus. I think R still hasn't grasped that hiding internals from the user complicates rather than simplifies things.

As I see it now (with years of heavy R use behind me, and having just finished reading the Julia manual), Julia's main drawback with respect to R is the lack of support for structural inheritance (this was intentional). Julia's type system is less ambitious than S4; it also supports multiple dispatch and multiple inheritance, but with a catch: there is only one level of concrete classes. On the other hand, I rarely see class hierarchies in R deeper than 3 levels.
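
For readers who have not met multiple dispatch, the idea is that the implementation is chosen from the types of all arguments rather than only the first. The toy below sketches that in Python; it is an illustration under simplifying assumptions, not how Julia or S4 implement it, and every class and function name is hypothetical.

```python
from itertools import product


class MultiMethod:
    """Toy generic function that dispatches on the types of all arguments."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # maps a tuple of types to an implementation

    def register(self, *types):
        def decorator(func):
            self.table[types] = func
            return func
        return decorator

    def __call__(self, *args):
        # Walk each argument's class hierarchy, most specific class first,
        # and use the first registered combination found. A real system
        # would also detect ambiguous matches; this toy does not.
        hierarchies = [type(a).__mro__ for a in args]
        for combo in product(*hierarchies):
            if combo in self.table:
                return self.table[combo](*args)
        names = ", ".join(type(a).__name__ for a in args)
        raise TypeError(f"no method {self.name}({names})")


class Model: pass
class LinearModel(Model): pass
class Data: pass
class SparseData(Data): pass

fit = MultiMethod("fit")


@fit.register(Model, Data)
def fit_generic(model, data):
    return "generic fit"


@fit.register(LinearModel, SparseData)
def fit_sparse_linear(model, data):
    return "sparse linear fit"


print(fit(LinearModel(), SparseData()))
print(fit(LinearModel(), Data()))
```

The second call falls back to the generic method because no implementation was registered for that exact pair of types.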

Time will tell, but it will be sooner than most R users think:)

Amelio Vazquez-Reina
VitoshKa
  • You make a good point about macros: decades later people still underestimate how powerful Lisp really is. However, as you imply in point #1, this language is essentially a Matlab replacement, not an R replacement. I think you also ignore the fact that it is language plus libraries (packages) that people use and Julia doesn't even have 1% of what it needs there. – Wayne Apr 06 '12 at 14:32
  • @Wayne, I don't ignore anything, the OP was about the future and not about what is now. In 5 years, we might see many more libraries for stats in Julia than there are now for R. And this, just because Julia has a good chance to be a much better language. – VitoshKa Apr 12 '12 at 18:44
  • If Julia really becomes a MATLAB replacement, then there will be huge benefits in using the same language for engineering and statistics! The overlapping areas (such as time series) are huge. – kjetil b halvorsen Jun 22 '14 at 14:44
  • 2012: _In 5 years, we might see many more libraries for stats in Julia than there are now for R._ It didn't happen. Language design is never a criterion for uptake or replacement. Otherwise C/C++ would be bombed and dead by now, Java put out of its clunky misery and JavaScript a footnote in the "Annals of the History of Computing" for the article on "Completely bonkers attempts at languages". Alas, no. – David Tonhofer Oct 27 '19 at 15:20
1

Julia's first target use cases are numerical problems. Basically, you can break these analysis and computational science fields into data science (data driven) and simulation science (model driven). Julia is dealing with the simulation science use cases first. They are also dealing with the data science cases, but more slowly. R will never be very useful for simulation science, but Julia will be very useful for both in a couple of years.

0

It needs to be able to apply any function to large datasets that don't fit in memory, transparently for the user.
That includes at least running mixed effects models, survival models or MCMC on datasets that fit on disk but not in memory, and if possible on datasets distributed across several computers.
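
As a very small illustration of what "transparent" out-of-memory processing involves, the sketch below computes a mean and variance in a single pass over rows read lazily from a CSV source, using Welford's update so that only three numbers are held in memory. It is written in Python for illustration and is far simpler than a mixed effects model or MCMC; the function name and the column name `x` are assumptions.

```python
import csv
import io
import math


def streaming_mean_var(rows, column):
    """One-pass (Welford) mean and sample variance over an iterable of rows."""
    n, mean, m2 = 0, 0.0, 0.0
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # running sum of squared deviations
    var = m2 / (n - 1) if n > 1 else math.nan
    return n, mean, var


# io.StringIO stands in for a large file opened with open(path, newline="")
sample = io.StringIO("x\n1\n2\n3\n4\n")
n, mean, var = streaming_mean_var(csv.DictReader(sample), "x")
print(n, mean, var)
```

Models whose fitting needs repeated passes over the data are much harder to stream this way, which is why language-level or library-level support matters.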

skan