This is in reference to:
In particular, see where it says "Random Jungle -- abandoned?"
Background:
I found that there is a directed-graph version of the random forest: it is otherwise the same as a random forest, but the base/weak learner is a directed acyclic graph (DAG) rather than a tree. (1)
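As a back-of-the-envelope illustration of why a DAG base learner can save memory (my own sketch, not taken from the papers): a full binary tree's node count grows exponentially with depth, while a DAG that lets parent nodes share children can cap the width of each level, so node count grows roughly linearly with depth.

```r
# My own node-count arithmetic (illustrative only; the actual decision-jungle
# training procedure grows the DAG level-by-level under a width budget):
tree_nodes   <- function(d) 2^(d + 1) - 1           # full binary tree of depth d
jungle_nodes <- function(d, max_width) {
  sum(pmin(2^(0:d), max_width))                     # each level capped at max_width
}

tree_nodes(15)         # 65535 nodes
jungle_nodes(15, 128)  # 1279 nodes with levels capped at 128
```

The exact width schedule in the papers differs, but this is presumably where the claimed "small memory footprint" comes from.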
It is relevant enough that Microsoft has included it in their "Azure Machine Learning cheat sheet" as a tool for "fast, accurate, small memory footprint" binary classification; they claim it compares favorably against CART ensembles, with equivalent accuracy at lower memory overhead. (2)
I also found that an R package for it, called "Rjungle", was released about three years ago (as of 2016). (3)
When I go to the help docs for the package ("?rjungle"), there are no usage examples, and there appears to be no vignette ("vignette(Rjungle)").
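For reference, this is roughly what I checked (a minimal sketch; it assumes the package loads under the name "Rjungle" and exposes a function "rjungle", matching the help call above):

```r
# What I ran while looking for documentation and examples:
library(Rjungle)
help(package = "Rjungle")      # index of help topics
ls("package:Rjungle")          # exported objects/functions
vignette(package = "Rjungle")  # list vignettes -- none show up
example("rjungle")             # would run shipped examples, if any existed
```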
There is an online manual for a separate "Random Jungle" program. It includes some notes on interfacing R to it, but that is not the same as an actual R library. I suspect this standalone program is the engine behind "Rjungle". (4)
Questions (about the algorithm/method in general):
- Why has "random jungle", a tool Microsoft presents as an equivalent to the random forest, seen essentially no development in the last 3 years? Is this a case of a tool with a fatal flaw that Microsoft either missed or is promoting somewhere on the hype-to-false-advertising spectrum, or a case of an excellent tool that the broader machine learning community missed?
- Does it live up to its claims of equal accuracy with less memory? How does it perform against an established benchmark like (but not limited to) Breiman's "randomForest" on something like (but not limited to) this: (blog, github)? Are there published comparisons of CART-based versus DAG-based ensembles? (A minimal benchmarking sketch follows this list.)
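To make the benchmarking half of that question concrete, the forest side of a comparison harness could look like the sketch below (randomForest and the iris split are placeholders of my own choosing, not a proposed benchmark suite); a working jungle implementation would then be slotted in and scored the same way:

```r
# Fit a baseline random forest, then record holdout accuracy and in-memory size:
library(randomForest)
set.seed(1)

idx  <- sample(nrow(iris), 100)                        # simple train/test split
rf   <- randomForest(Species ~ ., data = iris[idx, ], ntree = 500)
pred <- predict(rf, iris[-idx, ])

mean(pred == iris$Species[-idx])                       # holdout accuracy
format(object.size(rf), units = "Kb")                  # size of the fitted forest
```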
Additional stuff:
Microsoft classifies the primary use of random jungles as "computer vision" in "related info" here (link).
Three published papers come up in a Microsoft search: two from 2013 and one from 2016. The authors in common include "Jamie Shotton" and "Antonio Criminisi". Jamie is the first author on the 2013 papers, but the fifth author on the 2016 paper. Antonio is the last author in each case.
The 2016 paper (link) is about fusing CNNs with CART/DAG ensembles. The authors claim less than half the compute cost and one fifth the parameter count of a reference CNN model (VGG11). They also claim, at the same accuracy, compute that is 5x faster and a parameter count that is 6x smaller than NiN.