This is in reference to:
In particular, see where it says "Random Jungle -- abandoned?"
Background:
I found that there is a directed-graph version of the random forest: it is otherwise the same as a random forest, but the base/weak learner is a directed acyclic graph (DAG) rather than a tree. (1)
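As a back-of-the-envelope illustration of why a DAG base learner can save memory (my own sketch, not taken from the papers): a full binary tree's node count grows exponentially with depth, while a DAG that lets parent nodes share children can cap the width of each level, so node count grows roughly linearly with depth.

```r
# My own node-count arithmetic (illustrative only; the actual decision-jungle
# training procedure grows the DAG level-by-level under a width budget):
tree_nodes   <- function(d) 2^(d + 1) - 1           # full binary tree of depth d
jungle_nodes <- function(d, max_width) {
  sum(pmin(2^(0:d), max_width))                     # each level capped at max_width
}

tree_nodes(15)         # 65535 nodes
jungle_nodes(15, 128)  # 1279 nodes with levels capped at 128
```

The exact width schedule in the papers differs, but this is presumably where the claimed "small memory footprint" comes from.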
It is relevant enough that Microsoft has included it in their "Azure Machine Learning cheat sheet" as a tool for "fast, accurate, small memory footprint" binary classification; they claim it compares favorably against CART ensembles, with equivalent accuracy at lower memory overhead. (2)
I also found that an R package for it, called "Rjungle", was released about three years ago (as of 2016). (3)
When I go to the help docs for the package ("?rjungle"), there are no usage examples, and there appears to be no vignette ("vignette(Rjungle)").
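For reference, this is roughly what I checked (a minimal sketch; it assumes the package loads under the name "Rjungle" and exposes a function "rjungle", matching the help call above):

```r
# What I ran while looking for documentation and examples:
library(Rjungle)
help(package = "Rjungle")      # index of help topics
ls("package:Rjungle")          # exported objects/functions
vignette(package = "Rjungle")  # list vignettes -- none show up
example("rjungle")             # would run shipped examples, if any existed
```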
There is an online manual for a separate "Random Jungle" program. It includes some notes on interfacing R to it, but that is not the same as an actual R library. I suspect this standalone program is the engine behind "Rjungle". (4)
Questions (about the algorithm/method in general):
- Why has "random jungle", a tool Microsoft presents as an equivalent to the random forest, seen essentially no development in the last 3 years? Is this a case of a tool with a fatal flaw that Microsoft either missed or is promoting somewhere on the hype-to-false-advertising spectrum, or a case of an excellent tool that the broader machine learning community missed?
- Does it live up to its claims of equal accuracy with less memory? How does it perform against an established benchmark like (but not limited to) Breiman's "randomForest" on something like (but not limited to) this: (blog, github)? Are there published comparisons of CART-based versus DAG-based ensembles? (A minimal benchmarking sketch follows this list.)
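To make the benchmarking half of that question concrete, the forest side of a comparison harness could look like the sketch below (randomForest and the iris split are placeholders of my own choosing, not a proposed benchmark suite); a working jungle implementation would then be slotted in and scored the same way:

```r
# Fit a baseline random forest, then record holdout accuracy and in-memory size:
library(randomForest)
set.seed(1)

idx  <- sample(nrow(iris), 100)                        # simple train/test split
rf   <- randomForest(Species ~ ., data = iris[idx, ], ntree = 500)
pred <- predict(rf, iris[-idx, ])

mean(pred == iris$Species[-idx])                       # holdout accuracy
format(object.size(rf), units = "Kb")                  # size of the fitted forest
```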
Additional stuff:
Microsoft classifies the primary use of random jungles as "computer vision" in "related info" here (link).
Three published papers come up in a Microsoft search: two from 2013 and one from 2016. The authors in common include "Jamie Shotton" and "Antonio Criminisi". Jamie is the first author on the 2013 papers, but the fifth author on the 2016 paper. Antonio is the last author in each case.
The 2016 paper (link) is about fusing CNNs with CART/DAG ensembles. The authors claim less than half the compute cost and one fifth the parameter count of a reference CNN model (VGG11). They also claim, at the same accuracy, compute that is 5x faster and a parameter count that is 6x smaller than NiN.