There is no "gradient" in the standard Random Forest formulation, but can I combine Random Forests with an optimisation method like Gradient Descent or SGD?
Can I use Adam (Adaptive moment estimation) for a Random Forest?
In a sense, yes, you can.
As I alluded to in my question here What *is* an Artificial Neural Network?, a decision tree is a neural network, lato sensu. So you can train these neural networks in parallel and combine them, with end-to-end gradient-based optimisation.
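To make that concrete, here is a minimal sketch (my own illustrative construct, not an established library API): a "soft" decision tree whose split gates are sigmoids, so the whole tree is differentiable and its thresholds and leaf values can be trained end-to-end with Adam. The class name `SoftTree`, the fixed depth of 2, and the toy data are all assumptions made for the example; only PyTorch's standard `nn.Module` / `optim.Adam` machinery is taken as given.

```python
# Illustrative sketch: a differentiable ("soft") decision tree of depth 2,
# trained with Adam. Hard splits are replaced by sigmoid gates so gradients
# can flow through the tree structure.
import torch

class SoftTree(torch.nn.Module):
    def __init__(self, n_features):
        super().__init__()
        # Depth is fixed at 2 here: 3 internal split nodes, 4 leaves.
        self.gates = torch.nn.Linear(n_features, 3)                 # soft split functions
        self.leaves = torch.nn.Parameter(0.1 * torch.randn(4))      # learnable leaf values

    def forward(self, x):
        p = torch.sigmoid(self.gates(x))   # probability of going "right" at each split
        # Path probabilities to the four leaves (ordered left to right).
        path = torch.stack([
            (1 - p[:, 0]) * (1 - p[:, 1]),
            (1 - p[:, 0]) * p[:, 1],
            p[:, 0] * (1 - p[:, 2]),
            p[:, 0] * p[:, 2],
        ], dim=1)
        return path @ self.leaves          # expected leaf value per sample

# Toy regression data: a step function of the first feature.
torch.manual_seed(0)
X = torch.randn(256, 3)
y = (X[:, 0] > 0).float() * 2.0 - 1.0

model = SoftTree(n_features=3)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print("final training MSE:", loss.item())
```

Several such soft trees could be trained in parallel and averaged, giving a forest-like ensemble that is fitted entirely by gradient descent rather than by the usual greedy splitting.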
Another way you could combine random forests is through gradient boosting. Usually, in gradient boosting, "weak" learners are combined into a "strong" learner. In random forests, and bagging techniques in general, "strong" (low-bias but high-variance) learners are combined into a, hopefully, lower-variance model. But nothing precludes you from combining random forests in a gradient-boosting fashion (see the sketch below), which answers "can I combine random Forests with an optimisation method like Gradient Descent or SGD?".
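Here is a hedged sketch of that idea: a hand-rolled boosting loop in which each stage fits a small random forest to the current residuals (the negative gradient of squared loss). `RandomForestRegressor` is scikit-learn's real estimator, but the loop itself, the learning rate, the number of stages, and the toy data are illustrative assumptions, not a library feature.

```python
# Illustrative sketch: gradient boosting with small random forests as the
# base learners, for squared loss (negative gradient = residual).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)

learning_rate = 0.3
prediction = np.full_like(y, y.mean())   # initial constant model
forests = []

for _ in range(20):
    residual = y - prediction            # negative gradient of 1/2 * (y - f(x))^2
    rf = RandomForestRegressor(n_estimators=25, max_depth=3, random_state=0)
    rf.fit(X, residual)                  # each forest is fitted to the current residuals
    prediction += learning_rate * rf.predict(X)
    forests.append(rf)

print("training MSE:", np.mean((y - prediction) ** 2))
```

The "gradient" here is taken with respect to the ensemble's predictions, not with respect to any tree parameters, which is exactly how gradient boosting sidesteps the fact that individual trees are not differentiable.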