Decision tree that fits a regression at leaf nodes

Question

Is there any academic work on any Decision Tree that fits a regression at its final leaf node?

For instance, suppose I have 100 features (X), and use them build a tree with 3 depths such that I have 8 leaf nodes.

Standard tree approach gives the mean value of Y for the observations in the 8 leaf nodes.

Is there any decision tree algorithm in academic literature that instead does a regression of the Y’s on X for observations in each of the 8 leaf nodes ?

Yes, as the [Wikipedia article](https://en.wikipedia.org/wiki/Decision_tree_learning) mentions, regression trees are a thing. — deemel, Dec 07 '19 at 16:04
There are various algorithms that include regression models in the terminal nodes such as M5, FT, GUIDE, and MOB. See https://stats.stackexchange.com/questions/141251/advantage-of-glms-in-terminal-nodes-of-a-regression-tree/141271#141271 and https://stats.stackexchange.com/questions/78563/regression-tree-algorithm-with-linear-regression-models-in-each-leaf/78568#78568. In R there is the convenient interface `lmtree()` in the partykit package (based on MOB), and `M5P()` in RWeka (approximating M5). — Achim Zeileis, Dec 07 '19 at 16:30
Thanks Achim. That’s exactly what I wanted. Much appreciated — uday, Dec 07 '19 at 16:38
Agree. @AchimZeileis if you make it into an answer I will speech it. Thanks — uday, Dec 07 '19 at 22:48
I felt that the comment wasn't so much more useful than the replies by @Momo to warrant a new answer. But now I took the time to put together some references and some more details. So it's probably ok as a new answer... — Achim Zeileis, Dec 08 '19 at 08:07

score 5 · Accepted Answer · answered Dec 08 '19 at 08:05

Although regression trees with constant fits in the terminal nodes are still much more widely used in practice, there is a long history of literature on regression trees that fit regression models (or other kinds of statistical models) in the nodes of the tree. RECPAM by Ciampi et al. (1988) is pioneering work in the statistical literature, already focusing on survival models in the nodes. M5 by Quinlan (1992) is the first algorithm published in Machine Learning. Other algorithms include GUIDE (Loh 2002), FT (Gama 2004), RD and RA trees (Potts & Sammut 2005), and MOB (Zeileis et al. 2008). An overview of several of these, focusing on MOB in the illustrations, is given by Rusch & Zeileis (2013). The algorithms differ somewhat in the degree to which you have to explicitly specify the regression model considered for the nodes (e.g., as in MOB) vs. using all variable in both the regression and the partitioning (e.g., in M5).

R packages

lmtree() in the partykit package (based on MOB)
M5P() in the RWeka package (approximating M5)

Previous answers (by @Momo)

References

Ciampi A, Hogg SA, McKinney S, Thiffault J (1988). "RECPAM: A Computer Program for Recursive Partition and Amalgamation for Censored Survival Data and Other Situations Frequently Occurring in Biostatistics. I. Methods and Program Features." Computer Methods and Programs in Biomedicine, 26(3), 239-256.
Quinlan R (1992). "Learning with Continuous Classes." In Proceedings of the Australian Joint Conference on Artificial Intelligence, pp. 343–348. World Scientific, Singapore.
Loh WY (2002). "Regression Trees with Unbiased Variable Selection and Interaction Detection." Statistica Sinica, 12, 361–386. http://www.jstor.org/stable/24306967
Gama J (2004). "Functional Trees." Machine Learning, 55(3), 219–250. doi:10.1023/B:MACH.0000027782.67192.13
Potts D, Sammut C (2005). "Incremental Learning of Linear Model Trees." Machine Learning, 61(1-3), 5–48. doi:10.1007/s10994-005-1121-8
Zeileis A, Hothorn T, Hornik K (2008). “Model-Based Recursive Partitioning.” Journal of Computational and Graphical Statistics, 17(2), 492–514. doi:10.1198/106186008x319331
Rusch T, Zeileis A (2013). "Gaining Insight with Recursive Partitioning of Generalized Linear Models." Journal of Statistical Computation and Simulation, 83(7), 1301–1315. doi:10.1080/00949655.2012.658804

Decision tree that fits a regression at leaf nodes

1 Answers1