As you say, this idea has been explored before (albeit under different names), and there is actually a broad literature on the topic. The names I associate with this line of work are Wei-Yin Loh, Probal Chaudhuri, Hongshik Ahn, Joao Gama, Antonio Ciampi, and Achim Zeileis. You can find a rather comprehensive (if slightly outdated) description of the pros, cons, and different algorithms in this thesis.
Trees with GLM have the following (dis-)advantages (paraphrased from here; you can easily find the preprint by googling):
- The functional form of a GLM can sometimes appear too rigid for the whole data set, even if the model fits well in a subsample.
- Especially with large data sets, or data sets where knowledge about the underlying processes is limited, setting up useful parametric models can be difficult, and their predictive performance may not be sufficient.
- Trees are able to incorporate non-linear relationships, or to find the functional relationship by themselves, and therefore can have higher predictive power in settings where classic models are biased or even fail.
- Due to their explorative character, trees with GLM can reveal patterns hidden within data modelled with GLM, or provide further explanation of surprising or counter-intuitive results by incorporating additional information from other covariates.
- They can be helpful in identifying segments of the data for which an a priori assumed model fits well. It may be that overall this model has a poor fit, but that this is due to some contamination (for example, merging two separate data files, or systematic errors during data collection at a certain date). Trees with GLM might partition the data in a way that lets us separate the segments with poor fit from those for which the fit is rather good.
- The tree-like structure allows the effects of these covariates to be non-linear and highly interactive, as opposed to assuming a linear influence on the linked mean.
- Trees with GLM may lead to additional insight for an a priori assumed parametric model, especially if the underlying mechanisms are too complex to be captured by the GLM.
- Trees with GLM can automatically detect interactions, non-linearity, model misspecification, unregarded covariate influence and so on.
- They can be used as an exploratory tool in complex and large data sets, for which they have a number of advantages.
- Compared to a global GLM, a GLM model tree can alleviate the problems of bias and model misspecification and provide a better fit.
- Compared to tree algorithms with constants in the terminal nodes, the specification of a parametric model there can add extra stability and therefore reduce the variance of the tree method.
- Being hybrids of trees and classic GLM-type models, their performance usually lies between those two poles: they tend to exhibit higher predictive power than classic models, but less than non-parametric trees.
- They add some complexity compared to classical models because of the splitting process, but are usually more parsimonious than non-parametric trees.
- They show higher prediction variance than a global model in bootstrap experiments, but much less than non-parametric trees (even pruned ones).
- Using a GLM in the nodes of a tree typically leads to smaller trees.
- Using a GLM in the nodes of a tree typically leads to more stable predictions than a tree with only a constant in each node (but not as stable as bagging or forests of trees).
- The VC dimension of a tree with GLM in the nodes is higher than that of the equivalent tree with only a constant (as the latter is a special case of the former).
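To make the "parametric model in the terminal nodes" idea concrete, here is a minimal toy sketch in Python. It is not any of the published algorithms above (those use proper parameter-stability tests, pruning, etc.); it just uses ordinary least squares as a stand-in for a GLM and an exhaustive search over one hypothetical split variable `z`, to show why two leaf models can fit far better than one global model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two segments with different linear relationships between x and y,
# distinguished by a partitioning covariate z (all names here are illustrative).
n = 400
z = rng.uniform(0, 1, n)          # split variable
x = rng.uniform(-1, 1, n)         # regressor used in the node models
y = np.where(z < 0.5, 1.0 + 2.0 * x, -1.0 - 2.0 * x) + rng.normal(0, 0.1, n)

def fit_sse(x, y):
    """Least-squares fit of y ~ 1 + x; returns the residual sum of squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Global model: one linear fit for all observations.
sse_global = fit_sse(x, y)

# Depth-1 "model tree": search for the split point on z that minimizes
# the combined SSE of the two leaf models.
candidates = np.quantile(z, np.linspace(0.05, 0.95, 50))
best_split, best_sse = None, np.inf
for s in candidates:
    left = z < s
    sse = fit_sse(x[left], y[left]) + fit_sse(x[~left], y[~left])
    if sse < best_sse:
        best_split, best_sse = s, sse

print(f"global SSE: {sse_global:.1f}, tree SSE: {best_sse:.1f}, "
      f"split at z ~ {best_split:.2f}")
```

On this data the search recovers a split near z = 0.5, and the combined fit of the two leaf models is much better than the global fit, which is exactly the "alleviate bias and misspecification" point above. Real implementations (e.g. the GUIDE, LOTUS, or MOB/partykit families) replace the brute-force SSE search with principled split-selection tests to avoid variable-selection bias.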
Regarding the "effectiveness" (I assume you mean predictive performance) of trees with GLM, most of the papers cited in the above two links provide some investigation into that. However, to the best of my knowledge, a comprehensive, broad comparison of all these algorithms with competitors like standard trees has not been done.