It seems to me that despite the huge variety and rapid development of ML methods (I'm specifically interested in regression methods), OLS is still considered, and often cited, as a benchmark - which makes sense, if only because it is so easy to interpret.
Despite this, I have failed to find regression methods that can be seen as generalizing multivariate OLS, in the sense of being able to identify linear relations between the features and the target. I find this surprising, because in many real-world settings (I work with economic data) linear relations coexist with nonlinearities, and because... in principle, it does not seem like a difficult task.
For instance, the value predicted by a regression tree for an observation depends only on the leaf it ends up in. Why not fit a linear regression on each leaf instead, and use the resulting (leaf-specific) coefficients for the prediction? Yes, the cost of the per-leaf regressions would (if I understand correctly) dominate the cost of building the tree... but on the other hand, it would still be asymptotically equivalent to the cost of a single OLS fit (with a lower memory footprint). And as long as the tree is not too deep (that is, each leaf still contains enough observations), this would, for instance, recover a purely linear relation exactly. It would be a sort of automatic, multi-dimensional piecewise linear regression.
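To make this concrete, here is a minimal sketch of what I have in mind, using sklearn's DecisionTreeRegressor and LinearRegression (the function names `fit_leaf_ols` / `predict_leaf_ols` and the hyperparameter values are just illustrative, not from any existing library):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

def fit_leaf_ols(X, y, max_depth=3, min_samples_leaf=50):
    """Fit a regression tree, then an OLS model on the training rows of each leaf."""
    tree = DecisionTreeRegressor(max_depth=max_depth,
                                 min_samples_leaf=min_samples_leaf).fit(X, y)
    leaf_ids = tree.apply(X)  # index of the leaf each training row falls into
    leaf_models = {
        leaf: LinearRegression().fit(X[leaf_ids == leaf], y[leaf_ids == leaf])
        for leaf in np.unique(leaf_ids)
    }
    return tree, leaf_models

def predict_leaf_ols(tree, leaf_models, X):
    """Route each row to its leaf and predict with that leaf's OLS coefficients."""
    leaf_ids = tree.apply(X)
    y_hat = np.empty(X.shape[0])
    for leaf, model in leaf_models.items():
        mask = leaf_ids == leaf
        if mask.any():
            y_hat[mask] = model.predict(X[mask])
    return y_hat

# On a purely linear target every leaf recovers (roughly) the same coefficients,
# so the piecewise-linear prediction essentially coincides with plain OLS.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=2000)
tree, leaf_models = fit_leaf_ols(X, y)
print(predict_leaf_ols(tree, leaf_models, X[:5]))
```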
Or, vice versa, it would be tempting to run a regression tree on the OLS residuals. If the tree is grown carefully (for example by using sklearn's min_impurity_decrease argument to block splits that merely fit noise), this would, it seems to me, reliably dominate OLS in terms of explanatory power, at least for large samples (with the two coinciding in the case of a perfectly linear relation).
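Again only as a rough sketch of this second idea (the data-generating process and the min_impurity_decrease value below are made up for illustration and would of course need tuning):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
# A linear part plus a step nonlinearity that the tree should pick up.
y = 2.0 * X[:, 0] + np.where(X[:, 1] > 0, 1.0, -1.0) + rng.normal(scale=0.2, size=5000)

# Stage 1: plain OLS.
ols = LinearRegression().fit(X, y)
residuals = y - ols.predict(X)

# Stage 2: a tree on the residuals. With a suitable min_impurity_decrease the
# tree only splits where the residuals still carry structure; on a perfectly
# linear target it would stay a single (zero-mean) leaf, so the combined model
# would reduce to plain OLS.
tree = DecisionTreeRegressor(min_impurity_decrease=1e-2,
                             min_samples_leaf=100).fit(X, residuals)

y_hat = ols.predict(X) + tree.predict(X)
```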
Is there a reason why techniques of this kind are not widespread? Or are they?