As you say, this idea has been explored before (albeit under different names), and there is actually a broad literature on the topic. The names I associate with this line of work are Wei-Yin Loh, Probal Chaudhuri, Hongshik Ahn, Joao Gama, Antonio Ciampi, and Achim Zeileis. You can find a rather comprehensive (if slightly outdated) description of the pros, cons, and different algorithms in this thesis.
Trees with GLM have the following (dis-)advantages (paraphrased from here; you can easily find the preprint by googling):
- The functional form of a GLM can sometimes appear too rigid for the whole data set, even if the model fits well in a subsample.
- Especially with large data sets, or data sets where knowledge about the underlying processes is limited, setting up useful parametric models can be difficult, and their predictive performance may not be sufficient.
- Trees are able to incorporate non-linear relationships, or to find the functional relationship by themselves, and therefore can have higher predictive power in settings where classic models are biased or even fail.
- Due to their explorative character, trees with GLM can reveal patterns hidden within data modelled with GLM, or provide further explanation of surprising or counter-intuitive results by incorporating additional information from other covariates.
- They can be helpful in identifying segments of the data for which an a priori assumed model fits well. It may be that overall this model has a poor fit, but that this is due to some contamination (for example, merging two separate data files, or systematic errors during data collection at a certain date). Trees with GLM might partition the data in a way that lets us separate the segments with poor fit from those for which the fit is rather good.
- The tree-like structure allows the effects of these covariates to be non-linear and highly interactive, as opposed to assuming a linear influence on the linked mean.
- Trees with GLM may lead to additional insight for an a priori assumed parametric model, especially if the underlying mechanisms are too complex to be captured by the GLM.
- Trees with GLM can automatically detect interactions, non-linearity, model misspecification, unregarded covariate influence and so on.
- They can be used as an exploratory tool in complex and large data sets, for which they have a number of advantages.
- Compared to a global GLM, a GLM model tree can alleviate the problems of bias and model misspecification and provide a better fit.
- Compared to tree algorithms with constants in the terminal nodes, the specification of a parametric model there can add extra stability and therefore reduce the variance of the tree method.
- Being hybrids of trees and classic GLM-type models, their performance usually lies between those two poles: they tend to exhibit higher predictive power than classic models, but less than non-parametric trees.
- They add some complexity compared to classical models because of the splitting process, but are usually more parsimonious than non-parametric trees.
- They show higher prediction variance than a global model in bootstrap experiments, but much less than non-parametric trees (even pruned ones).
- Using a GLM in the nodes of a tree typically leads to smaller trees.
- Using a GLM in the nodes of a tree typically leads to more stable predictions than a tree with only a constant in each node (but not as stable as bagging or forests of trees).
- The VC dimension of a tree with GLM in the nodes is higher than that of the equivalent tree with only a constant (as the latter is a special case of the former).
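To make the "parametric model in the terminal nodes" idea concrete, here is a minimal toy sketch in Python. It is not any of the published algorithms above (those use proper parameter-stability tests, pruning, etc.); it just uses ordinary least squares as a stand-in for a GLM and an exhaustive search over one hypothetical split variable `z`, to show why two leaf models can fit far better than one global model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two segments with different linear relationships between x and y,
# distinguished by a partitioning covariate z (all names here are illustrative).
n = 400
z = rng.uniform(0, 1, n)          # split variable
x = rng.uniform(-1, 1, n)         # regressor used in the node models
y = np.where(z < 0.5, 1.0 + 2.0 * x, -1.0 - 2.0 * x) + rng.normal(0, 0.1, n)

def fit_sse(x, y):
    """Least-squares fit of y ~ 1 + x; returns the residual sum of squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Global model: one linear fit for all observations.
sse_global = fit_sse(x, y)

# Depth-1 "model tree": search for the split point on z that minimizes
# the combined SSE of the two leaf models.
candidates = np.quantile(z, np.linspace(0.05, 0.95, 50))
best_split, best_sse = None, np.inf
for s in candidates:
    left = z < s
    sse = fit_sse(x[left], y[left]) + fit_sse(x[~left], y[~left])
    if sse < best_sse:
        best_split, best_sse = s, sse

print(f"global SSE: {sse_global:.1f}, tree SSE: {best_sse:.1f}, "
      f"split at z ~ {best_split:.2f}")
```

On this data the search recovers a split near z = 0.5, and the combined fit of the two leaf models is much better than the global fit, which is exactly the "alleviate bias and misspecification" point above. Real implementations (e.g. the GUIDE, LOTUS, or MOB/partykit families) replace the brute-force SSE search with principled split-selection tests to avoid variable-selection bias.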
Regarding the "effectiveness" (I assume you mean predictive performance) of trees with GLM, most of the papers cited in the above two links provide some investigation into that. However, to the best of my knowledge, a comprehensive, broad comparison of all these algorithms with competitors like standard trees has not been done.