In a regression tree (I am particularly interested in random forest regression, but it seems this can be generalised to regression trees as a whole), a random subset of the predictor variables is selected at each node, starting from the root, and the best split among those variables is chosen to divide the node into two daughter nodes. This continues until a set of stopping criteria is fulfilled. My questions here are: How is the best split determined at each node of a regression tree? What are the criteria to stop? As far as I understand, the default for random forest regression is that the best split should produce the lowest mean squared error (a mean squared error of what exactly?), and the stopping criteria are either that the mean squared error is lower than that of the entire dataset (somehow I don't think that is entirely correct) or that the resulting leaf would contain fewer than five observations.
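To make the first question concrete, here is a minimal sketch of how I picture the split search working (the function and variable names are my own, and this is not meant to be any particular library's implementation): each candidate split is scored by the weighted mean squared error of the two daughter nodes, where each node predicts the mean of its own response values.

```python
import numpy as np

def node_mse(y):
    """MSE of a node that predicts the mean of its own response values."""
    return np.mean((y - y.mean()) ** 2) if len(y) else 0.0

def best_split(X, y, feature_subset):
    """Search the candidate features for the split that minimises
    the weighted MSE of the two daughter nodes."""
    best = None  # (score, feature index, threshold)
    n = len(y)
    for j in feature_subset:
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue
            # weighted average of the daughters' MSEs
            score = (left.sum() / n) * node_mse(y[left]) \
                  + (right.sum() / n) * node_mse(y[right])
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

# toy example: the response depends mostly on feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=20)
print(best_split(X, y, feature_subset=[0, 2]))
```

Is this the quantity that is being minimised at each node, or is the "mean squared error" measured against something else?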
Once this is completed, the algorithm outputs an ensemble of trees, and a prediction can then be made. My other question is: how is the final prediction made at each leaf of a regression tree? Of course the prediction is an average over the ensemble of trees, but an average of what exactly? I think in the simplest case (called a constant estimate) it is simply the mean of the response values of the training cases in that leaf, averaged over all trees(?). Or is it a linear regression computed within that leaf, averaged over the ensemble?
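If my understanding of the constant-estimate case is right, the prediction step would look roughly like this (again a toy sketch with my own names; real implementations would store the leaf means rather than the raw training responses):

```python
import numpy as np

def leaf_prediction(y_in_leaf):
    """Constant estimate: a leaf predicts the mean response of the
    training cases that fell into it."""
    return np.mean(y_in_leaf)

def forest_prediction(per_tree_predictions):
    """The forest prediction is the plain average of the individual
    tree predictions for the same query point."""
    return np.mean(per_tree_predictions)

# toy example: three trees, each routing the query point to one leaf
leaf_targets_by_tree = [np.array([3.1, 2.9, 3.0]),
                        np.array([2.5, 3.5]),
                        np.array([3.2, 3.0, 2.8, 3.0])]
tree_preds = [leaf_prediction(t) for t in leaf_targets_by_tree]
print(forest_prediction(tree_preds))  # average of the three leaf means
```

Is that what actually happens, or is something model-based (like a within-leaf linear regression) fitted instead?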