
I have been playing around with post-processing the results of the random forest regression algorithm, in order to try to do better than the default prediction of the mean over all trees. Chapter 16 of The Elements of Statistical Learning discusses how and why this can be done using regularised regression (in that case the elastic net).
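For concreteness, here is a minimal sketch of that post-processing idea, assuming scikit-learn and a synthetic data set (the variable names and the holdout split are illustrative, not from the book):

```python
# Sketch: re-weight the individual trees with an elastic net instead of
# taking their plain mean (the post-processing idea referenced above).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_grow, X_post, y_grow, y_post = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=500, random_state=0)
forest.fit(X_grow, y_grow)

def tree_matrix(rf, X):
    """One column per tree: the basis the elastic net re-weights."""
    return np.column_stack([tree.predict(X) for tree in rf.estimators_])

# Fit the weights on data not used to grow the trees, so they are not
# rewarded for each tree's overfitting; the L1 part drops weak trees.
enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(tree_matrix(forest, X_post), y_post)

y_weighted = enet.predict(tree_matrix(forest, X_post))  # weighted-subset prediction
```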

On the other hand, I routinely re-calibrate the results of random forests using a logistic model, since I often see the kind of bias discussed in this question, and find that a simple linear logistic regression model tends to account for it. I have found that the calibration needs to be done on the ensemble prediction; attempting to calibrate individual trees doesn't work nearly as well.
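As a hedged sketch of what such a re-calibration step can look like (reusing the names from the sketch above; a plain linear fit is shown for the regression case, and the same pattern with a logistic model is the usual fix for class probabilities):

```python
# Re-calibrate the *ensemble* prediction, not the individual trees:
# fit a simple model of the truth on the forest's output, on held-out data.
# Reuses forest, X_post, y_post from the previous sketch.
from sklearn.linear_model import LinearRegression

raw = forest.predict(X_post).reshape(-1, 1)   # ensemble output as the single input
recal = LinearRegression().fit(raw, y_post)   # y ~ a + b * rf_prediction

y_recal = recal.predict(forest.predict(X_post).reshape(-1, 1))
```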

I am unsure, then, how to proceed when you want to do both of these things. If you do the elastic net regression first, you find the optimal weighted mean/subset of the forest ensemble for the biased prediction, and I see no reason to believe this will also be the optimal combination for a re-calibrated prediction. Nor is it obvious how the elastic net analysis could be run simultaneously with re-calibration, as the calibration is non-linear.

The non-linear problem of mutually optimising the calibration parameters as well as the weights and subset selection is doable in principle, but seems almost intractable computationally for large data sets.
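A sketch of one tractable special case, offered as an assumption rather than an established recipe: in the classification analogue, where the calibration is a logistic link, the weighting and the calibration can be fit jointly as a single elastic-net penalised logistic regression on the per-tree outputs, since the logistic regression is itself the calibration (the code and data are illustrative):

```python
# Elastic-net penalised logistic regression on the per-tree probabilities:
# one convex fit that learns tree weights and sigmoid calibration together.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

Xc, yc = make_classification(n_samples=2000, n_features=20, random_state=0)
Xc_grow, Xc_post, yc_grow, yc_post = train_test_split(Xc, yc, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(Xc_grow, yc_grow)

# One column per tree, holding that tree's predicted probability.
T = np.column_stack([t.predict_proba(Xc_post)[:, 1] for t in clf.estimators_])

stacker = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=0.5, max_iter=5000).fit(T, yc_post)
p_calibrated = stacker.predict_proba(T)[:, 1]
```

For the regression setting with a genuinely non-linear calibration, that convexity is lost, which is exactly the difficulty raised above.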

Can anyone suggest an efficient path for achieving both of these aims in an optimised way?

Bogdanovist
  • Sorry, I don't know how to answer your question, but I am quite interested in what you mention about re-calibrating the results of random forests using a logistic model. Could you give me some pointers on that subject? Thank you. – JEquihua Feb 20 '13 at 19:46
  • See the question linked in the second paragraph of the question above. It shows a nice example of the kind of bias in RF predictions that I've often seen, and some good answers. – Bogdanovist Feb 20 '13 at 22:39
  • See also the discussion here http://www.kaggle.com/c/bioresponse/forums/t/1974/rf-and-boost-tree. There is a link to a nice paper on this in post #4. – Bogdanovist Feb 20 '13 at 22:41
  • Thank you! By the way, have you had any luck post-processing the results of random forest regression to do better than the mean of the sequence of trees? I looked for the "elastic net" approach in The Elements of Statistical Learning, in the chapter on random forests, but did not find it. Could you detail this a bit more or share any experience you have had? I'm very interested in what you describe. Greetings and thank you. – JEquihua Feb 22 '13 at 00:23
  • @JEquihua we should move this discussion to the chatroom. Look for the room I just created on the sidebar -> – Bogdanovist Feb 22 '13 at 00:44
  • Sad to not be able to get the rest of the conversation. – EngrStudent Apr 21 '17 at 20:56
  • I think calibration is most often applied to classification. How would you calibrate a regression model? You'd have to assume a distribution for outputs, which is something non-parametric regression (such as Random Forests) actually wants to avoid. – Firebug Oct 28 '20 at 15:07
  • There is this idea in stats of a bias-variance trade-off: you can have minimum bias at the cost of a larger spread in the predictive error, or you can reduce the variance at the cost of a higher bias. In a random forest, the individual CART trees are about minimizing bias, and the ensemble mean is about reducing variance. – EngrStudent Nov 11 '20 at 16:33

2 Answers


I want to add some thoughts on the problem at hand, so that the discussion may roll on. I propose something else to think about, so that others may comment on it.

Reading this post and the post in the highlighted link, the aim is to overcome the bias in the RF (possibly in the tails) and to correct its biased output by applying another method, e.g. the elastic net. However, doing the elastic net first and then the RF, or vice versa, has not led to a well-received solution. We try to stack methods one after another to overcome their downsides, but without success.

The problem may be buried in the sequential use of two methods, after each method has already had its say on all features in the equation. I'm not speaking about parallelism here. I mean: what if we gave every feature its say through a function of its own?

When we use a RF to predict the outcome of some function $f(x)$, we try to model all features $x_1, x_2, \dots$, the whole equation, all at once with one method. In other words, the summed trees, the opinion of all trees, come to an averaged conclusion on how to deal with all features in the equation. The elastic net does the same when choosing $\lambda$ for regularization. But because we do it on all features at once, one or maybe two features draw the short straw.

Maybe the Find, as the other answer states it, is not the real Find, since it asks for yet another method to correct the misbehaviour. Maybe the real Find does not lie in stacking methods one after another. What if we tried to give every feature its say by applying one method per feature, as in $y = a_0 + a_1 f_1(x_1) + \dots + a_n f_n(x_n)$? I had not heard of this until I read about it myself: I'm talking about the InterpretML package by Microsoft. Although at the moment

It only supports trees as base learners (Dr. Robert Kübler)

it may shed some light on the problem.
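A minimal sketch of that per-feature idea using the package (pip install interpret; the synthetic data set is an illustrative assumption):

```python
# Each feature x_i gets its own learned shape function f_i; the prediction
# is the sum a_0 + f_1(x_1) + ... + f_n(x_n), as in the formula above.
from interpret.glassbox import ExplainableBoostingRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=5, noise=5.0, random_state=0)

ebm = ExplainableBoostingRegressor(random_state=0)
ebm.fit(X, y)
print(ebm.predict(X[:5]))
```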

I just wanted to offer the thought that we may have to look for a different way, instead of 'correcting' our route.

Microsoft paper: https://arxiv.org/pdf/1909.09223.pdf

Medium: https://towardsdatascience.com/the-explainable-boosting-machine-f24152509ebb

Patrick Bormann

It is a decent question, asked a decade ago. The chat is long gone.

Question
Given a random forest:

  • That has bias in the outputs
  • That is decently trained with decent parameters and is otherwise non-pathological
  • Has some similarity to this (link)

Find:

  • Elastic net regression, or other regularized regression, to calibrate the output
  • Logistic regression based calibration of the ensemble output
  • A combination of both, if possible, to improve the result.

Analysis: In progress...

EngrStudent