
I have two binary classifiers: a logistic regression classifier (returning probabilities) and a decision tree classifier (returning hard 0/1 labels).

I was wondering if it is possible to combine the predictions of both, but I am not sure what the strategy should be. Should I somehow obtain probabilities from the decision tree classifier and multiply the two sets of probabilities together, or perhaps take the mean of the two predictions (their sum divided by two)?

Or should I perhaps convert the logistic regression probability outputs to hard classifications (0/1) and try to combine them that way? The latter sounds wrong to me; I assume we would want to use the probabilities, I just don't know how to get probabilities out of the decision tree classifier :/

By the way, I am running this in R, using the glm and tree functions respectively.

Cooli

2 Answers


When you combine models, the result is called an ensemble. Ensembles use voting, weighting, and/or averaging, and there are different algorithms for constructing them.

(Averaging) Most decision tree functions will give you class probabilities. You could simply average the two models' probabilities.
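For instance, a minimal sketch in R using the glm and tree functions the asker mentions (train, test, and the binary factor outcome y with levels "0" and "1" are hypothetical names):

    library(tree)

    # Fit both models on the same training data
    fit_glm  <- glm(y ~ ., data = train, family = binomial)
    fit_tree <- tree(y ~ ., data = train)

    # Probability of class "1" from each model; predict() on a
    # classification tree returns a matrix of class probabilities
    p_glm  <- predict(fit_glm, newdata = test, type = "response")
    p_tree <- predict(fit_tree, newdata = test)[, "1"]

    # Unweighted average of the two probability estimates
    p_avg <- (p_glm + p_tree) / 2
    pred  <- ifelse(p_avg > 0.5, 1, 0)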

(Boosting) Let's say your logistic regression model performs better on the bottom 50% of values of a continuous independent variable, and your decision tree performs better on the top 50% of values of that variable. Your ensemble could choose the stronger model for each scenario.
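A sketch of that idea, reusing p_glm and p_tree from above (the variable x1 and the median split are purely illustrative; in practice the rule would come from comparing the models on held-out data):

    # Trust the logistic regression on the lower half of x1's range and
    # the tree on the upper half (illustrative split only)
    cutoff     <- median(train$x1)
    p_combined <- ifelse(test$x1 <= cutoff, p_glm, p_tree)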

(Voting) As a different example, if you had three models, you could use simple voting, without any analysis of the training data (which was done above).
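A voting sketch along those lines; pred_third stands in for the 0/1 predictions of a hypothetical third model:

    # Convert each model's output to a hard 0/1 vote, then take the majority
    votes <- cbind(ifelse(p_glm  > 0.5, 1, 0),
                   ifelse(p_tree > 0.5, 1, 0),
                   pred_third)
    pred_majority <- ifelse(rowSums(votes) >= 2, 1, 0)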

In response to the comment: I've addressed the "how are models combined" question with three examples and provided the term "ensemble." The "should I" part is a different question, and the answer will depend on your particular data and goals.

Kyle
  • I recommend that you read: J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, March 1998. You can download this paper freely from the internet. – Match Maker EE Aug 26 '20 at 09:09

Make your classification tree algorithm output probabilities, not hard 0-1 classifications. See here for the rationale, quite independently of your ensembling situation.
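With the tree function the asker is already using, this is the default behaviour once the outcome is a factor (a sketch; train and test are hypothetical data frames):

    library(tree)

    # For a factor outcome, tree() fits a classification tree, and
    # predict() returns a matrix of class probabilities (one column
    # per level) rather than hard 0/1 labels
    fit_tree <- tree(y ~ ., data = train)
    p_tree   <- predict(fit_tree, newdata = test)[, "1"]  # P(y = "1")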

Then you have two probabilistic classifiers. Simply combine the probabilistic predictions within each class by averaging, possibly using weights.
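For instance (a sketch; p_glm holds the probabilities from predict(fit_glm, test, type = "response"), and the weight w is hypothetical, e.g. tuned on a validation set):

    # Weighted average of the two models' probabilities for class "1"
    w <- 0.6  # hypothetical weight; choose via validation performance
    p_ensemble <- w * p_glm + (1 - w) * p_tree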

Stephan Kolassa
  • Suppose a leaf node has few observations; then the probabilities calculated for that node would change by a relatively large amount for a small change in the zero/one "distribution" in that node, hence presumably care must be taken not to overfit. Is logistic regression smoother here with respect to changes in the regressors? Furthermore, for prediction, is it not rare to use a decision tree singly without going the random forest or boosting route? – Single Malt Aug 25 '20 at 20:27
  • @SingleMalt: all very correct. If we use a single decision tree, then we should of course guard against overfitting, e.g., by pruning. (And so should we in logistic regression, e.g., by regularizing, such as with an elastic net.) But ensembling is already a good step in the right direction. – Stephan Kolassa Aug 26 '20 at 06:30
  • Understood. Logistic regression may seem to have stable transitions in probability (thinking simplistically of the sigmoid graph), but a large regressor coefficient can create large changes in probability for small changes in that regressor. Hence the need for regularization to mitigate this. – Single Malt Aug 26 '20 at 07:03