
When fitting a tree regressor model, I would like to calculate the AIC and BIC metrics. However, I need the maximum of the likelihood function to do this.

Is there a closed-form solution or some other way to calculate the likelihood function from a tree regressor? I haven't been able to find any information online, other than a closed-form solution in an OLS framework.

PyRsquared

2 Answers


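A regression tree is still a linear model if you define the correct interaction terms: each leaf is an indicator built from the product of the split conditions on its path, and the tree predicts one constant per leaf. So in principle it is possible to calculate AIC and BIC with the OLS formula.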
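A minimal sketch of that equivalence, assuming a hand-written tree with two splits rather than one fitted by a library. The data, the split thresholds and the names `left`, `low` and `Z` are all made up for illustration. Regressing the target on the leaf indicators with ordinary least squares recovers the leaf means, which are exactly the tree's predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
# Synthetic target: piecewise constant in the three regions below, plus noise.
y = np.where(X[:, 0] < 0.5, 1.0, np.where(X[:, 1] < 0.3, 4.0, -2.0))
y = y + rng.normal(scale=0.1, size=200)

# Hypothetical tree: split on x0 < 0.5, then on x1 < 0.3 in the right branch.
left = X[:, 0] < 0.5
low = X[:, 1] < 0.3

# Each leaf indicator is a product (logical AND) of split conditions,
# i.e. an interaction term between the split dummies.
Z = np.column_stack([
    left,           # leaf A: x0 < 0.5
    ~left & low,    # leaf B: x0 >= 0.5 and x1 < 0.3
    ~left & ~low,   # leaf C: x0 >= 0.5 and x1 >= 0.3
]).astype(float)

# OLS on the leaf indicators (no intercept, since the leaves partition the data).
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

# The tree's prediction in each leaf is the mean of the targets that fall in it.
leaf_means = np.array([y[Z[:, j] == 1].mean() for j in range(Z.shape[1])])

print(np.allclose(coef, leaf_means))
```

Note that this only holds once the partition is fixed. The splits themselves are chosen from the data, and the OLS parameter count does not account for that search.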

Sebastian
  • Well, a regression tree is not a linear model. You use it to regress data that does not have a linear relationship with the target, as an alternative to OLS, which assumes a linear relationship. And in the formula in the link, $k$ is the number of parameters in the OLS model, which is easy to find. But with a tree regressor it isn't obvious how many parameters there are (number of trees? average number of nodes per tree?) – PyRsquared Aug 17 '20 at 08:01
  • A (single) regression tree is certainly a linear model (make a simple example and convince yourself that you can define the correct interaction terms). – Sebastian Aug 17 '20 at 11:04

To compute the BIC or AIC for a model, the observed dataset has to have an associated conditional distribution. For instance,

  1. In a linear regression, a dataset $\mathcal{D} = \{(t_n, {\bf x}_n) \vert t_n\in\mathbb{R}, {\bf x}_n\in\mathbb{R}^M\}$ is assumed to be conditionally distributed as

$$ t_n\vert {\bf x}_n\sim\mathcal{N}({\bf w}^T{{\bf x}_n}, \sigma^2) $$

  2. In a logistic regression, a dataset $\mathcal{D} = \{(t_n, {\bf x}_n) \vert t_n\in\{0,1\}, {\bf x}_n\in\mathbb{R}^M\}$ is assumed to be conditionally distributed as

$$ t_n\vert {\bf x}_n\sim\text{Bernoulli}(\sigma({\bf w}^T{{\bf x}_n})) $$

where $\sigma(\cdot)$ is the logistic sigmoid.

  3. In an ARCH(1) model, a dataset $\mathcal{D} = \{t_n \vert t_n\in\mathbb{R}\}$ is assumed to be conditionally distributed as

$$ t_n\vert t_{n-1}\sim\mathcal{N}(0, \sigma^2(t_{n-1})), \qquad \sigma^2(t_{n-1}) = \alpha_0 + \alpha_1 t_{n-1}^2 $$

And so on...
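
For the linear regression case, the maximised likelihood has a closed form, which is where the usual OLS expressions for AIC and BIC come from. As a sketch, write $N$ for the number of observations, $k$ for the number of estimated parameters, and $\hat\sigma^2 = \frac{1}{N}\sum_{n=1}^{N}(t_n - \hat{\bf w}^T{\bf x}_n)^2$ for the maximum-likelihood variance estimate (these symbols are introduced here for illustration). Then

$$ \ln\hat{L} = -\frac{N}{2}\left(\ln(2\pi\hat\sigma^2) + 1\right), \qquad \text{AIC} = 2k - 2\ln\hat{L}, \qquad \text{BIC} = k\ln N - 2\ln\hat{L} $$

Here $k = M + 1$ if the $M$ weights and the variance $\sigma^2$ are all counted as parameters.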

A classical decision tree, however, does not assume a conditional distribution for the data. There is no associated likelihood function, hence neither AIC nor BIC can be computed directly.

If you wanted to compute the BIC, you'd need to assign to your model some sort of likelihood function.
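
One way this could look, as a rough sketch rather than an established recipe: assume the residuals around the tree's predictions are Gaussian with a constant variance, plug in the maximum-likelihood variance, and count one parameter per leaf mean plus one for the variance. Both the distributional assumption and this parameter count are choices made here for illustration. In particular, the count ignores the data-driven search over splits. The function names and the toy data are hypothetical.

```python
import math

def gaussian_log_likelihood(y_true, y_pred):
    """Maximised Gaussian log-likelihood with sigma^2 set to its MLE, RSS / n."""
    n = len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sigma2 = rss / n  # must be > 0, so a tree that fits every point exactly breaks this
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic_bic(y_true, y_pred, n_params):
    """AIC and BIC under the Gaussian assumption above."""
    n = len(y_true)
    ll = gaussian_log_likelihood(y_true, y_pred)
    aic = 2 * n_params - 2 * ll
    bic = n_params * math.log(n) - 2 * ll
    return aic, bic

# Toy example: a tree with two leaves that predicts the leaf mean for each sample.
y = [1.0, 1.2, 0.8, 3.1, 2.9, 3.0]
leaf = [0, 0, 0, 1, 1, 1]
leaf_means = {0: 1.0, 1: 3.0}
y_hat = [leaf_means[l] for l in leaf]

# Assumption: k = number of leaf means + 1 for the variance.
k = len(leaf_means) + 1
aic, bic = aic_bic(y, y_hat, k)
print(aic, bic)
```

If the tree comes from scikit-learn's `DecisionTreeRegressor`, `tree.predict(X)` and `tree.get_n_leaves()` would supply the predictions and the leaf count.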

  • So just append a distributional assumption and you have it. The real trick is figuring out the number of parameters in the tree model. – BigBendRegion Aug 17 '20 at 12:37