To compute the BIC or AIC for a model, the model has to assign a likelihood — typically via a conditional distribution — to the observed dataset. For instance,
- In a linear regression, a dataset $\mathcal{D} = \{(t_n, {\bf x}_n) \vert t_n\in\mathbb{R}, {\bf x}_n\in\mathbb{R}^M\}$ is assumed to be conditionally distributed as
$$
t_n\vert {\bf x}_n\sim\mathcal{N}({\bf w}^T{{\bf x}_n}, \sigma^2)
$$
- In a logistic regression, a dataset $\mathcal{D} = \{(t_n, {\bf x}_n) \vert t_n\in\{0,1\}, {\bf x}_n\in\mathbb{R}^M\}$ is assumed to be conditionally distributed as
$$
t_n\vert {\bf x}_n\sim\text{Bernoulli}(\sigma({\bf w}^T{{\bf x}_n}))
$$
- In an ARCH(1) model, a dataset $\mathcal{D} = \{t_n \vert t_n\in\mathbb{R}\}$ is assumed to be conditionally distributed as
$$
t_n\vert t_{n-1}\sim\mathcal{N}\left(0,\ \sigma^2(t_{n-1})\right),\qquad \sigma^2(t_{n-1}) = \alpha_0 + \alpha_1 t_{n-1}^2
$$
And so on...
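To make the first case concrete, here is a minimal sketch (synthetic data; the variable names and the noise scale are my own choices) of computing the Gaussian log-likelihood of an OLS fit and the resulting BIC and AIC:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 3
X = rng.normal(size=(N, M))
w_true = np.array([1.5, -2.0, 0.5])
t = X @ w_true + rng.normal(scale=0.8, size=N)

# Fit w by OLS; estimate sigma^2 by its MLE (mean squared residual)
w_hat, *_ = np.linalg.lstsq(X, t, rcond=None)
resid = t - X @ w_hat
sigma2_hat = np.mean(resid**2)

# Log-likelihood of t_n | x_n ~ N(w^T x_n, sigma^2) at the MLE
loglik = -0.5 * N * (np.log(2 * np.pi * sigma2_hat) + 1)

k = M + 1                        # M weights plus sigma^2
bic = k * np.log(N) - 2 * loglik
aic = 2 * k - 2 * loglik
```

Since $\ln N > 2$ here, the BIC penalizes the parameter count more heavily than the AIC does.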
A classical decision tree, however, assumes no conditional distribution for the data. There is no associated likelihood function, so neither BIC nor AIC can be computed. If you wanted to compute the BIC, you'd first need to endow your model with some sort of likelihood function.
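One way to do that — an assumption on my part, not a standard definition — is to treat each leaf of a fitted classification tree as a Bernoulli distribution whose parameter is the fraction of positive labels in that leaf, and sum the log-likelihood over leaves. A toy sketch with a hypothetical 4-leaf partition:

```python
import numpy as np

# Toy setup: pretend a fitted tree has sent each of N points to a leaf;
# leaf[n] is the leaf index of point n, t[n] its binary label.
rng = np.random.default_rng(1)
N = 300
leaf = rng.integers(0, 4, size=N)                       # hypothetical partition
t = (rng.random(N) < np.where(leaf < 2, 0.9, 0.2)).astype(int)

# Each leaf l gets a Bernoulli parameter p_l = fraction of 1s in leaf l
loglik = 0.0
for l in np.unique(leaf):
    tl = t[leaf == l]
    p = np.clip(tl.mean(), 1e-12, 1 - 1e-12)            # guard against log(0)
    loglik += np.sum(tl * np.log(p) + (1 - tl) * np.log(1 - p))

k = len(np.unique(leaf))          # one Bernoulli parameter per leaf
bic = k * np.log(N) - 2 * loglik
```

With such a likelihood attached, the BIC can compare trees with different numbers of leaves, trading fit against the per-leaf parameter count.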