
I am trying to build a multiple regression model while partitioning my data into subgroups based on an additional set of covariates. I implemented lmtree() (or mob()) from the "partykit" package, and I am trying to understand its post-pruning strategies based on the AIC and BIC criteria, but I need some help!

Inside lmtree(), we can see the following pruning functions:

    "aic" = {
  function(objfun, df, nobs) (nobs[1L] * log(objfun[1L]) + 2 * df[1L]) < (nobs[1L] * log(objfun[2L]) + 2 * df[2L])
}, "bic" = {
  function(objfun, df, nobs) (nobs[1L] * log(objfun[1L]) + log(nobs[2L]) * df[1L]) < (nobs[1L] * log(objfun[2L]) + log(nobs[2L]) * df[2L])
}, "none" = {
  NULL

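These functions decide, for each split, whether the mother node's penalized objective beats the combined objective of its daughter nodes. Here is a standalone sketch with made-up numbers (`aic_prune` is just my own name for the anonymous "aic" function above):

```r
# The "aic" comparison as a standalone function. Index 1 refers to the
# mother node, index 2 to the combined daughter nodes.
aic_prune <- function(objfun, df, nobs) {
  (nobs[1L] * log(objfun[1L]) + 2 * df[1L]) <
    (nobs[1L] * log(objfun[2L]) + 2 * df[2L])
}

# Illustrative (made-up) numbers: mother has RSS 120 on 50 observations
# with 2 parameters; the daughters' summed RSS is 80 with 4 parameters.
aic_prune(objfun = c(120, 80), df = c(2, 4), nobs = c(50, 50))
## [1] FALSE
```

A return value of TRUE would mean the mother's penalized criterion is smaller, i.e., the split does not pay for itself and is pruned; here the daughters win, so the split would be kept.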
To understand how these functions prune child nodes, I first grew a very large tree with control = mob_control(verbose = TRUE, ordinal = "L2", alpha = 0.5) and saved the AIC, nobs, logLik, and df values of each node (so that I could recompute the AIC criterion above by hand):

[screenshot: table of AIC, nobs, logLik, and df values for each node]

Then I fit another lmtree() with mob_control(verbose = TRUE, ordinal = "L2", alpha = 0.5, prune = "AIC") to see which child nodes were cut. This yields a smaller tree without nodes 4, 5, 8, 9, 10, 11, 14, 15, 19, 20, 24, 25, 26, 27 from the first, large tree.

I tried to compute the AIC criterion from the table above, e.g., starting with nodes 19 and 20 compared against node 18. However, as I kept pruning the tree from the bottom up, my hand calculations did not always match what lmtree() does... Can you clearly explain what objfun[2], nobs[2], and df[2] are in the AIC and BIC functions? For example, after nodes 10 and 11 are cut, how do I decide whether to keep nodes 8 and 9 compared with node 7?

Thank you so much for your time in advance!

sunmee

1 Answer


Disclaimer: I can't use your example because it is not reproducible and it isn't clear to me how exactly you have set up the table with the log-likelihoods. It seems that the log-likelihoods are all evaluated at the full parameter values (of the large tree) and not the restricted parameter values in the inner nodes. But, again, I cannot verify this with the information provided.

For a reproducible example, consider the following simple analysis of the cars data:

library("partykit")
m <- lmtree(dist ~ 1 | speed, data = cars, alpha = 0.5, prune = "AIC")
plot(m)

And we extract the models in all nodes of the tree:

ms <- refit.modelparty(m)

Now let's check why the split of node 2 into nodes 3 and 4 is kept. The AIC of the model in node 2 is:

AIC(ms[["2"]])
## [1] 265.2902
-2 * as.numeric(logLik(ms[["2"]])) + 2 * 2
## [1] 265.2902

The AIC of the combined nodes 3 and 4 is:

-2 * as.numeric(logLik(ms[["3"]]) + logLik(ms[["4"]])) + 2 * (2 + 1 + 1)
## [1] 247.7727

Thus, the split improves the model and is not pruned. Note that the parameters of the 3/4 models comprise two separate means, a single error variance, and one additional estimated breakpoint. One could compute this differently, e.g., with two variances or with a different penalty for the additional breakpoints, etc. The partykit package offers a couple of variants for this.
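The same comparison can be written in the objfun/df/nobs form of the pruning functions quoted in the question. For lm nodes the objective function is the residual sum of squares, and n * log(RSS) + 2 * df orders the models in the same way as the AIC values above, since the remaining AIC terms are constants that cancel between mother and daughters. A sketch (the helper `rss` and the parameter counts are my own, following the counting in the text above):

```r
library("partykit")

# Refit the example tree and extract the node models.
m  <- lmtree(dist ~ 1 | speed, data = cars, alpha = 0.5, prune = "AIC")
ms <- refit.modelparty(m)

rss <- function(mod) sum(residuals(mod)^2)  # objective function for lm nodes

objfun <- c(rss(ms[["2"]]), rss(ms[["3"]]) + rss(ms[["4"]]))    # mother, daughters
n      <- nobs(ms[["2"]])                                       # same in both terms
df     <- c(2, 2 + 1 + 1)  # two means, shared variance, split point

# TRUE would mean the mother wins and the split is pruned.
(n * log(objfun[1L]) + 2 * df[1L]) < (n * log(objfun[2L]) + 2 * df[2L])
## [1] FALSE
```

The comparison comes out FALSE, matching the AIC comparison above: the split is kept.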

Achim Zeileis
  • Thank you so much for your very clear answer! I also saw the equation that you showed here in https://cran.r-project.org/web/packages/partykit/vignettes/mob.pdf, p. 12, but I was wondering about the terms "nobs[1L]" and "nobs[2L]" in lmtree() in the partykit package. – sunmee Nov 14 '17 at 15:32
  • For each split these quantities are computed as follows: `objfun[1]`, `nobs[1]` and `df[1]` are simply the objective function, sample size, and number of estimated parameters in the mother node. `objfun[2]` is the _sum_ of the objective functions _across_ daughter nodes. Similarly, `df[2]` is the _sum_ of number of estimated parameters (optionally plus a penalty for the split itself). And `nobs[2]` is the _sum_ of sample sizes (which, of course, is just the sample size of the mother node). – Achim Zeileis Nov 16 '17 at 11:44
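To connect this back to the original question, here is a sketch with entirely made-up numbers (the node labels and values are hypothetical). Once nodes 10 and 11 have been pruned, node 7 is compared against its daughters 8 and 9 using exactly these sums, where node 8 now contributes its own single-model objective function rather than the sum over its former children:

```r
# Hypothetical per-node values: RSS (objfun), sample size, parameter count.
objfun <- c(200, 90 + 85)  # [1] mother (node 7), [2] sum over daughters (8 + 9)
nobs   <- c(60, 30 + 30)   # nobs[2] is again just the mother's sample size
df     <- c(2, 2 + 1 + 1)  # one possible counting: two means, a shared
                           # variance, and the split point (as in the answer)

prune_split <- (nobs[1L] * log(objfun[1L]) + 2 * df[1L]) <
               (nobs[1L] * log(objfun[2L]) + 2 * df[2L])
prune_split  # TRUE would collapse nodes 8 and 9 back into node 7
## [1] FALSE
```

The key point for a bottom-up pass is that after a node's children are pruned, its objective function must be re-evaluated as that of the single restricted model fitted in the node, not taken from the table of the original large tree, which is exactly the issue raised in the disclaimer above.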