
I am using the glmtree function from the partykit package in R.

I would like to know how I can evaluate the models and how I can improve them.

I am growing a large tree (alpha = 0.9) and post-pruning with AIC as the criterion. For evaluation I compute the AUC (pROC package); the results range between 0.62 and 0.79.

library(partykit)
library(pROC)

## grow a large tree and post-prune with AIC
fit <- glmtree(fD ~ 1 | Age + fGender + Qualification + fOccupation + SizeWorkplc,
  data = newdata, family = "binomial",
  minsize = 50, maxdepth = 4, alpha = 0.9, prune = "AIC")

## predicted probabilities, ROC curve, and AUC
prob <- predict(fit, newdata = newdata, type = "response")
newdata$prob <- prob
g <- roc(fD ~ prob, data = newdata)
plot(g)

I am really new to this, so I would really appreciate some help.

Achim Zeileis
Dani

1 Answer


The strategy you describe looks very reasonable. For evaluation you can use the usual kinds of measures that you would employ for other binary classifiers (or trees in particular): misclassification rate (or, conversely, classification accuracy), log-likelihood, ROC, AUC, etc. Personally, I often use the ROCR package, but the pROC package you used appears to offer useful tools for this as well.
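A sketch of how such measures could be computed in R, assuming the fitted tree `fit` and the data `newdata` from the question (the 0.5 cutoff and the handling of the factor levels are illustrative assumptions, not part of the original answer):

```r
library(partykit)
library(pROC)

## predicted probabilities from the fitted tree
prob <- predict(fit, newdata = newdata, type = "response")

## misclassification rate at an (assumed) 0.5 cutoff
pred <- ifelse(prob > 0.5, levels(newdata$fD)[2], levels(newdata$fD)[1])
mean(pred != newdata$fD)

## log-likelihood and AIC of the fitted tree
logLik(fit)
AIC(fit)

## ROC curve and AUC via pROC
g <- roc(response = newdata$fD, predictor = prob)
auc(g)
plot(g)
```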

For improving the model, you might consider extending the model part from an intercept only (fD ~ 1) to something with regressors. I would recommend doing so based on subject-matter knowledge, which I presume you have for this analysis. If, for example, you suspect that the Qualification effect or the Age effect depends on interactions with the remaining variables, then you could use fD ~ Age + Qualification | fGender + fOccupation + SizeWorkplc or something like this. The choice of the model certainly depends on what you can interpret and which interactions you want to assess.
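A hypothetical version of the call with regressors in the node models, keeping the same tuning parameters as in the question, could look like this:

```r
library(partykit)

## Age and Qualification enter every node model as regressors;
## the remaining variables are used only for partitioning
fit2 <- glmtree(fD ~ Age + Qualification | fGender + fOccupation + SizeWorkplc,
  data = newdata, family = "binomial",
  minsize = 50, maxdepth = 4, alpha = 0.9, prune = "AIC")

## node-wise coefficients show how the Age/Qualification effects vary
coef(fit2)
plot(fit2)
```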

Achim Zeileis
  • Thank you very much, Achim. Do you know how can I get confidence intervals for the `glmtree` predictions? – Dani Jul 10 '15 at 01:46
  • With a little bit of glue code: For example with `data("kyphosis", package = "rpart")` you can fit `m – Achim Zeileis Jul 10 '15 at 20:35
  • Thank you very much Achim for your help. Do you know if I can apply the glmtree to an ordinal variable? If yes, which family should I use? If not, do you know another tree-based model that I could use? Thank you once again! – Dani Oct 15 '15 at 07:11
  • Ordinal data is not supported by `glmtree()`. We're currently working on a combination of the general `mob()` infrastructure with `polr()` and/or `clm()`. The fitting of the tree is relatively easy and straightforward but we haven't finished all the nice glue code for plotting and predictions etc. If you want to use a constant-fit tree (with only partitioning variables but no regressor variables), I would recommend to use `ctree()` which deals with ordered factor responses. – Achim Zeileis Oct 15 '15 at 09:58
  • That would be great! Do you have any idea when it would be finished? Is it possible to fit a multinomial logistic with the `glmtree()`? I will try the `ctree()`. Thank you very much for your help! – Dani Oct 15 '15 at 12:08
  • Is there any pruning method to `ctree()` for large data sets (similar to AIC/BIC-based post-pruning)? And if I have regressor variables, what would you suggest? – Dani Oct 16 '15 at 02:11
  • No automatic pruning procedure is currently available for `ctree()`. In principle, AIC- or BIC-based pruning could be applied but there is no ready-made function for it. As for the planned extensions: In the `partykit` project on R-Forge (https://R-Forge.R-project.org/R/?group_id=261) there is some _very rough_ code but it will need further work before being officially released. There is no fixed timeline, yet. – Achim Zeileis Oct 16 '15 at 20:09
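The `ctree()` suggestion from the comments could be sketched like this, assuming the response is first turned into an ordered factor (the `fDord` name and the conversion line are illustrative, not from the original thread):

```r
library(partykit)

## illustrative: treat the response as an ordered factor
newdata$fDord <- factor(newdata$fD, ordered = TRUE)

## constant-fit tree: only partitioning variables, no regressors
ct <- ctree(fDord ~ Age + fGender + Qualification + fOccupation + SizeWorkplc,
  data = newdata,
  control = ctree_control(minbucket = 50, maxdepth = 4))

## predicted class probabilities per observation
predict(ct, newdata = newdata, type = "prob")
plot(ct)
```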