[NB: See update 1 below.]
I find that the methodology for `rpart` is far easier to explain than that for `party`. The latter, however, is much more sophisticated and likely to give better models. I sometimes explain `party` as a basis for producing local linear (or GLM) models. I build up to this by pointing out that `rpart`'s predictions are constant across all elements that fall into a leaf node, i.e. the box/region bounded by the splits. Even if local models could improve the fit, you get nothing but a constant prediction.

In contrast, `party` develops the splits to potentially optimize the models for the regions. It actually uses a different criterion than model optimality, but you need to gauge your own capacity for explaining the difference to determine whether you can explain it well. The papers on it are fairly accessible for a researcher, but may be quite challenging for someone not willing to consider simpler methods like random forests, boosting, etc. Mathematically, I think `party` is more sophisticated. Nonetheless, CART models are easier to explain, both in terms of methodology and results, and they provide a decent stepping stone for introducing more sophisticated tree-based models.
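To make the constant-leaf vs. local-model contrast concrete, here is a minimal sketch in R. It assumes the `rpart` and `partykit` packages are installed (`partykit` is the successor to `party` and provides `lmtree()` for model-based trees with a linear model in each leaf) and uses the built-in `cars` data; the specific variables and cutpoints are illustrative only.

```r
## Sketch: contrast rpart's constant leaf predictions with a
## model-based tree that fits a linear model in each leaf.
library(rpart)
library(partykit)  # successor to party; supplies lmtree()

## CART: every observation falling in a leaf gets the same fitted value
cart_fit <- rpart(dist ~ speed, data = cars)

## Model-based tree: splits on the partitioning variable (after "|"),
## then fits lm(dist ~ speed) separately within each leaf
mob_fit <- lmtree(dist ~ speed | speed, data = cars)

## Two nearby points that plausibly share a CART leaf
newdata <- data.frame(speed = c(10, 11))
predict(cart_fit, newdata)  # constant within a leaf
predict(mob_fit, newdata)   # varies within a leaf via the local lm
```

The point of the comparison is the shape of the fitted function: `rpart` yields a step function over the split regions, while the model-based tree yields a piecewise-linear fit, which is exactly the "local linear models" framing above.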
In short, I would say that you have to use `rpart` for clarity, and you can use `party` for accuracy/performance, but I wouldn't introduce `party` without first introducing `rpart`.
Update 1. I based my answer on my understanding of `party` as it was a year or two ago. It has grown up quite a bit, but I would modify my answer to say that I'd still recommend `rpart` for its brevity and legacy, should "non-fancy" be an important criterion for your client/collaborator. Still, I would try to migrate to more of `party`'s functionality after having introduced someone to `rpart`. It's better to start small, with loss functions, splitting criteria, etc., in a simple context, before introducing a package and methodology that involve far more involved concepts.