For the task of churn modelling I was considering:
- Compute k clusters for the data.
- Build one model for each cluster individually.
The rationale is that nothing proves the population of subscribers is homogeneous, so it's reasonable to assume that the data-generating process may be different for different "groups".
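A minimal sketch of the two-step procedure above, assuming numeric features on comparable scales and a binary churn label. K-means and a gradient-descent logistic regression are written out in plain NumPy only to keep the example self-contained. The choice of these two methods, the helper names (`fit_per_cluster`, `predict_per_cluster`, etc.) and the synthetic data are illustrative assumptions, not a recommendation, and accuracy is measured on the training data only for brevity.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; features are assumed to be on comparable scales."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def fit_logreg(X, y, lr=0.1, n_iter=2000):
    """Logistic regression by batch gradient descent; w[0] is the intercept."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))

def fit_per_cluster(X, y, k, seed=0):
    """Step 1: cluster the features. Step 2: fit one churn model per cluster.
    A global model is kept as a fallback for clusters that end up empty."""
    centroids = kmeans(X, k, seed=seed)
    labels = assign(X, centroids)
    models = {j: fit_logreg(X[labels == j], y[labels == j])
              for j in range(k) if np.any(labels == j)}
    return centroids, models, fit_logreg(X, y)

def predict_per_cluster(fitted, X):
    """Route each subscriber to its nearest cluster and use that cluster's model."""
    centroids, models, global_w = fitted
    labels = assign(X, centroids)
    probs = np.empty(len(X))
    for j in np.unique(labels):
        rows = labels == j
        probs[rows] = predict_proba(models.get(int(j), global_w), X[rows])
    return probs

# Hypothetical synthetic data: two separated subscriber groups whose churn depends
# on feature 2 with opposite signs, so one global linear model cannot fit both.
rng = np.random.default_rng(1)
A = rng.normal([-5.0, 0.0], 1.0, size=(200, 2))
B = rng.normal([5.0, 0.0], 1.0, size=(200, 2))
X = np.vstack([A, B])
y = np.concatenate([A[:, 1] > 0, B[:, 1] < 0]).astype(float)

fitted = fit_per_cluster(X, y, k=2)
per_cluster_acc = np.mean((predict_per_cluster(fitted, X) > 0.5) == y)
global_acc = np.mean((predict_proba(fitted[2], X) > 0.5) == y)
print(f"per-cluster accuracy: {per_cluster_acc:.2f}, global: {global_acc:.2f}")
```

New subscribers are routed to the nearest centroid, so the same cluster assignment rule is used at training and at prediction time; the global model only serves as a fallback for a cluster that ends up empty.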
My question is: is this an appropriate method? Does it violate any assumptions, or is it considered bad practice for some reason? If so, why?
If not, could you share some best practices for this approach? Another question: is it generally better or worse to do pre-clustering than to use a model tree (as defined in Witten & Frank: a classification/regression tree with models at the leaves)? Intuitively, the decision-tree stage seems to be just another form of clustering, but I don't know whether it has any advantages over "normal" clustering.
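For comparison, a depth-one sketch of the model-tree idea: a single split, then a logistic regression in each leaf. Unlike k-means, which groups subscribers using the features alone, the partition here is selected with the churn label, by the total training log-loss of the two leaf models. That split criterion, the candidate thresholds (feature quantiles), `min_leaf`, and all helper names are assumptions for illustration; the model trees described by Witten & Frank grow deeper trees and use their own splitting and pruning rules.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, n_iter=1000):
    """Logistic regression by batch gradient descent; w[0] is the intercept."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))

def log_loss(w, X, y):
    p = np.clip(predict_proba(w, X), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_model_stump(X, y, n_candidates=9, min_leaf=20):
    """Depth-one model tree: pick the (feature, threshold) whose two leaf
    logistic regressions have the lowest total log-loss on the training data."""
    best = None
    for f in range(X.shape[1]):
        for t in np.quantile(X[:, f], np.linspace(0.1, 0.9, n_candidates)):
            left = X[:, f] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            wl, wr = fit_logreg(X[left], y[left]), fit_logreg(X[~left], y[~left])
            loss = log_loss(wl, X[left], y[left]) + log_loss(wr, X[~left], y[~left])
            if best is None or loss < best[0]:
                best = (loss, f, t, wl, wr)
    if best is None:
        raise ValueError("no candidate split satisfies min_leaf")
    _, f, t, wl, wr = best
    return f, t, wl, wr

def predict_model_stump(tree, X):
    """Send each row to its leaf and use that leaf's logistic regression."""
    f, t, wl, wr = tree
    return np.where(X[:, f] <= t, predict_proba(wl, X), predict_proba(wr, X))

# Hypothetical synthetic data: two subscriber groups with opposite churn patterns.
rng = np.random.default_rng(1)
A = rng.normal([-5.0, 0.0], 1.0, size=(200, 2))
B = rng.normal([5.0, 0.0], 1.0, size=(200, 2))
X = np.vstack([A, B])
y = np.concatenate([A[:, 1] > 0, B[:, 1] < 0]).astype(float)

tree = fit_model_stump(X, y)
tree_acc = np.mean((predict_model_stump(tree, X) > 0.5) == y)
global_acc = np.mean((predict_proba(fit_logreg(X, y), X) > 0.5) == y)
print(f"split on feature {tree[0]} at {tree[1]:.2f}; "
      f"tree accuracy {tree_acc:.2f}, global {global_acc:.2f}")
```

Because the split is scored by how well the leaf models fit the label, the partition is driven by the prediction task rather than by feature-space density alone.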