Is there a benefit to splitting the data by gender or age range when building predictive models?

Question

Assume we had a set of data that contained thousands of samples with the following information: gender, age, height, weight, country.

Now, suppose we wanted to build a model for predicting people's heights based on gender, age, weight, and country.

It is clear that, in general, the mean female height will be a few inches lower than the mean male height. Is there any benefit to splitting the data by gender and building two separate predictive models (one for men, one for women) in this situation?

In terms of age, we know that, roughly speaking, height will increase from age 0-20 before stabilizing until, say, around 60 years of age, at which point it will slowly decrease.

So we could split the data into age ranges 0-10, 10-20, 20-30, etc., and create a predictive model for each category. Is there any benefit to doing this? Or would it actually be disadvantageous?

In general I am asking about whether we should split the data and build separate models when we have predictors that feature well-known specific patterns. Or will predictive performance be better if we only build a single model that uses all of the data?

score 1 · Accepted Answer · answered Sep 07 '20 at 16:56

Unless you have a huge data set, it is probably better to build one flexible model; see the very similar question "Separate Models vs Flags in the same model", where details are given. Binning is also probably not a good idea; nowadays it is better to model the variable with a spline. See "Why should binning be avoided at all costs?" and its links.
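To make the binning-versus-spline point concrete, here is a minimal sketch using simulated data (not real measurements). The height curve in `true_height`, the noise level, and the sample size are all assumptions chosen only to mimic the pattern described in the question. Both fits use the same decade cut points: once as bin edges (a piecewise-constant fit) and once as knots of a linear spline.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_height(age):
    """Hypothetical mean height (cm): growth to 20, plateau, slow decline after 60."""
    growth = np.where(age < 20, 50.0 + 6.0 * age, 170.0)
    decline = np.where(age > 60, 0.2 * (age - 60.0), 0.0)
    return growth - decline

# Simulated training data (assumed, for illustration only).
age = rng.uniform(0, 80, size=200)
height = true_height(age) + rng.normal(0, 7, size=age.size)

cuts = np.arange(10, 80, 10)  # 10, 20, ..., 70

def bin_design(a):
    """One indicator column per decade bin (piecewise-constant fit)."""
    idx = np.digitize(a, cuts)  # bin index 0..7
    return np.eye(len(cuts) + 1)[idx]

def spline_design(a):
    """Linear spline: intercept, age, and one hinge max(age - knot, 0) per knot."""
    hinges = np.maximum(a[:, None] - cuts[None, :], 0.0)
    return np.column_stack([np.ones_like(a), a, hinges])

def fit_predict(design, a_train, y_train, a_new):
    """Ordinary least squares on the given design, evaluated at new ages."""
    coef, *_ = np.linalg.lstsq(design(a_train), y_train, rcond=None)
    return design(a_new) @ coef

# Compare both fits against the known (simulated) truth on a fine grid.
grid = np.linspace(0, 80, 801)
truth = true_height(grid)
pred_bins = fit_predict(bin_design, age, height, grid)
pred_spline = fit_predict(spline_design, age, height, grid)
rmse_bins = np.sqrt(np.mean((pred_bins - truth) ** 2))
rmse_spline = np.sqrt(np.mean((pred_spline - truth) ** 2))
print(f"binned RMSE: {rmse_bins:.2f}  spline RMSE: {rmse_spline:.2f}")
```

In this setup the binned fit is forced to be flat within each decade, so it cannot follow the steep growth in childhood, while the spline uses one extra parameter and stays continuous across the cut points. With real data the knots would not line up this neatly with the true changes in slope, so the gap may be smaller.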

"In general I am asking about whether we should split the data and build separate models when we have predictors that feature well-known specific patterns. Or will predictive performance be better if we only build a single model that uses all of the data?"

In most cases it will be better to build a single flexible model.
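As a rough illustration of why, the following sketch compares one pooled model (a shared age spline plus a gender flag) with two separate per-gender models on simulated data. Everything here is assumed for the example: the curve in `true_height`, the constant 13 cm male offset, the sample size of 120, and the noise level. In particular, the simulation assumes that gender only shifts the curve; if the shape of the age effect really differed by gender, the single model would need gender-by-age interaction terms to stay competitive.

```python
import numpy as np

rng = np.random.default_rng(1)
knots = np.arange(10, 80, 10)
grid = np.linspace(0, 80, 161)

def true_height(age, male):
    """Hypothetical truth (cm): shared age curve plus a constant male offset."""
    curve = np.where(age < 20, 50.0 + 6.0 * age, 170.0)
    curve = curve - np.where(age > 60, 0.2 * (age - 60.0), 0.0)
    return curve + 13.0 * male

def age_basis(age):
    """Linear spline basis in age: intercept, age, and hinges at each knot."""
    hinges = np.maximum(age[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(age), age, hinges])

def pooled_design(age, male):
    """Single-model design: shared age spline plus a gender indicator."""
    return np.column_stack([age_basis(age), male])

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def one_replication(n=120, noise_sd=7.0):
    """Simulate one data set; return mean squared error of both approaches."""
    age = rng.uniform(0, 80, size=n)
    male = rng.integers(0, 2, size=n).astype(float)
    y = true_height(age, male) + rng.normal(0, noise_sd, size=n)

    pooled_coef = ols(pooled_design(age, male), y)
    mse_pooled, mse_split = [], []
    for g in (0.0, 1.0):
        g_grid = np.full(grid.shape, g)
        truth = true_height(grid, g_grid)
        pred_pooled = pooled_design(grid, g_grid) @ pooled_coef
        # Separate model: fit the age spline on this gender's rows only.
        mask = male == g
        pred_split = age_basis(grid) @ ols(age_basis(age[mask]), y[mask])
        mse_pooled.append(np.mean((pred_pooled - truth) ** 2))
        mse_split.append(np.mean((pred_split - truth) ** 2))
    return np.mean(mse_pooled), np.mean(mse_split)

# Average over many simulated data sets to smooth out sampling noise.
results = np.array([one_replication() for _ in range(200)])
rmse_pooled, rmse_split = np.sqrt(results.mean(axis=0))
print(f"pooled RMSE: {rmse_pooled:.2f}  separate-models RMSE: {rmse_split:.2f}")
```

The pooled model estimates the age curve from all rows, whereas each separate model sees only about half of them and has to re-estimate the same curve, so its predictions are noisier. With a very large data set this variance penalty shrinks, which matches the "unless you have a huge data set" caveat above.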
