I have a data set ready for multiple regression analysis that consists of apples and oranges. Let's say the dependent variable is fruit size and there are a number of independent variables (categorical and continuous). I wonder how best to analyze it.
I may account for the effect of being an apple rather than an orange with a dummy variable, add interactions, or treat fruit type (apple vs. orange) as a random effect (random intercept and/or random slope). Then I may compare a set of candidate models via R², AIC or BIC and do textbook-like model validation, etc.
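A minimal sketch of the pooled-model comparison, assuming simulated data with one hypothetical continuous predictor `x` and a 0/1 dummy `is_orange` (the random-effects variant is not shown, since it needs a mixed-model library). It fits three nested OLS models (no dummy, dummy, dummy plus interaction) with plain NumPy and computes R², AIC and BIC from the Gaussian maximum-likelihood log-likelihood:

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares; returns R^2, Gaussian ML log-likelihood and parameter count."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return {"r2": 1 - rss / tss, "loglik": loglik, "k": k, "n": n}

def aic(fit):
    return 2 * fit["k"] - 2 * fit["loglik"]

def bic(fit):
    return fit["k"] * np.log(fit["n"]) - 2 * fit["loglik"]

# Hypothetical simulated data: one continuous predictor and an orange dummy.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
is_orange = rng.integers(0, 2, size=n)
size = 5 + 1.0 * x + 3.0 * is_orange + rng.normal(scale=1.0, size=n)

ones = np.ones(n)
designs = {
    "x only": np.column_stack([ones, x]),
    "x + dummy": np.column_stack([ones, x, is_orange]),
    "x * dummy": np.column_stack([ones, x, is_orange, x * is_orange]),
}
fits = {name: ols_fit(X, size) for name, X in designs.items()}
for name, fit in fits.items():
    print(f"{name:10s} R2={fit['r2']:.3f} AIC={aic(fit):.1f} BIC={bic(fit):.1f}")
```

Note that R² cannot decrease when terms are added to a nested OLS model, so on its own it always favours the larger model; AIC and BIC penalize the extra parameters.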
However, I may also analyze apples and oranges in two separate models by splitting the data set into two subsets. Furthermore, I may split the data set even further by distinguishing between apple and orange varieties and/or between countries of origin, etc. I may end up with a total of $n$ subsamples to be analyzed separately.
By splitting up the data set I trade off a larger sample (which needs extra categorical variables) against more consistent (similar) populations with smaller samples.
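To make the trade-off concrete, here is a sketch under assumed, simulated data (two hypothetical groups with different intercepts and slopes). It compares one pooled model with a dummy and an interaction against two separate per-group models. When the data are partitioned into disjoint subsets, the log-likelihoods of the separate fits add up, so both options can be scored on the same AIC scale:

```python
import numpy as np

def gaussian_ols(X, y):
    """OLS fit returning coefficients, Gaussian ML log-likelihood and parameter count."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = float(resid @ resid) / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return beta, loglik, X.shape[1] + 1  # +1 for the error variance

# Hypothetical data: apples (group 0) and oranges (group 1) with different slopes.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
y = 5 + 3 * group + (1.0 + 0.5 * group) * x + rng.normal(size=n)
ones = np.ones(n)

# Option A: one pooled model with a dummy and an interaction (shared error variance).
X_pooled = np.column_stack([ones, x, group, x * group])
beta_pooled, ll_pooled, k_pooled = gaussian_ols(X_pooled, y)

# Option B: split the data and fit one model per subset (separate error variances).
ll_split, k_split, beta_split = 0.0, 0, {}
for g in (0, 1):
    mask = group == g
    beta, ll, k = gaussian_ols(np.column_stack([ones[mask], x[mask]]), y[mask])
    beta_split[g] = beta
    ll_split += ll  # disjoint subsets: log-likelihoods add up
    k_split += k

aic_pooled = 2 * k_pooled - 2 * ll_pooled
aic_split = 2 * k_split - 2 * ll_split
print("pooled:", beta_pooled.round(3), "AIC", round(aic_pooled, 1))
print("split: ", beta_split[0].round(3), beta_split[1].round(3), "AIC", round(aic_split, 1))
```

The separate fits reproduce the point estimates of the fully interacted pooled model; the only difference is one error variance per subset instead of a shared one. Each further split adds another full set of coefficients while shrinking each subsample, and the AIC penalty reflects that cost. A classical F-test of the same pooled-versus-split question is the Chow test.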
I wonder how to decide whether to split the sample or to analyze it with a single one-size-fits-all regression model.
And if I decide to split the sample, should I split it again (based on other categorical variables)? What would be the best number ($1, \dots, n$) of separate subsamples taken from my total sample for fitting separate regression models?
Are there any general decision criteria or rules (of thumb)?