If you had to split a continuous (independent) variable that ranges from 0 to 9 and records a count (e.g. number of cigarettes smoked), which would you rather use:
A median split (which would also eliminate any difference between 0 and 1 cigarettes smoked)
Groups based on the frequency distribution
Groups consistent with theory (e.g. 0, 1-3, >3; the high-risk group (>3) would then have a larger N than the low-risk group (0), since the three groups would hold 15%, 35% and 50% of the sample)
What would be the best option? Or, what would you do instead?
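
To make the comparison concrete, here is a minimal sketch of the three options in Python. The sample is made up (20 values chosen only so that the theory-based groups come out at 15%, 35% and 50%), the function names are hypothetical, and reading "groups based on the frequency distribution" as tertiles is an assumption.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical sample of cigarettes smoked (0-9); not real data.
# Chosen so that the groups 0 / 1-3 / >3 hold 15% / 35% / 50% of the cases.
cigarettes = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 9]


def median_split(x, values):
    """Option 1: two groups, cut at the sample median."""
    return "high" if x > median(values) else "low"


def tertile_split(x, values):
    """Option 2 (assumed reading): three groups cut at the sample tertiles."""
    lower, upper = quantiles(values, n=3)
    if x <= lower:
        return "low"
    if x <= upper:
        return "middle"
    return "high"


def theory_split(x):
    """Option 3: cut points fixed in advance by theory (0, 1-3, >3)."""
    if x == 0:
        return "0"
    if x <= 3:
        return "1-3"
    return ">3"


# Count how many cases land in each group under each scheme.
median_groups = Counter(median_split(x, cigarettes) for x in cigarettes)
tertile_groups = Counter(tertile_split(x, cigarettes) for x in cigarettes)
theory_groups = Counter(theory_split(x) for x in cigarettes)

for name, groups in [("median", median_groups),
                     ("tertiles", tertile_groups),
                     ("theory", theory_groups)]:
    print(name, dict(groups))
```

In this sample the median split puts non-smokers and one-cigarette smokers in the same "low" group, which is the concern raised in the first option, while the theory-based cut keeps non-smokers separate at the cost of unequal group sizes.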