My dataset contains many variables that are measured on a continuous scale but look, to differing degrees, practically categorical.
Many have a large spike at zero or at some other specific value, followed by one or more apparently separate clusters. In some cases this is obvious: there are literally 2 specific single values, so the variable is effectively on/off. Others are less clear-cut candidates, where there appear to be 2 or more almost separate distributions.
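To make the pattern concrete, here is a minimal sketch of how such variables could be flagged by the share of observations sitting on the single most common value. The function name `spike_summary`, the 0.3 threshold, and the example data are all made up for illustration; this is not taken from my actual dataset or workflow.

```python
from collections import Counter

def spike_summary(values, min_share=0.3):
    """Summarise how much of a variable sits on its single most common value.

    min_share is an arbitrary, hypothetical cut-off for flagging a spike.
    """
    counts = Counter(values)
    modal_value, modal_count = counts.most_common(1)[0]
    share = modal_count / len(values)
    return {
        "modal_value": modal_value,   # the most frequent exact value
        "modal_share": share,         # fraction of observations at that value
        "n_distinct": len(counts),    # number of distinct values observed
        "flag": share >= min_share,   # True if the spike exceeds the cut-off
    }

# Made-up example data, not real measurements
on_off = [0.0, 0.0, 5.0, 0.0, 5.0, 5.0, 0.0, 0.0]      # two values only
zero_heavy = [0.0, 0.0, 0.0, 0.0, 2.1, 2.4, 7.9, 8.3]  # spike at zero plus a spread
smooth = [0.3, 1.2, 2.8, 3.1, 4.7, 5.5, 6.0, 7.4]      # no repeated values

for name, data in [("on_off", on_off), ("zero_heavy", zero_heavy), ("smooth", smooth)]:
    print(name, spike_summary(data))
```

A variable with `n_distinct` equal to 2 is the obvious on/off case; a flagged variable with many distinct values is the "spike plus separate distribution" case I am less sure how to handle.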
I am trying to model a continuous, normally distributed dependent variable on a number of candidate predictors (all collected on a continuous scale). Most of these are unlikely to contribute to the model. I will be using various modelling methods to explore what works best (e.g. I will be trying tree methods, where the apparent bimodal appearance isn't a problem). I am not assuming a good model can be produced.
In these situations, are there any hard and fast rules or techniques for deciding whether to categorise or not? Also, having potentially performed such transformations, what considerations or measures should I be aware of? I would say most of the dataset is like this.