I am trying to build a multiple regression model, and many of my variables look like this (histogram of time spent in the system).
The data look like this because zero actually represents a separate business case: the customer created an account but never used it.
How should I use this type of variable in a regression model? I have some ideas for preprocessing. Are they valid? What else can I do?
- Idea 1: replace the zeros with the median of the non-zero values.
- Idea 2: create an indicator column that flags the zero values, then replace the zeros with the median of the non-zero values.
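
To make the two ideas concrete, here is a minimal sketch of the preprocessing I have in mind. The numbers are made up for illustration, and the function name `add_zero_indicator` is hypothetical. It assumes that an exact 0 always means "account never used".

```python
from statistics import median


def add_zero_indicator(values):
    """Return (is_zero, imputed) for a list of non-negative numbers.

    is_zero flags the rows where the value is exactly 0;
    imputed replaces those zeros with the median of the non-zero values.
    """
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        raise ValueError("need at least one non-zero value to compute a median")
    fill = median(nonzero)
    is_zero = [1 if v == 0 else 0 for v in values]
    imputed = [fill if v == 0 else v for v in values]
    return is_zero, imputed


# Hypothetical toy data: time spent in the system, 0 = account never used.
time_spent = [0, 0, 12.5, 3.0, 0, 7.0, 20.0]
is_zero, time_imputed = add_zero_indicator(time_spent)

print(is_zero)
print(time_imputed)
```

Idea 1 would feed only `time_imputed` into the regression, while Idea 2 would feed both `is_zero` and `time_imputed`.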