When we analyze data , we can observe several variables that may contain mutual information. For an example , There can be a binary variable such as Y=Have you ever smoke ? And then there will be a follow up question such that (in this case it is a continuous variable) How old when you first smoke ?
For the variable X=that measures How old when you first smoke ? ,
$X_1$= {$x_1$=0 ; if never smoke, $x_1$=1 ; if smoke }
$X_2$ = {$x_2$=0 ; if $x_1$=0 $x_2$ >=0 ; if $x_1$=1 }
So the distribution of $X_2$ will be like this :
That means it contain several zeros since it is depends on the previous question ($X_1$)
One way to deal with this type of problem is calculate the age of first smoke ($X_2$) only for users. (i.e eliminating zeros) . Then the drawback is that it will reduce the sample size with respect to $X_2$ variable.
Another way to model $X_2$ is convert it to a categorical variable. For an example someone can do like this:
$X_2 categorized$ ={"Never Smoke" ; $X_2$=0 , "Young" ; 0< $X_2$ <=15 , "Middle" ; 15< $X_2$ <=20 , "old" ; $X_2$>20}
But Is there way to Model X by preserving the continuous nature using a mixture distribution ? Mixture distribution in the sense that ,this may be something like the product of $X_2$ and $X_1$. However I am not sure how to do this .
Since in this case $X_1$ is binary , taking the product of $X_2$ and $X_1$ seems to make sense. But I am not sure how this will work in general , i.e when $X_1$ has more than 2 categories.
Any help would be great