My x variable has a heavily positively skewed distribution, and my aim is to convert it into a categorical variable with 4 categories. My initial idea was to log-transform the continuous data and then split it into categories. However, many values of the dependent variable are 0, so the log transformation would require adding a constant. Is it appropriate to split a positively skewed variable into categories directly, rather than transforming the data first?
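
To make the two options concrete, here is a minimal Python sketch comparing them on made-up data. The values and the helper names `quartile_bins` and `equal_width_bins` are hypothetical, and `log1p` (that is, log(1 + x)) is just one possible choice of added constant. Since `log1p` is monotone increasing, cut points taken at quartiles should assign every observation to the same group with or without the transform, whereas equal-width cut points would not.

```python
import math
import statistics
from bisect import bisect_right

def quartile_bins(values):
    """Assign each value to a category 0..3 using quartile cut points."""
    # Three cut points, interpolated within the observed data.
    cuts = statistics.quantiles(values, n=4, method="inclusive")
    # Category = number of cut points less than or equal to the value.
    return [bisect_right(cuts, v) for v in values]

def equal_width_bins(values, k=4):
    """Assign each value to a category 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # min(...) keeps the maximum value inside the last interval.
    return [min(int((v - lo) / width), k - 1) for v in values]

# Made-up, positively skewed data containing zeros (assumption for illustration).
x = [0, 0, 0, 1, 2, 3, 5, 8, 13, 40, 150, 900]
log_x = [math.log1p(v) for v in x]  # log(1 + x) is defined at x = 0

raw_q = quartile_bins(x)
log_q = quartile_bins(log_x)
raw_w = equal_width_bins(x)
log_w = equal_width_bins(log_x)

print("quartile bins, raw   :", raw_q)
print("quartile bins, log1p :", log_q)
print("equal-width, raw     :", raw_w)
print("equal-width, log1p   :", log_w)
```

In this sketch the transform only matters for the equal-width split. One caveat: if more than a quarter of the values are exactly 0, the lowest quartile cut point is itself 0, so fewer than four distinct groups result regardless of any transformation.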

Regards, Otis

  • Check out `hurdle` and `zeroinfl` in the pscl package. – G. Grothendieck Dec 26 '20 at 19:07
  • Why do you want to put your x variable into categories? That's [not generally a good idea](https://stats.stackexchange.com/q/68834/28500). Please provide your reasons for categorization by editing your question, as comments are sometimes ignored and can be lost. – EdM Dec 26 '20 at 19:51
  • Also, if x is categorical there is no need to take the log of y. – Jonathan Dec 26 '20 at 20:00