I have transformed my explanatory variables toward a normal distribution, since these variables include proportions (logit-transformed) and non-normally distributed data (various transformations). The dependent variable is a count, so I am using "family = poisson". I have read in some sources that it is unnecessary to transform the explanatory variables, but I don't understand the reasoning behind this.

Was transforming this data incorrect?

Frank Harrell
user13641

1 Answer

Regression models condition on the right-hand-side (predictor) variables and make some sort of distributional assumption only for the response variable conditional on all the $X$s; they assume nothing about the distribution of the $X$s themselves. What matters for $X$ is the shape of its effect on $Y$, hence the popularity of regression splines. The only time I worry about the marginal distribution of an $X$ is when it is extremely skewed, because on average such $X$s predict $Y$ better if they are "straightened out" a bit. I often fit a restricted cubic spline in the cube root of such an $X$.
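As a rough illustration of the cube-root-plus-spline idea, here is a minimal NumPy sketch. It assumes simulated, hypothetical data (a log-normal $X$ and a Poisson count $Y$), builds the restricted cubic spline basis in the truncated-power form with four knots at the 5th, 35th, 65th and 95th percentiles of the cube root of $X$, and fits the Poisson GLM by iteratively reweighted least squares instead of calling R's `glm()` or `rms`. The helper names `rcs_basis`, `fit_poisson` and `poisson_deviance` are made up for this example. Neither $X$ nor $Y$ is pushed toward normality; only the shape of the effect of $X$ on $\log E[Y]$ is modelled.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: cubic between knots, linear beyond the
    outer knots. Returns columns [x, X_1, ..., X_{k-2}] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)

    def pos3(u):
        return np.clip(u, 0.0, None) ** 3  # truncated cubic (u)_+^3

    scale = (t[-1] - t[0]) ** 2  # keeps the nonlinear columns on a modest scale
    d = t[-1] - t[-2]
    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / d
                + pos3(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(term / scale)
    return np.column_stack(cols)

def poisson_deviance(y, mu):
    ratio = np.where(y > 0, y / mu, 1.0)  # y*log(y/mu) is taken as 0 when y == 0
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Poisson GLM (log link) by iteratively reweighted least squares.
    Adds an intercept column; returns (coefficients, fitted means)."""
    X = np.column_stack([np.ones(len(y)), X])
    mu = y + 0.5                      # data-based starting values
    eta = np.log(mu)
    beta, dev_old = None, np.inf
    for _ in range(n_iter):
        z = eta + (y - mu) / mu       # working response
        XtW = X.T * mu                # working weights are mu for the log link
        proposal = np.linalg.solve(XtW @ X, XtW @ z)
        dev = poisson_deviance(y, np.exp(X @ proposal))
        # Step-halving: if the deviance went up, move back toward the last estimate.
        for _halving in range(30):
            if beta is None or dev <= dev_old:
                break
            proposal = 0.5 * (proposal + beta)
            dev = poisson_deviance(y, np.exp(X @ proposal))
        beta = proposal
        eta = X @ beta
        mu = np.exp(eta)
        if abs(dev - dev_old) < tol * (abs(dev) + 0.1):
            break
        dev_old = dev
    return beta, mu

# Hypothetical data: a right-skewed predictor whose effect on log E[Y] is curved.
rng = np.random.default_rng(42)
n = 2000
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
y = rng.poisson(np.exp(0.3 + 0.6 * np.sqrt(x)))

# Model A: X enters linearly on its raw scale.
_, mu_lin = fit_poisson(x[:, None], y)

# Model B: restricted cubic spline (4 knots) in the cube root of X.
c = np.cbrt(x)
knots = np.quantile(c, [0.05, 0.35, 0.65, 0.95])
_, mu_rcs = fit_poisson(rcs_basis(c, knots), y)

dev_linear = poisson_deviance(y, mu_lin)
dev_spline = poisson_deviance(y, mu_rcs)
print(f"deviance, linear in x:         {dev_linear:.1f}")
print(f"deviance, spline in cube root: {dev_spline:.1f}")
```

In this simulated setting the spline in the cube root should have a noticeably lower deviance, because it can follow the curved effect while the skewed tail of $X$ is pulled in. For real work an established implementation such as `rcs()` in the R `rms` package is preferable to hand-rolled code.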

So the short answer is that transforming the $X$s for the reasons you provided is not correct. And be sure to worry a bit more about the Poisson distributional assumption.
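On the Poisson assumption: one common rough check is the Pearson dispersion statistic, $\hat\phi = \frac{1}{n-p}\sum_i (y_i-\hat\mu_i)^2/\hat\mu_i$, which should be close to 1 if the conditional variance equals the conditional mean. The sketch below is illustrative only. It assumes hypothetical simulated counts, generated once from a Poisson model and once from a gamma-Poisson (negative binomial) mixture, fits a Poisson GLM to each with a small hand-written IRLS routine, and compares $\hat\phi$.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50):
    """Poisson GLM (log link) by IRLS; adds an intercept. Returns (beta, mu)."""
    X = np.column_stack([np.ones(len(y)), X])
    mu = y + 0.5                          # data-based starting values
    for _ in range(n_iter):
        eta = np.log(mu)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # working weights are mu for the log link
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        mu = np.exp(X @ beta)
    return beta, mu

def pearson_dispersion(y, mu, n_params):
    """Sum of squared Pearson residuals over residual degrees of freedom."""
    return np.sum((y - mu) ** 2 / mu) / (len(y) - n_params)

# Hypothetical simulated data with a log-linear mean in one predictor.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
mu_true = np.exp(1.0 + 0.5 * x)

y_pois = rng.poisson(mu_true)                          # Var(Y | x) = mu
y_nb = rng.poisson(mu_true * rng.gamma(1.0, 1.0, n))   # Var(Y | x) = mu + mu**2

_, mu_p = fit_poisson(x[:, None], y_pois)
_, mu_nb = fit_poisson(x[:, None], y_nb)

disp_pois = pearson_dispersion(y_pois, mu_p, n_params=2)
disp_nb = pearson_dispersion(y_nb, mu_nb, n_params=2)
print(f"Pearson dispersion, Poisson data:           {disp_pois:.2f}")
print(f"Pearson dispersion, negative binomial data: {disp_nb:.2f}")
```

A value well above 1 suggests overdispersion, in which case quasi-Poisson or negative binomial models are the usual alternatives. A value near 1 does not by itself prove the Poisson assumption is right.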

Frank Harrell
  • Thank you, Frank. I'm afraid your response went largely over my head. I carried out the transformations and analysis on the advice of someone else, but this appears to have been incorrect. I was only introduced to GLMs in the last two months, so my knowledge is VERY limited. Could you recommend any basic texts or sources? – user13641 Oct 18 '14 at 15:26
  • There are so many resources for linear models that it's hard to know where to begin. I hope that others will respond with book recommendations. – Frank Harrell Oct 18 '14 at 16:36