
I have a heavily positively skewed continuous outcome variable y. Therefore I log transformed it to achieve normality.

My interest is in assessing the association between y and a few explanatory variables: one binary variable (Yes/No), one ordinal variable (0, 1, 2, 3, 4), and two count variables. I fitted a general linear model in PROC GLM in SAS and treated the binary and ordinal variables as categorical. The two count variables have many zeroes. The R-squared is very small (~20%). The residuals don't indicate a severe departure from normality. Multicollinearity was assessed using tolerance, the variance inflation factor and the condition index. All these measures are around 1, so collinearity is not a problem here as far as I understand. However, the residual vs. predicted plot indicates heteroscedasticity: as the predicted value increases, the variability of the residuals gets smaller.

Is homoscedasticity very important in my case? If so, how can the heteroscedasticity be resolved? I tried following the steps in this document, but the heteroscedasticity is still present.

gung - Reinstate Monica
Wong
  • What is your Y variable? Did you transform it because the unconditional distribution of Y was skewed, or because the conditional distribution of Y (the residuals) was skewed? – gung - Reinstate Monica Jun 05 '16 at 15:53
  • This is a clear description. It sounds like you may have used a transformation that was too strong. What does a [spread-vs-level plot](http://stats.stackexchange.com/search?q=spread+level+plot) of the original data indicate? (Theory and references are given at http://stats.stackexchange.com/a/66038/919. A good example in a regression context appears at http://stats.stackexchange.com/questions/52089/what-does-having-constant-variance-in-a-linear-regression-model-mean/52107?s=11|0.1356#52107.) – whuber Jun 05 '16 at 16:09
  • Hi @whuber. Thank you. The Box-Cox method in PROC TRANSREG in SAS identifies lambda = 0, i.e. the natural log, as the appropriate transformation for y. The studentized residuals vs. predicted plot shows a decreasing trend, and the variability of the studentized residuals decreases as the predicted value increases. (How can I insert a plot to illustrate my problem?) – Wong Jun 06 '16 at 10:55
  • Hi @gung. Thank you. Yes, the unconditional distribution of y is skewed. If I don't log transform y, the residual vs. predicted plot and the residual vs. quantile plot look worse in terms of heteroscedasticity and non-normality. – Wong Jun 06 '16 at 11:06
  • *All* methods to identify Box-Cox transformations must be understood as guidance only. It is imperative that you use your judgment to modify the initial transformation to suit your needs. Your residuals vs. predicted plot provides evidence that $\lambda$ should not have been so low. Diagnostic plots, such as a spread vs. level plot, provide far more insight than automatic procedures like TRANSREG and will give you an indication of how to modify $\lambda$ to a more suitable value. – whuber Jun 06 '16 at 13:25
  • Hi @whuber. Thank you. I am a SAS user and managed to find the spread-level SAS macro https://github.com/friendly/SAS-macros/blob/master/sprdplot.sas. When I ran it with my data, it gave me an error. I'm still trying to figure out why. – Wong Jun 08 '16 at 06:12
  • Hi @whuber. I managed to run the macro without errors now. It gives slope = 0.45 and power = 0.5. Which transformation does this correspond to? Thank you. – Wong Jun 08 '16 at 08:46
  • The 0.5 power is the square root. – whuber Jun 08 '16 at 13:53
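
For readers following the comments, here is a minimal Python sketch of how a spread-vs-level fit leads to a suggested power. It assumes the usual Tukey convention: regress log(spread) on log(level) across groups, take the slope b, and use 1 - b as the suggested power, rounded to the nearest half. Under that assumption a slope of 0.45 gives 0.55, which rounds to 0.5, the square root. Whether the SAS macro linked above computes its output in exactly this way has not been checked here. The function names and the toy data are hypothetical, and the data are constructed (not real) so that the slope is 0.45.

```python
import numpy as np

def spread_level_power(y, groups):
    """Fit log(IQR) on log(median) across groups.

    Returns (slope, suggested_power), with suggested_power = 1 - slope
    (assumed Tukey spread-vs-level convention).
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels, spreads = [], []
    for g in np.unique(groups):
        vals = y[groups == g]
        q1, med, q3 = np.percentile(vals, [25, 50, 75])
        levels.append(med)        # level  = group median
        spreads.append(q3 - q1)   # spread = interquartile range
    slope, _intercept = np.polyfit(np.log(levels), np.log(spreads), 1)
    return slope, 1.0 - slope

def round_to_half(p):
    """Round a power to the nearest 0.5 (e.g. 0 = log, 0.5 = square root)."""
    return round(p * 2) / 2

# Hypothetical toy data: three groups whose IQR equals median ** 0.45.
offsets = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # quartiles fall at -0.5 and 0.5
y, groups = [], []
for label, level in enumerate([10.0, 100.0, 1000.0]):
    y.extend(level + (level ** 0.45) * offsets)
    groups.extend([label] * len(offsets))

slope, power = spread_level_power(y, groups)
print(f"slope = {slope:.2f}, suggested power = {round_to_half(power)}")
```

A suggested power of 0 would point to the log, 0.5 to the square root, and 1 to no transformation, so a slope near 0.45 is consistent with the log having been too strong a transformation here.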

0 Answers