Log-Log model errors are not normal

Question

I am trying to model sales as a function of various variables (debt, number of employees, competitors, etc.). For this I have transformed both the dependent and the independent variables using the natural logarithm.

The problem is that the residuals are not normal, as indicated by both their plot and the Shapiro-Wilk test.

I imagine that the log transform can also affect the residuals: could this explain their lack of normality?

Other model statistics look good: adjusted R² = 0.92, the F test is significant, the residual standard error is 0.5, and the mean of the residuals is 0.
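
A note on the last statistic: in ordinary least squares with an intercept, the residuals average to zero by construction, so "mean of residuals is 0" says nothing about whether they are normal. A minimal plain-Python sketch with simulated, strongly right-skewed errors illustrates this; the data and the helper names `ols_simple` and `skewness` are hypothetical, not from the question.

```python
import random

def ols_simple(x, y):
    """Hypothetical helper: least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def skewness(v):
    """Sample (moment) skewness: third central moment divided by sd cubed."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

random.seed(1)
# Simulated data with exponential (strongly right-skewed) errors, purely illustrative.
x = [random.uniform(0, 5) for _ in range(1000)]
y = [1.0 + 0.5 * xi + random.expovariate(1.0) for xi in x]

a, b = ols_simple(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mean_resid = sum(resid) / len(resid)
print(f"mean of residuals: {mean_resid:.1e}")           # zero up to rounding error
print(f"skewness of residuals: {skewness(resid):.2f}")  # clearly non-zero
```

The intercept forces the residuals to sum to zero, while their skewness stays far from the value 0 expected under normality.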

Edit:

Size of data set: N = 4403; 8 variables in the model (3 continuous, 5 discrete).

How big is your data set? Can you show us a Q-Q plot? Rejecting normality is (1) often unimportant and (2) almost inevitable with a big data set. — Ben Bolker, Mar 01 '22 at 21:38
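
To illustrate point (2) of that comment: the power of a normality test grows with n, so the same mild departure that usually passes at n = 50 is flagged decisively at n = 4403. Shapiro-Wilk is not in Python's standard library, so this sketch swaps in the simpler Jarque-Bera statistic (based on skewness and kurtosis, with an asymptotic chi-square(2) reference distribution). The gamma(50) "residuals", with a skewness of roughly 0.28, are an arbitrary, hypothetical choice.

```python
import math
import random

def jarque_bera(sample):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((v - m) ** 2 for v in sample) / n
    m3 = sum((v - m) ** 3 for v in sample) / n
    m4 = sum((v - m) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # The chi-square(2) survival function is exp(-x/2).
    return jb, math.exp(-jb / 2.0)

def draw(n):
    """Mildly right-skewed draws from gamma(shape=50, scale=1)."""
    return [random.gammavariate(50.0, 1.0) for _ in range(n)]

random.seed(42)
results = {n: jarque_bera(draw(n)) for n in (50, 4403)}
for n, (jb, p) in results.items():
    print(f"n = {n:4d}: JB = {jb:6.2f}, p = {p:.2g}")
```

The departure from normality is identical in both samples; only the sample size changes the verdict.
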
1. *don't* try to interpret a QQ plot without examining the "prior" plots for the fit of the mean and heteroskedasticity (in R, residuals vs fitted and scale-location at a minimum). The QQ plot is only interpretable if the fit and conditional variance assumptions are reasonable. 2. If all that's okay and there are no omitted but potentially important covariates/predictors, you might find a log-link gamma GLM (with logged x) is a better fit for the conditional distribution. — Glen_b, Mar 01 '22 at 23:28
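
The log-link gamma GLM from point 2 can be sketched without any packages. With the gamma variance function V(μ) = μ² and the log link (dμ/dη = μ), the IRLS weights (dμ/dη)²/V(μ) are all 1. Each iteration is therefore an ordinary least-squares fit of the working response z = η + (y - μ)/μ on the logged predictor. The single predictor, the true coefficients (0.5, 0.8) and the shape 5 below are hypothetical. In R the corresponding fit would be `glm(y ~ log(x), family = Gamma(link = "log"))`.

```python
import math
import random

def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

def gamma_log_glm(logx, y, tol=1e-10, max_iter=50):
    """IRLS for a gamma GLM with log link and one (already logged) predictor."""
    # Starting values: OLS of log(y) on log(x), i.e. the log-log model.
    b0, b1 = ols_simple(logx, [math.log(v) for v in y])
    for _ in range(max_iter):
        eta = [b0 + b1 * u for u in logx]
        mu = [math.exp(e) for e in eta]
        # Working response for the log link: z = eta + (y - mu) / mu.
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]
        # Gamma variance mu^2 with log link gives unit weights, so plain OLS.
        nb0, nb1 = ols_simple(logx, z)
        if abs(nb0 - b0) + abs(nb1 - b1) < tol:
            return nb0, nb1
        b0, b1 = nb0, nb1
    return b0, b1

random.seed(7)
shape = 5.0
logx = [math.log(random.uniform(1, 100)) for _ in range(5000)]
# Gamma with scale mu/shape has mean mu = exp(0.5 + 0.8 * log x).
y = [random.gammavariate(shape, math.exp(0.5 + 0.8 * u) / shape) for u in logx]

b0, b1 = gamma_log_glm(logx, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

The OLS start from the log-log model gets the slope about right but biases the intercept, because E[log y] is not log E[y] for gamma data. The GLM iterations correct this, since the model is stated for the mean on the original scale.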

score 1 · Accepted Answer · answered Mar 01 '22 at 22:27

Some thoughts:

- Your residuals are left-skewed (the lower/left-hand tail values are smaller/more negative than expected, and the upper/right-hand tail values are also smaller/more negative than expected).
- This probably means you are "overtransforming" your data, i.e. a log transform takes a right-skewed distribution of residuals and converts it to a left-skewed distribution (rather than to a symmetric one).
- You might try a weaker transformation, e.g. square root, or run a Box-Cox analysis to compare different transformations (?MASS::boxcox or ?car::bcPower in R; if your original data set includes negative values, you may have to try one of the alternatives listed at ?car::bcPower).
- Transforming will also affect the fit of your continuous variables (it may improve or worsen the fit; this is hard to say in advance).
- The violation of the assumption of normality of the residuals may not be a huge problem (see here); in particular, it won't lead to bias, although it may lead to inefficiency/poorly calibrated models ...
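
The "overtransforming" point and the Box-Cox suggestion can be illustrated together. In the sketch below, a gamma(shape = 2) sample is a hypothetical stand-in for right-skewed data. Its log is left-skewed, and a grid search over the Box-Cox profile log-likelihood, l(λ) = -(n/2) log σ̂²(λ) + (λ - 1) Σ log y, picks a weaker power than the log (λ = 0). This is a simplified intercept-only version: MASS::boxcox profiles λ for the full regression model, where σ̂²(λ) comes from the regression residuals rather than from deviations around the mean.

```python
import math
import random

def skewness(v):
    """Sample (moment) skewness."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

def boxcox(y, lam):
    """Box-Cox transform; lambda = 0 is the log, lambda = 0.5 is a shifted square root."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def boxcox_loglik(y, lam):
    """Profile log-likelihood (up to a constant) for an intercept-only normal model."""
    n = len(y)
    t = boxcox(y, lam)
    m = sum(t) / n
    s2 = sum((a - m) ** 2 for a in t) / n
    return -0.5 * n * math.log(s2) + (lam - 1.0) * sum(math.log(v) for v in y)

random.seed(3)
y = [random.gammavariate(2.0, 1.0) for _ in range(5000)]

print("skewness of raw data:", round(skewness(y), 2))
print("skewness after log:  ", round(skewness(boxcox(y, 0.0)), 2))

grid = [i / 20 for i in range(-20, 21)]  # lambda from -1 to 1 in steps of 0.05
best = max(grid, key=lambda lam: boxcox_loglik(y, lam))
print("Box-Cox lambda maximising the profile likelihood:", best)
```

For a regression, one would refit the model at each λ and use its residual variance as σ̂²(λ). The Jacobian term (λ - 1) Σ log y stays the same.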
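
On the last point, a small simulation in plain Python shows the OLS slope averaging out to the true value even though the errors are clearly non-normal. The setup is hypothetical: a true slope of 2, n = 50 per fit, and left-skewed, mean-zero errors built by negating a centred exponential. The simulation addresses bias only, not efficiency or interval calibration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)

random.seed(11)
true_slope = 2.0
slopes = []
for _ in range(2000):
    x = [random.uniform(0, 10) for _ in range(50)]
    # Left-skewed errors with mean zero: minus (Exp(1) - 1).
    y = [1.0 + true_slope * xi - (random.expovariate(1.0) - 1.0) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
print(f"average estimated slope over {len(slopes)} fits: {mean_slope:.4f}")
```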

Thank you very much for these considerations. They are very helpful. — cremorna, Mar 02 '22 at 11:35
