Reasoning for transforming a linear model to log-level

Question

I have a multiple linear regression model and found that my error terms are not normally distributed. The histogram of the dependent variable looks like the one below.

I am not sure how to proceed. What kind of transformation would be appropriate? I tried a log-level model (taking the log of the dependent variable) and found that all assumptions are fulfilled except homoskedasticity, which I could handle by using robust standard errors in the final model. However, I do not understand why taking the log would make sense given the distribution of the data, since it is not skewed.

Other transformations that I tried (squaring or taking the log of the skewed independent variables) did not solve the problem of non-normally distributed error terms.

How would you proceed and with what reasoning? Thanks!

EDIT:

Also adding the graph of the error terms. The Shapiro-Wilk test for the residuals gave W = 0.99051, p-value = 0.07358.

The assumption about normality is about the error term, not about the values of the response variable. What is the plot of your errors? — Dave, May 18 '20 at 15:03
@Dave Just added the graph about distribution of error terms — Schoguan, May 18 '20 at 15:26
I agree with @mdewey . There may be a reason to transform the variables, but the residuals are fine. — Peter Flom, May 19 '20 at 15:30
Thank you! In this case, would you trust the Shapiro-Wilk test of the residuals more, or the graphical analysis? And why? — Schoguan, May 20 '20 at 07:42
Found this post as an answer to my question: https://stats.stackexchange.com/questions/284033/qq-plot-looks-normal-but-shapiro-wilk-test-says-otherwise — Schoguan, May 20 '20 at 09:05

score 0 · Answer 1 · answered May 18 '20 at 15:05

Instead of trying to make the data fit the model, I suggest finding a model that fits the data. Instead of OLS regression, you could try a method that does not assume normally distributed errors, such as quantile regression, robust regression, or perhaps some sort of regression tree.

Peter Flom


Thanks Peter. To be able to compare my model with those published in the research I am using in my study, it would be good to start with an OLS regression first, though, and then think about alternatives. – Schoguan May 18 '20 at 15:09
