I'm trying to create a prediction model using regression. This is the diagnostic plot for the model that I get from using lm() in R:
What I read from the Q-Q plot is that the residuals have a heavy-tailed distribution, and the Residuals vs Fitted plot seems to suggest that the variance of the residuals is not constant. I can tame the heavy tails of the residuals by using a robust model:
fitRobust = rlm(formula, method = "MM", data = myData)
But that's where things come to a stop. The robust model weighs several points 0. After I remove those points, this is how the residuals and the fitted values of the robust model look like:
The heteroscedasticity seems to be still there. Using
logtrans(model, alpha)
from the MASS package, I tried to find an $\alpha$ such that
rlm(formula, method = "MM")
with formula being $\log(Y + \alpha) \sim X_1+\cdots+X_n$ has residuals with constant variance. Once I find the $\alpha$, the resulting robust model obtained for the above formula has the following Residuals vs Fitted plot:
It looks to me as if the residuals still do not have constant variance. I've tried other transformations of response (including Box-Cox), but they don't seem like an improvement either. I am not even sure that the second stage of what I'm doing (i.e. finding a transformation of the response in a robust model) is supported by any theory. I'd very much appreciate any comments, thoughts, or suggestions.