I have been trying to fit a linear regression, but every approach I try leads to the same issue: the homogeneity of variance of the residuals is violated. I am trying to understand why that is and how to proceed. The dependent variable (the maximum speed difference between an observed car and the surrounding traffic) is the following:
The goal is to identify the impact of other variables on this speed difference. My explanatory variables include weather conditions (categorical), traffic conditions (categorical), type of vehicle (categorical), acceleration (continuous) and absolute speed (continuous). I have seen in many topics here that the outcome variable does not necessarily have to follow a normal distribution, because the normality assumption refers to the residuals, so I have been trying to apply a linear regression. The normality assumption seems OK based on the Q-Q plot; however, I think the homogeneity of variance of the residuals is violated:
So, based on this, I have been trying to adjust my model. I fitted the model on a subsample without the negative values (it would make sense for my research to try that too), and I applied a log transformation and a squared transformation to this subsample. I even tried some different explanatory variables, but the plots are still similar. In all cases, the normality assumption of the residuals is OK and there is no multicollinearity problem based on the VIF test. I also tested for highly influential observations with Cook's distance, and there is no issue there either.
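To make the checks above concrete, here is a minimal sketch of how they could be computed. It uses simulated data rather than my data, assumes NumPy is available, and all names (`speed`, `accel`, `breusch_pagan_lm`, `vif`, `cooks_distance`) are hypothetical. The variance check shown is the studentized (Koenker) form of the Breusch-Pagan statistic, which is only one possible formal counterpart to judging the residual plot by eye.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def r_squared(X, y):
    """R^2 of regressing y on X (X includes the intercept)."""
    _, resid = ols(X, y)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def breusch_pagan_lm(X, resid):
    """Studentized Breusch-Pagan statistic: n * R^2 from regressing the
    squared residuals on X. Under homoscedasticity it is approximately
    chi-square with (number of columns of X - 1) degrees of freedom."""
    return len(resid) * r_squared(X, resid ** 2)


def vif(X):
    """Variance inflation factor for each non-intercept column of X
    (column 0 is assumed to be the intercept)."""
    factors = []
    for j in range(1, X.shape[1]):
        others = np.delete(X, j, axis=1)
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(factors)


def cooks_distance(X, resid):
    """Cook's distance for every observation."""
    n, p = X.shape
    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'
    h = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return resid ** 2 * h / (p * s2 * (1.0 - h) ** 2)


# Simulated data (NOT the data from this question).
rng = np.random.default_rng(0)
n = 500
speed = rng.uniform(10, 40, n)   # hypothetical absolute speed
accel = rng.normal(0, 1, n)      # hypothetical acceleration
X = np.column_stack([np.ones(n), speed, accel])

# The error standard deviation grows with speed, so the residuals are
# heteroscedastic by construction.
y = 2.0 + 0.5 * speed + 1.0 * accel + rng.normal(0, 0.1 * speed, n)

beta, resid = ols(X, y)
lm = breusch_pagan_lm(X, resid)
vifs = vif(X)
cooks = cooks_distance(X, resid)

print("Breusch-Pagan LM statistic:", lm)
print("VIFs (speed, accel):", vifs)
print("Largest Cook's distance:", cooks.max())
```

Because the simulated error spread grows with `speed`, the statistic should come out large relative to the chi-square reference (about 5.99 at the 5% level for 2 degrees of freedom), while the VIFs stay close to 1 since the two predictors are generated independently.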
My questions are:
- Why is this happening? Is it possible that I should not be using simple linear regression in the first place?
- What other approaches can I follow to overcome this issue?