I'm a bit stuck on a problem here, and any kind of help would be much appreciated :)
To give some context about my data: I have 6 independent variables (IVs):
- $X_1$ = Population (within a block)
- $X_2$ = Households (within a block)
- $X_3$ = Total Rooms (aggregated)
- $X_4$ = Total Bedrooms (aggregated)
- $X_5$ = Median Income
- $X_6$ = Ocean Proximity (categorical)
and my dependent variable (DV) is $Y$ = Median House Price.
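Because $X_6$ is categorical, it has to enter an OLS model as dummy (indicator) columns rather than as a single number. The sketch below is a minimal NumPy illustration, assuming the numeric IVs are already in an array; the level labels `"A"`, `"B"`, `"C"`, the helper names `design_matrix` and `fit_ols`, and the synthetic data are all hypothetical, not the real Ocean Proximity values. One level is dropped as the reference so that the dummies are not perfectly collinear with the intercept.

```python
import numpy as np

def design_matrix(numeric, category):
    """Build an OLS design matrix: intercept, numeric predictors, and one
    dummy column per non-reference level of a categorical predictor.
    (Hypothetical helper for illustration.)"""
    numeric = np.asarray(numeric, dtype=float)
    category = np.asarray(category)
    levels = np.unique(category)   # sorted unique levels
    non_ref = levels[1:]           # levels[0] serves as the reference level
    # Broadcast comparison gives an (n, k-1) 0/1 matrix of indicators.
    dummies = (category[:, None] == non_ref[None, :]).astype(float)
    intercept = np.ones((numeric.shape[0], 1))
    return np.hstack([intercept, numeric, dummies]), list(levels)

def fit_ols(X, y):
    """Least-squares coefficients for y = X @ beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic example: two numeric predictors and a 3-level category.
rng = np.random.default_rng(0)
n = 200
numeric = rng.normal(size=(n, 2))
category = rng.choice(["A", "B", "C"], size=n)
X, levels = design_matrix(numeric, category)
true_beta = np.array([1.0, 2.0, -0.5, 0.3, -1.2])  # intercept, 2 numeric, 2 dummies
y = X @ true_beta + rng.normal(scale=0.1, size=n)
beta = fit_ols(X, y)
print(levels)
print(np.round(beta, 2))
```

Each dummy coefficient is then read as the shift in $Y$ relative to the reference level, holding the other IVs fixed.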
I ran a regression including all IVs, but almost all of the OLS assumptions are violated, and there is severe multicollinearity on top of that. Here are the residual plot and the normality plot before any adjustments.
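One common way to quantify the multicollinearity is the variance inflation factor, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the other predictors. A rough rule of thumb treats values above 10 (some use 5) as a warning sign. The sketch below computes it with plain NumPy; the function name `vif` and the synthetic columns, in which a hypothetical households count is built to track population closely, are assumptions for illustration, not the asker's data.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.
    X holds predictors only (no intercept column); an intercept is added
    to each auxiliary regression. (Hypothetical helper.)"""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic columns: households ~ population / 3, income unrelated.
rng = np.random.default_rng(1)
n = 500
population = rng.normal(1000, 200, n)
households = population / 3 + rng.normal(0, 5, n)  # nearly collinear with population
income = rng.normal(4, 1, n)                        # independent of both
print(np.round(vif(np.column_stack([population, households, income])), 1))
```

In this synthetic setup the two block-count columns get very large VIFs while the unrelated column stays near 1.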
I then transformed all my IVs and my DV using the Box-Tidwell method, which I suspect is not the correct way to solve the problem. So my first question is: what can I do to fix the normality issue?
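For reference, the Box-Tidwell procedure estimates a power $\lambda$ for a strictly positive predictor so that $Y = \beta_0 + \beta_1 X^{\lambda} + \varepsilon$ is linear in the transformed predictor; it targets linearity in the IVs, not the distribution of the residuals, and it cannot be applied to a categorical IV such as $X_6$. The sketch below shows the classic iteration for a single predictor: regress $Y$ on $z = X^{\lambda}$ to get the slope $b$, regress $Y$ on $z$ and $z\ln z$ to get the coefficient $d$, and update the power by the factor $1 + d/b$. The helper names and the synthetic square-root data are assumptions for illustration only.

```python
import numpy as np

def _coefs(columns, y):
    """OLS coefficients of y on an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def box_tidwell(x, y, max_iter=50, tol=1e-8):
    """Estimate lam in y = b0 + b1 * x**lam + e for a strictly positive x,
    using the iterative Box-Tidwell procedure. (Hypothetical helper.)"""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Tidwell needs a strictly positive predictor")
    lam = 1.0
    for _ in range(max_iter):
        z = x ** lam
        b = _coefs([z], y)[1]                 # slope of y on z
        d = _coefs([z, z * np.log(z)], y)[2]  # coefficient of z*ln(z)
        step = 1.0 + d / b                    # estimated power to apply to z
        lam *= step                           # (x**lam)**step == x**(lam*step)
        if abs(step - 1.0) < tol:
            break
    return lam

# Synthetic data where the true relationship is in sqrt(x).
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 500)
y = 2 + 3 * np.sqrt(x) + rng.normal(0, 0.01, 500)
print(round(box_tidwell(x, y), 3))
```

With several IVs the procedure is usually run with all of them in the model at once; this one-predictor version is only meant to show the mechanics.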
The other problem, and the main one here, is that even after transforming all the variables, my residual plot still shows a clear linear pattern that I don't know how to get rid of. I also ran my DV against each IV separately and got the same pattern. Here is the residual plot for the transformed model.
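One thing worth ruling out when a residual plot shows a straight-line trend is the choice of horizontal axis. For OLS with an intercept, the residuals are uncorrelated with the fitted values by construction, but they are positively related to the observed $Y$: since $\operatorname{cov}(e, y) = \operatorname{cov}(e, \hat{y} + e) = \operatorname{var}(e)$, regressing the residuals on $Y$ gives a slope of exactly $1 - R^2$. Whether this applies to the plots above is not known; the sketch below, on hypothetical synthetic data, only shows how to check the two slopes numerically.

```python
import numpy as np

def residual_diagnostics(X, y):
    """Fit OLS with an intercept and return
    (slope of residuals on fitted values, slope of residuals on observed y, R^2).
    (Hypothetical helper for illustration.)"""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    resid = y - fitted
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    slope_vs_fitted = np.polyfit(fitted, resid, 1)[0]  # ~0 by construction
    slope_vs_observed = np.polyfit(y, resid, 1)[0]     # equals 1 - R^2
    return slope_vs_fitted, slope_vs_observed, r2

# Synthetic data with a moderate R^2.
rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 2))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(0, 1.5, n)
sf, so, r2 = residual_diagnostics(X, y)
print(f"slope vs fitted:   {sf:.2e}")
print(f"slope vs observed: {so:.3f}  (1 - R^2 = {1 - r2:.3f})")
```

Plotted, residuals against fitted values form a flat band, while residuals against observed $Y$ form an upward-sloping line even when the model is correctly specified, and that pattern appears for any single IV regression as well.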