I am trying to fit a model to predict housing prices. My residual plots look like the following:

[Image: residual diagnostic plots, including residuals vs fitted, normal QQ, and residuals vs leverage]

Should I be concerned about the large hump for the higher quantiles? Would a transformation on the response variable help?

324
  • Plot of the quantiles of what? – Dave Jul 25 '20 at 14:37
  • Why do you care if the fitted values are normal? – Dave Jul 25 '20 at 14:40
  • @Dave *the residuals – 324 Jul 25 '20 at 14:51
  • What do the other diagnostics say? I am concerned about that hump, but I wonder if it has something to do with the variance increasing for expensive houses. – Dave Jul 25 '20 at 15:11
  • That plot seems sufficient to conclude that the residuals are not normal, but whether that matters depends on your modelling goals. I would investigate other things first. Show us a plot of residuals versus fitted values, which tells you about constancy of variance. – kjetil b halvorsen Jul 25 '20 at 15:19
  • @Dave I added the other plots – 324 Jul 25 '20 at 17:01
  • @kjetilbhalvorsen I added the other plots above in the question for you to see – 324 Jul 25 '20 at 17:01
  • Log price is often a much better scale to work on than price. – Nick Cox Jul 26 '20 at 01:33
  • To me, the main feature of relevance regarding the "hump" is that there are a number of residuals with similar values (the flattening in the upper right indicates that these residuals are similar to one another). The steep rise before the flattening means that these similar residuals are quite different from the rest of the residuals. This suggests that there may be an omitted variable responsible for such a clustering effect, perhaps an indicator variable. So you might look for that variable, and redo the plot once you find it and put it in your model. – BigBendRegion Aug 11 '20 at 23:17
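
As a rough illustration of why the log scale suggested in the comments can help with spread that grows with price, here is a minimal sketch. The prices and the multiplicative error factors are hypothetical, chosen only to show the mechanism; nothing here comes from the asker's data.

```python
import math

# Hypothetical typical prices (in thousands) and hypothetical multiplicative
# errors of -20% and +25% around each of them.
typical_prices = [100, 200, 400, 800]
factors = (0.8, 1.25)

raw_spreads = []
log_spreads = []
for p in typical_prices:
    low, high = p * factors[0], p * factors[1]
    raw_spreads.append(high - low)                      # spread on the price scale
    log_spreads.append(math.log(high) - math.log(low))  # spread on the log scale

print(raw_spreads)  # grows in proportion to the typical price
print(log_spreads)  # the same value for every typical price
```

If the errors really are roughly proportional to price, the raw spread fans out with the fitted value while the log-scale spread stays constant, which is the pattern a log transformation of the response is meant to remove.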

1 Answer

You have something to think about which is much more important than the normality assumption, and your last plot is helpful.

Residuals vs leverage: You have some data points with very high leverage. If you are not used to leverage, see "Leverage and Influence". I would guess that the points with high leverage correspond to high fitted values, which leads to ...
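
To make the idea of leverage concrete, here is a minimal sketch for the simplest case, a regression with an intercept and a single predictor, where the leverage of point i is h_i = 1/n + (x_i - x̄)² / Σ_j (x_j - x̄)². The `area` values are hypothetical, and the 2p/n cutoff is only a common rule of thumb, not something taken from the question.

```python
from statistics import mean

def leverages(x):
    """Hat values h_i for a simple linear regression with an intercept."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical living areas; the last house is far larger than the others.
area = [50, 60, 70, 80, 90, 100, 400]
h = leverages(area)

# Rule-of-thumb cutoff 2 * p / n, with p = 2 coefficients (intercept, slope).
threshold = 2 * 2 / len(area)
flagged = [a for a, hi in zip(area, h) if hi > threshold]
print(flagged)  # only the unusually large house, 400, is flagged
```

Leverage depends only on the predictor values, not on the prices, so a house can have high leverage simply by being unusual in its predictors.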

Residuals vs fitted: This plot might indicate a variance that increases with the fitted values, which is what @Dave indicated in his comment ... but the trend of increasing spread does not continue for the very highest fitted values. That might be explained by the high leverage of those points (investigate!): high-leverage points tend to draw the fitted model toward themselves, as if they had a gravitational force. You could also investigate this with some robust fitting.
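
One way to see this "gravitational" effect is to compare an ordinary least squares fit with a robust one on the same data. The sketch below uses the Theil-Sen estimator (median of pairwise slopes) as the robust fit; that choice is an assumption, since no particular robust method is named here, and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean, median

def ols_fit(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def theil_sen_fit(x, y):
    """Theil-Sen: median of pairwise slopes, then median of the intercepts."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[i] != x[j]]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope

# Hypothetical data: six houses follow price = 2 * area exactly, while the
# large, high-leverage house sells far below that trend.
area = [50, 60, 70, 80, 90, 100, 400]
price = [100, 120, 140, 160, 180, 200, 400]

ols_b0, ols_b1 = ols_fit(area, price)
ts_b0, ts_b1 = theil_sen_fit(area, price)

# Residual of the high-leverage house under each fit.
ols_resid_last = price[-1] - (ols_b0 + ols_b1 * area[-1])
ts_resid_last = price[-1] - (ts_b0 + ts_b1 * area[-1])
print(ols_b1, ts_b1)
print(ols_resid_last, ts_resid_last)
```

In this toy example the least squares line is pulled toward the high-leverage house, so that house gets a small residual even though it departs from the pattern of the others, while the robust fit keeps the slope of the majority and leaves that house with a large residual. A large disagreement between the two fits on real data would be a reason to look more closely at the high-leverage points.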

Normal QQ: Until you have thought about the first two points, don't even bother to look at this plot.

kjetil b halvorsen