
a) For which of the following observations (obs1, obs2, obs3) is the variance of the residual the largest? Which observation has the highest leverage? And which one the smallest? Explain why.

b) Which of the three observations has the largest DFBETAS?

c) About which observation would you worry the most?

plot

I thought that the answer to a) would be obs 1, because I think it has the highest leverage: its distance from the cloud of points is the greatest (though I am not sure that is true). I had no clue about the rest of the questions. Could anyone please help?

aewrwaer
  • We welcome questions like this, @aewrwaer, but we treat them differently. We try to provide hints to get you unstuck. To better understand the process, you should read the [wiki](http://stats.stackexchange.com/tags/self-study/info) for the `[self-study]` tag. – gung - Reinstate Monica Apr 15 '14 at 13:31
  • You may get some initial hints from my answer here: [Interpreting plot.lm()](http://stats.stackexchange.com/a/65864/7290). – gung - Reinstate Monica Apr 15 '14 at 13:33
  • @gung I have read your answer. Am I correct to conclude that since the absolute value of the $x$ of observation 1 is the largest, that observation will have the highest leverage? And that therefore the variance $(\sigma^2(1-h_{jj}))$ of that observation's residual is the smallest? I am still not sure, though, about questions b) and c): does a higher leverage automatically mean a larger DFBETAS? – aewrwaer Apr 15 '14 at 13:45
  • About leverage: try to imagine what the regression line would look like. Then remove one point at a time and draw the regression line again. Does the fitted line move a lot? – coffeinjunky Apr 15 '14 at 13:52
  • @user2378649 Hmmh, maybe it wouldn't move a lot then for observation 1, as that seems to be quite in line with the rest of the observations. More so for 3 and especially 2. So that would then mean that observation 2 has the highest leverage, then 3, and then 1? Am I correct? – aewrwaer Apr 15 '14 at 13:55
  • @aewrwaer, I updated my answer there to include the actual values. It may be more helpful now. Eg, high leverage does not *necessarily* mean low abs(standardized residual). In addition, although I didn't include DFBETAS, you can think of it similarly to Cook's distance; the reason the predicted values, y-hats, will change is b/c the betas change. – gung - Reinstate Monica Apr 15 '14 at 15:57
  • @gung Thank you very much for all the effort. In my case, would I be right to conclude that observation 2 has the highest leverage, then 3, then 1? Also, is it true that 2 will have the highest dfbetas? – aewrwaer Apr 15 '14 at 17:29
  • Your `obs`'s are fairly similar to the "special points" in my other answer. Which `obs` corresponds to which "special point", & which "special point" has the highest leverage? – gung - Reinstate Monica Apr 15 '14 at 17:44
  • @gung I would say that in the terminology that you used in your answer in my case obs 1 would correspond to: 'high leverage, low residual', obs 2 to: 'high leverage, high residual', and finally: obs 3 to 'low leverage high residual', so that then 2 also has the highest dfbetas. Am I right? – aewrwaer Apr 15 '14 at 17:57
  • That's right, @aewrwaer. Notice that in my case, I used the same x value for both of those. In your case, it looks like x_i-bar(x) differs for your `obs`'s, so which 1 will have the highest leverage? – gung - Reinstate Monica Apr 15 '14 at 19:25
  • @gung Observation 1 has the highest leverage right? But it's not an outlier so it does not have the highest dfbetas I would say. So I think 2 has the largest DFBETAS. Is that correct? Thanks for all the help :)! – aewrwaer Apr 15 '14 at 20:13
  • So which `obs` then, would you worry the most about & why? – gung - Reinstate Monica Apr 15 '14 at 21:43
  • @gung Observation 2 I would say? As 1 is not an outlier even though it has a high leverage. And because 2 has the highest dfbetas. – aewrwaer Apr 16 '14 at 08:02
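
The reasoning in the comments can be checked numerically. The following is a minimal sketch in plain Python, not the data from the plot: the cloud of points and the coordinates of `obs1`, `obs2` and `obs3` are made-up assumptions chosen to mimic "far out in $x$ and on the trend", "far out in $x$ and off the trend", and "near the mean of $x$ and off the trend". The helper names (`fit`, `resid_sd`, `diagnostics`) are hypothetical. It computes the leverage $h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}$ for simple regression, the factor $1 - h_{ii}$ that scales the residual variance, and the DFBETAS of the slope by refitting without each observation.

```python
from math import sqrt


def fit(x, y):
    """Least-squares intercept, slope and Sxx for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, sxx


def resid_sd(x, y, b0, b1):
    """Residual standard deviation with n - 2 degrees of freedom."""
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (len(x) - 2))


def diagnostics(x, y):
    """Leverage, residual and slope DFBETAS for every observation."""
    n = len(x)
    xbar = sum(x) / n
    b0, b1, sxx = fit(x, y)
    out = []
    for i in range(n):
        h = 1 / n + (x[i] - xbar) ** 2 / sxx
        # Refit with observation i deleted.
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        b0_del, b1_del, _ = fit(x_del, y_del)
        s_del = resid_sd(x_del, y_del, b0_del, b1_del)
        # Slope change scaled by the deleted-case standard error of the slope.
        dfbetas = (b1 - b1_del) / (s_del * sqrt(1 / sxx))
        out.append({
            "leverage": h,
            "var_factor": 1 - h,  # Var(residual_i) = sigma^2 * (1 - h_ii)
            "residual": y[i] - b0 - b1 * x[i],
            "dfbetas_slope": dfbetas,
        })
    return out


# Assumed toy data: a cloud around y = 2 + 0.5 x, plus three special points.
cloud_x = [float(v) for v in range(1, 11)] * 2
cloud_y = ([2 + 0.5 * v + 0.3 for v in range(1, 11)]
           + [2 + 0.5 * v - 0.3 for v in range(1, 11)])
special = {"obs1": (20.0, 12.0), "obs2": (17.0, 4.0), "obs3": (5.5, 9.0)}

x = cloud_x + [p[0] for p in special.values()]
y = cloud_y + [p[1] for p in special.values()]

diag = diagnostics(x, y)
results = dict(zip(special, diag[len(cloud_x):]))

for name, d in results.items():
    print(name, {k: round(v, 3) for k, v in d.items()})
```

With these assumed numbers, the leverage ordering depends only on the distance of $x_i$ from $\bar{x}$, so `obs1` has the highest leverage and therefore the smallest residual variance, while `obs2`, which combines sizeable leverage with a large residual, should show the largest absolute slope DFBETAS. Note that `obs1` does not sit exactly on the fitted line here, because `obs2` pulls the line toward itself.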

0 Answers