
How do you show that the point of averages $(\bar x, \bar y)$ lies on the estimated regression line?

csgillespie
Justin Meltzer
  • What have you done so far? – Nov 02 '10 at 01:30
  • I don't know where to start. – Justin Meltzer Nov 02 '10 at 01:38
  • I know the simple regression line is y=B0 + B1x – Justin Meltzer Nov 02 '10 at 01:39
  • And I guess the averages of x and y are E(x) and E(y); not sure how to link that to the regression line though. – Justin Meltzer Nov 02 '10 at 01:40
  • The regression line is the line that minimizes the sum of squared errors. Knowing that, and a basic knowledge of calculus, find the values of B0 and B1 that minimize that sum of squared errors. The rest requires a little bit of high school level algebra. – Christopher Aden Nov 02 '10 at 02:06
  • Starting with the definition is good. You can carry out the demonstration without calculus, though: if the regression line does not go through the point of averages, you can raise or lower it by some amount to make it pass through that point. In so doing, you will lower the sum of squares of residuals by *n* times the square of the amount of raising or lowering (a one-line algebraic calculation, written out after the comments below). If the line was a least-squares line, the sum of squares cannot be lowered, demonstrating that the shift must have been zero. – whuber Nov 02 '10 at 05:52
  • @Justin: Are you doing regression or correlation? In the former, the x's are fixed (not random variables) so one usually does not refer to E[x]. Moreover, y is presumed to depend on x, so E[y|x] makes sense, but E[y] would not usually be considered. – whuber Nov 02 '10 at 05:54
  • Easy-to-follow explanation on Khan academy: https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/more-on-regression/v/squared-error-of-regression-line – Yair Jul 12 '18 at 04:45
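
For completeness, here is the shift argument from the comments written out (a minimal sketch; the symbols $\hat \epsilon_i$ for the residuals and $\delta$ for the vertical shift are ours). Raising the candidate line $y = \hat \beta_0 + \hat \beta_1 x$ by $\delta$ turns each residual $\hat \epsilon_i$ into $\hat \epsilon_i - \delta$, so the sum of squared residuals becomes

$$\sum_i (\hat \epsilon_i - \delta)^2 = \sum_i \hat \epsilon_i^2 - 2\delta \sum_i \hat \epsilon_i + n\delta^2.$$

Taking $\delta = \bar{\hat \epsilon}$, the mean residual, makes the shifted line pass through $(\bar x, \bar y)$ and lowers the sum by exactly $n\,\bar{\hat \epsilon}^{\,2}$. A least-squares line cannot be improved, so $\bar{\hat \epsilon} = 0$ and the line already passes through the point of averages. The calculus route suggested in the comments reaches the same place: $\frac{\partial}{\partial \hat \beta_0} \sum_i \hat \epsilon_i^2 = -2 \sum_i \hat \epsilon_i = 0$.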

1 Answer


To get you started: $\bar y = 1/n \sum y_i = 1/n \sum (\hat y_i + \hat \epsilon_i)$; then plug in how the $\hat y_i$ are estimated from the $x_i$, and you're almost done.

EDIT: since no one replied, here is the rest for the sake of completeness: the $\hat y_i$ are given by $\hat y_i = \hat \beta_0 + \hat \beta_1 x_{i1} + \ldots + \hat \beta_p x_{ip}$ (the fitted value does not include the residual $\hat \epsilon_i$; that term belongs to $y_i$ itself), so you get $\bar y = 1/n \sum (\hat \beta_0 + \hat \beta_1 x_{i1} + \ldots + \hat \beta_p x_{ip})$ (the $\hat \epsilon_i$ sum to zero whenever the model has an intercept) and finally:
$\bar y = \hat \beta_0 + \hat \beta_1 \bar x_{1} + \ldots + \hat \beta_p \bar x_{p}$. And that's it: the regression line goes through the point $(\bar x, \bar y)$.
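
As a quick numerical sanity check of this identity (a minimal sketch using numpy; the simulated data and variable names are ours, not from the answer), fit a least-squares line and evaluate it at $\bar x$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)  # a true line plus noise

# Fit y = b0 + b1*x by least squares; the column of ones is the intercept.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coef

# The fitted line evaluated at the mean of x equals the mean of y.
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # True

# Equivalently, the residuals sum to (numerically) zero.
print(np.isclose((y - X @ coef).sum(), 0.0))  # True
```

The residual check works because the normal equation for the intercept forces $\sum_i \hat \epsilon_i = 0$; the same holds in multiple regression whenever the design matrix contains an intercept column.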

psj
  • Nice answer! Just one note: in your edit, you defined $\hat{y_i} = \hat{\beta_0} + \hat{\beta_1}x_{i1} + ... + \hat{\beta_n}x_{in} + \hat{\epsilon_i}$. Actually, $\hat{y_i}$ should not include the residual $\hat{\epsilon_i}$, which you had already included inside the summation $\sum(\hat{y_i} + \hat{\epsilon_i})$ in your original answer. – Roberto Tatis Muvdi Oct 02 '20 at 09:30