
Very basic statistics question: should the average prediction from a regression model equal the average value of the dependent variable using the same data?

For example, if I collect data on the height and weight of 1,000 adults, run a regression of height on weight, and then predict height from the same sample's weights, should the mean of those predictions equal the mean of the sample's actual heights?

user7340
    In the bivariate case, this result follows naturally from a geometric analysis, which might provide more insight than a calculation. See the "Conclusions" section in my answer at http://stats.stackexchange.com/a/71303. – whuber May 13 '15 at 17:15

1 Answer


For a linear model with an intercept term fit by least squares, yes. This is because the solution $\beta$ satisfies the normal equations:

$$ X^{t} X\beta = X^{t} y $$

This is a system of equations, one for each column of $X$. With the intercept as the first column of $X$, the very first row of $X^{t}$ is all ones, so the first equation is:

$$ \sum_i \sum_j x_{ij} \beta_j = \sum_i y_i $$

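This reads: the sum of the predictions equals the sum of the responses. Dividing both sides by the number of observations shows that the two means are equal as well.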
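A quick numerical check can make this concrete. The sketch below uses NumPy and simulated height/weight data (the numbers, the seed, and the helper name `fitted_values` are made up for illustration, not taken from the question) and compares the mean prediction with the mean response, once with an all-ones column in $X$ and once without it.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical data: weight (kg) and height (cm) for 1,000 adults.
n = 1000
weight = rng.normal(75.0, 12.0, size=n)
height = 120.0 + 0.7 * weight + rng.normal(0.0, 6.0, size=n)


def fitted_values(X, y):
    """Least-squares fit of y on the columns of X; returns X @ beta_hat."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta_hat


# With an intercept: the first column of X is all ones.
X_with = np.column_stack([np.ones(n), weight])
pred_with = fitted_values(X_with, height)

# Without an intercept: the all-ones column is dropped.
X_without = weight.reshape(-1, 1)
pred_without = fitted_values(X_without, height)

# Absolute difference between mean prediction and mean response.
gap_with = abs(pred_with.mean() - height.mean())
gap_without = abs(pred_without.mean() - height.mean())
print("gap with intercept:   ", gap_with)
print("gap without intercept:", gap_without)
```

With the intercept, the gap should be zero up to floating-point error. Without it, nothing forces the two means to agree, and on data like this they generally will not.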

Matthew Drury