
Let's say I run a simple OLS of y on x. Then I average all values of y that share the same x, and run the regression again on those averages.

Should the results of the two regressions differ? If so, why?

The dataset has many thousands of points. Every value of x occurs the same number of times (otherwise, of course, the second regression would change the effective weights of the points).
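The point about effective weights can be checked numerically. Below is a minimal Python/NumPy sketch on simulated data with deliberately unequal, hypothetical group sizes. It illustrates that regressing the group means on x with weights equal to the group sizes recovers the full-data coefficients, while the unweighted regression on the means generally does not. The helper `ols` is illustrative, not from any library.

```python
import numpy as np

def ols(x, y, w=None):
    """(Weighted) least squares for y = a + b*x; w=None gives ordinary least squares."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    if w is None:
        w = np.ones_like(y, dtype=float)
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into an ordinary lstsq problem
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(0)
xs = np.arange(5.0)                        # distinct x values
sizes = np.array([3, 10, 2, 7, 5])         # hypothetical, deliberately unequal group sizes
x = np.repeat(xs, sizes)
y = 1.0 + 2.0 * x + rng.normal(size=x.size)

means = np.array([y[x == v].mean() for v in xs])
full = ols(x, y)                                    # regression on all points
unweighted = ols(xs, means)                         # every mean gets equal weight
weighted = ols(xs, means, w=sizes.astype(float))    # each mean weighted by its group size
print(full, unweighted, weighted)
```

With equal group sizes the weights are all the same constant, so the weighted and unweighted regressions on the means coincide, which is the situation in the question.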

max

1 Answer

The coefficient estimates will be the same (as you've already seen in your example).
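One way to see why, as a sketch in notation introduced here: suppose there are $k$ distinct values $x_1,\dots,x_k$, each with $n$ observations $y_{i1},\dots,y_{in}$, and write $\bar y_i$ for the mean of group $i$. Adding and subtracting $\bar y_i$ inside the full-data least-squares criterion (the cross term vanishes because $\sum_j (y_{ij}-\bar y_i)=0$) gives

$$
\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij}-a-bx_i\right)^2
= n\sum_{i=1}^{k}\left(\bar y_i-a-bx_i\right)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij}-\bar y_i\right)^2 .
$$

The second term does not involve $a$ or $b$, and the first is $n$ times the least-squares criterion for the regression on the group means, so both criteria are minimized by the same $(a, b)$.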

However, the standard errors of those estimates will not be the same; as a result, anything that depends on those standard errors or on other variance estimates (including $R^2$ and F-statistics) will also be affected.
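A small simulation makes this concrete. The following Python/NumPy sketch uses made-up data (20 distinct x values, a hypothetical common group size of 50) and a hand-rolled `ols_fit` helper rather than any particular library's regression routine. The coefficients agree to rounding, while the standard errors, the residual degrees of freedom (998 vs 18) and $R^2$ differ.

```python
import numpy as np

def ols_fit(x, y):
    """OLS of y on an intercept and x; returns coefficients, standard errors, R^2."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                       # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, se, r2

rng = np.random.default_rng(1)
n_per_group = 50                                    # hypothetical common group size
xs = np.linspace(0.0, 10.0, 20)                     # 20 distinct x values
x = np.repeat(xs, n_per_group)                      # each group is stored contiguously
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=x.size)

means = y.reshape(len(xs), n_per_group).mean(axis=1)   # one row per x value

coef_full, se_full, r2_full = ols_fit(x, y)
coef_mean, se_mean, r2_mean = ols_fit(xs, means)
print(coef_full, coef_mean)   # equal up to floating-point rounding
print(se_full, se_mean)       # differ numerically; residual df is 998 vs 18
print(r2_full, r2_mean)       # R^2 on the means is higher: within-group scatter is gone
```

The $R^2$ ordering is not a fluke of the seed: averaging removes the within-group sum of squares from the total, while leaving the explained part unchanged up to the factor $n$.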

If you're only interested in the coefficient estimates, then there's no harm in replacing equal-sized groups by their group means. But if you want those other quantities, you'd also need to keep the common group size and the group variances (or at least the common estimate of variance).
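As an illustration of what "keeping the group size and variances" buys you, here is a minimal Python/NumPy sketch, assuming equal group sizes and a pooled within-group variance computed with `ddof=1`. The hypothetical helper `full_data_se_from_means` rebuilds the full-data residual sum of squares as $n$ times the residual sum of squares of the means plus the within-group sum of squares, then uses the fact that the full design's $X^\top X$ is $n$ times that of the means. The data are simulated for illustration only.

```python
import numpy as np

def full_data_se_from_means(xs, means, n, pooled_within_var):
    """Coefficients and full-data OLS standard errors rebuilt from group summaries.

    Assumes every x value has exactly n observations and that pooled_within_var
    is the pooled within-group variance (ddof=1)."""
    k = len(xs)
    X = np.column_stack([np.ones(k), xs])
    coef, *_ = np.linalg.lstsq(X, means, rcond=None)
    rss_means = np.sum((means - X @ coef) ** 2)
    rss_full = n * rss_means + (n - 1) * k * pooled_within_var   # between + within parts
    sigma2_full = rss_full / (n * k - 2)                         # full-data residual variance
    cov = sigma2_full * np.linalg.inv(n * (X.T @ X))             # full X'X = n * means' X'X
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n, xs = 30, np.arange(1.0, 11.0)
x = np.repeat(xs, n)
y = -1.0 + 0.8 * x + rng.normal(size=x.size)
groups = y.reshape(len(xs), n)                     # one row per x value
means = groups.mean(axis=1)
pooled = groups.var(axis=1, ddof=1).mean()         # equal sizes, so a plain mean pools correctly

coef, se = full_data_se_from_means(xs, means, n, pooled)

# Direct full-data fit, for comparison
X = np.column_stack([np.ones(x.size), x])
coef_direct, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef_direct
se_direct = np.sqrt(np.diag(resid @ resid / (x.size - 2) * np.linalg.inv(X.T @ X)))
print(se, se_direct)   # agree up to floating-point rounding
```

The identity is exact, so only the group means, the common group size and the pooled within-group variance are needed to reproduce the full-data standard errors.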

Glen_b
  • What if no two x values are the same, so that I would have to bin the data before averaging? With sufficiently dense data, the coefficients should still be roughly the same, right? – max Apr 21 '16 at 13:52
  • What you're now seeking is not a clarification of the answer to the question you asked, but an answer to a new (and more involved) question. – Glen_b Apr 21 '16 at 14:07
  • I asked the [new question](http://stats.stackexchange.com/questions/208616/does-downsampling-affect-regression-results) on that. – max Apr 21 '16 at 14:54