How is linear regression affected by downsampling the explanatory variable?

To be more precise, I would sort all the values of $x$ and then split them into a number of bins with an equal number of points in each bin (note that the bins may therefore have different widths). Within each bin, I would take the average of both the $x$ and the $y$ values. The resulting averages $x_{avg}$ and $y_{avg}$ would become the new dataset.
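To make the procedure concrete, here is a minimal sketch in Python of what I have in mind. The simulated data, the number of bins, and the helper name `bin_average` are all assumptions for illustration only; the fits use ordinary least squares via `np.polyfit`.

```python
import numpy as np

def bin_average(x, y, n_bins):
    """Sort by x, split into n_bins groups of (nearly) equal count,
    and return the per-group means of x and y."""
    order = np.argsort(x)
    x_groups = np.array_split(x[order], n_bins)
    y_groups = np.array_split(y[order], n_bins)
    x_avg = np.array([g.mean() for g in x_groups])
    y_avg = np.array([g.mean() for g in y_groups])
    return x_avg, y_avg

# Hypothetical data: a linear relationship plus noise (assumed, not real data).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 + 3.0 * x + rng.normal(size=1000)

x_avg, y_avg = bin_average(x, y, n_bins=20)

# np.polyfit with degree 1 returns (slope, intercept).
slope_full, intercept_full = np.polyfit(x, y, 1)
slope_binned, intercept_binned = np.polyfit(x_avg, y_avg, 1)

print("full data:  ", slope_full, intercept_full)
print("binned data:", slope_binned, intercept_binned)
```

The question is then how the estimates (and their uncertainties) from the second fit relate to those from the first.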

I asked a similar question earlier, but then I assumed $x$ has only a few discrete values.

max
  • One way to think through questions like this is to ponder what would happen when you take your procedure to its logical limit. Consider, then, how you would perform linear regression if (a) you used just one bin and (b) if you used two bins. In particular, think of how you would assess the uncertainties in the parameter estimates. A subtler issue concerns the effects of modifying the regressor values: the usual regression assumptions are that you know the regressors without appreciable error, so what happens now that you have modified their values and made them uncertain? – whuber Apr 21 '16 at 15:46
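The limiting cases mentioned in the comment can be sketched numerically. This is only an illustration under assumed simulated data, with a hypothetical helper `binned`: with two bins the fitted line passes exactly through the two averaged points, leaving zero residual degrees of freedom, and with one bin only a single point remains, so a slope cannot be estimated at all.

```python
import numpy as np

# Hypothetical data (assumed for illustration).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

order = np.argsort(x)
xs, ys = x[order], y[order]

def binned(n_bins):
    """Equal-count bin averages of the x-sorted data."""
    x_avg = np.array([g.mean() for g in np.array_split(xs, n_bins)])
    y_avg = np.array([g.mean() for g in np.array_split(ys, n_bins)])
    return x_avg, y_avg

# Two bins: the line through two points fits them exactly.
x2, y2 = binned(2)
slope2 = (y2[1] - y2[0]) / (x2[1] - x2[0])
intercept2 = y2[0] - slope2 * x2[0]
residuals2 = y2 - (intercept2 + slope2 * x2)
resid_df2 = len(x2) - 2  # residual degrees of freedom for a 2-parameter fit

# One bin: a single point (the grand means), so no slope is identifiable.
x1, y1 = binned(1)
resid_df1 = len(x1) - 2

print("two bins: residuals", residuals2, "residual df", resid_df2)
print("one bin:  points", len(x1), "residual df", resid_df1)
```

In both cases the binned data alone give no way to estimate the residual variance, which is what the usual standard errors for the slope and intercept rely on.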

0 Answers