Does downsampling affect regression results?

Asked Apr 21 '16 at 14:54

Active Apr 21 '16 at 14:54


How is linear regression affected by downsampling the explanatory variable?

To be more precise, I would sort all the values of $x$ and then split them into a number of bins with an equal number of points in each bin (note that the bins may have different widths). Within each bin, I would take the average of both the $x$ and $y$ values. The resulting averages $x_{avg}$ and $y_{avg}$ would become the new dataset.
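As a rough illustration of the procedure just described, here is a minimal Python sketch using only the standard library. The helper names `bin_averages` and `ols`, the synthetic data, and the restriction that the number of points be divisible by the number of bins are all assumptions made for this example; they are not part of the question.

```python
import random
from statistics import fmean


def bin_averages(x, y, n_bins):
    """Sort by x, split into n_bins equal-count bins, return per-bin means."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_bins < 1 or len(x) % n_bins != 0:
        raise ValueError("this sketch requires len(x) to be divisible by n_bins")
    pairs = sorted(zip(x, y))        # order the (x, y) points by x
    size = len(pairs) // n_bins      # number of points in each bin
    x_avg, y_avg = [], []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        x_avg.append(fmean(p[0] for p in chunk))
        y_avg.append(fmean(p[1] for p in chunk))
    return x_avg, y_avg


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic data, made up for illustration: y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(1000)]
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]

x_avg, y_avg = bin_averages(x, y, n_bins=20)
print("raw fit (intercept, slope):   ", ols(x, y))
print("binned fit (intercept, slope):", ols(x_avg, y_avg))
```

Because every bin holds the same number of points, the mean of the bin averages equals the mean of the raw data, so both fitted lines pass through the same point $(\bar{x}, \bar{y})$. What changes is the number of observations available to the regression: 20 bin averages instead of 1000 raw points.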

I asked a similar question earlier, but then I assumed $x$ has only a few discrete values.

edited Apr 13 '17 at 12:44

Community

asked Apr 21 '16 at 14:54

max


One way to think through questions like this is to ponder what would happen when you take your procedure to its logical limit. Consider, then, how you would perform linear regression if you used (a) just one bin and (b) two bins. In particular, think of how you would assess the uncertainties in the parameter estimates. A subtler issue concerns the effects of modifying the regressor values: the usual regression assumptions are that you know the regressors without appreciable error, so what happens now that you have modified their values and made them uncertain? – whuber Apr 21 '16 at 15:46
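The two limiting cases in the comment above can be tried out directly. The following is only a sketch under assumed names (`bin_means`, `fit_line`) and with a small made-up dataset; it reports the residual sum of squares and the residual degrees of freedom, which are what a standard-error estimate would be built from.

```python
from statistics import fmean


def bin_means(x, y, n_bins):
    """Equal-count binning by sorted x; assumes len(x) is divisible by n_bins."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    chunks = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    return ([fmean(p[0] for p in c) for c in chunks],
            [fmean(p[1] for p in c) for c in chunks])


def fit_line(x, y):
    """Return (intercept, slope, rss, residual_df), or None if x has no spread."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return None  # a single x value cannot identify a slope
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss, len(x) - 2


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]

one_bin = fit_line(*bin_means(x, y, 1))    # one averaged point
two_bins = fit_line(*bin_means(x, y, 2))   # two averaged points
all_points = fit_line(x, y)                # the raw data

print("one bin:   ", one_bin)
print("two bins:  ", two_bins)
print("raw points:", all_points)
```

With one bin there is a single point and no slope can be estimated at all. With two bins the line passes exactly through both averaged points, so the residuals vanish and there are zero residual degrees of freedom, leaving nothing from which to estimate the uncertainty of the coefficients.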
