incremental $R^2$ update at every new sample

Question

I am sampling from a random process $X$ and I would like to calculate $R^2$ for the cumulative sum of the samples: $$x_1,\dots,x_n$$ $$y_n=\sum_{i=1}^n x_i$$

$$R^2_n=\operatorname{RSQ}([1,2,\dots,n],\,[y_1,y_2,\dots,y_n])$$

Recomputing this from scratch becomes increasingly slow as $n$ grows. Is there an incremental way to update $R^2$ at every new sample, without recalculating it from the beginning every time?

You really don't want something as badly behaved as that formula. That's frequently a disastrous way to calculate variance. There are much more stable ways to calculate variance. Note that R^2 can be written as a ratio of two sums of squares — Glen_b, Oct 01 '19 at 12:41
thanks @Glen_b I edited away the analogy of incremental variance calculation — elemolotiv, Oct 01 '19 at 12:46
You could adapt the online updating approach [here](https://stats.stackexchange.com/a/410471/805), but instead of calculating $r$, calculate its square; i.e. $r^2 = \frac{N_{n+1}^2}{D_{n+1}E_{n+1}}$. You can speed it up further than that (e.g. by using ideas from [Welford's algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm); and the equivalent for [covariance](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online) and taking advantage of the simple form of the 1,2,3... values & hence their mean and sum of squares),...ctd — Glen_b, Oct 01 '19 at 13:08
ctd ... but you'll probably find that first method sufficient just applied directly to the y's and the 1,2,3... values — Glen_b, Oct 01 '19 at 13:21
I'm in two minds whether it counts as effectively a duplicate of that first link or whether there's enough in the special structure of this problem to leave it. — Glen_b, Oct 01 '19 at 13:29
See [Efficient online regression](https://stats.stackexchange.com/questions/6920/efficient-online-linear-regression/6923#6923) for how to update *everything* efficiently. — whuber, Oct 01 '19 at 14:50
I'm having trouble understanding what your $R^2$ refers to. $R^2$ measures the goodness of fit of predictions to known target values. But, what are your target values, what are your predictions, and how are you generating them? — user20160, Oct 01 '19 at 15:35
@Glen_b thanks for the comments, there is enough info to work on. Up to you whether to keep or discard my question. I have saved the links in your comments — elemolotiv, Oct 01 '19 at 17:23
@user20160 When there's only one predictor for a linear regression model $R^2$ will simply be the squared correlation between the two series of values. One needn't even identify which is the DV and which is the IV to calculate the correlation between them; the question offers enough information (identifying the two series) to answer the question. — Glen_b, Oct 01 '19 at 23:42
@Glen_b True, but the question didn't mention anything about linear regression; I had hoped the OP could be more explicit since this doesn't hold for nonlinear models. In any case, I guess it doesn't matter much at this point. — user20160, Oct 02 '19 at 00:15
On the other hand, R^2 doesn't really make so much sense for a nonlinear model; I'd expect that would be explicitly mentioned (and defined) if it were the case; secondly the OP mentioned `RSQ` which is an [Excel function](https://docs.microsoft.com/en-us/office/troubleshoot/excel/statistical-functions-rsq) which does the squared-correlation calculation I discussed. — Glen_b, Oct 02 '19 at 00:26
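
The online update suggested in the comments can be sketched roughly as follows. This is only a minimal illustration, assuming that $R^2$ here means the squared correlation between $1,2,\dots,n$ and $y_1,\dots,y_n$ (as with Excel's `RSQ`), and using Welford-style running updates for the two sums of squares and the cross-product. The class name `RunningRSquared` and its `update` method are hypothetical names chosen for this example, not part of any library.

```python
import math


class RunningRSquared:
    """Online R^2 between t = 1, 2, ..., n and the cumulative sums y_1, ..., y_n.

    Hypothetical helper for illustration: each update costs O(1) time and memory.
    """

    def __init__(self):
        self.n = 0         # number of samples seen so far
        self.y = 0.0       # running cumulative sum y_n
        self.mean_t = 0.0  # running mean of 1..n
        self.mean_y = 0.0  # running mean of y_1..y_n
        self.s_tt = 0.0    # sum of squared deviations of t
        self.s_yy = 0.0    # sum of squared deviations of y
        self.s_ty = 0.0    # sum of cross-products of deviations

    def update(self, x):
        """Add one new sample x and return the current R^2 (NaN if undefined)."""
        self.n += 1
        self.y += x
        t = float(self.n)

        # Deviations from the OLD means (Welford-style update).
        dt = t - self.mean_t
        dy = self.y - self.mean_y
        self.mean_t += dt / self.n
        self.mean_y += dy / self.n

        # Each term pairs an old-mean deviation with a new-mean deviation.
        self.s_tt += dt * (t - self.mean_t)
        self.s_yy += dy * (self.y - self.mean_y)
        self.s_ty += dt * (self.y - self.mean_y)

        denom = self.s_tt * self.s_yy
        if denom == 0.0:
            # Fewer than two points, or y is constant: R^2 is undefined.
            return math.nan
        return self.s_ty ** 2 / denom


if __name__ == "__main__":
    tracker = RunningRSquared()
    for sample in [1.0, 3.0, -2.0, 4.0]:
        print(tracker.update(sample))
```

Because the first series is just $1,2,\dots,n$, its mean $(n+1)/2$ and sum of squared deviations $n(n^2-1)/12$ are available in closed form, so `mean_t` and `s_tt` could be computed directly instead of being updated; the sketch keeps the generic update for clarity.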
