
I am looking into a question about computing the variance of an incrementally growing dataset by induction, i.e. updating it each time a new element arrives.

To begin with, the dataset $D_{n-1}$ contains the elements $\{x_1, \ldots, x_{n-1}\}$, and we already know its:

  • mean $\bar{x}_{n-1}$
  • variance $\sigma^2_{n-1}$
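The update formula in this question holds when $\sigma^2$ denotes the *population* variance, i.e. the mean squared deviation with divisor equal to the number of elements. The question does not state this convention explicitly, so it is an assumption here; under it, the known quantities are:

```latex
% Assumption: sigma^2 is the population variance (divide by the number of elements).
\bar{x}_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n-1} x_i,
\qquad
\sigma^2_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n-1}\left(x_i - \bar{x}_{n-1}\right)^2
```

With the sample-variance convention (divisor $n-2$ for $n-1$ elements) the coefficients in the update would be different.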

If we add a new element $x_n$ to obtain the dataset $D_n = \{x_1, \ldots, x_{n-1}, x_n\}$, and assume we have already computed its:

  • mean $\bar{x}_n$ (e.g. via the formula $\bar{x}_n = \frac{n-1}{n}\bar{x}_{n-1} + \frac{1}{n}x_n$)
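A minimal Python sketch of this mean update, written literally from the formula above; `update_mean` is a hypothetical helper name, not something from the question:

```python
def update_mean(prev_mean: float, x_new: float, n: int) -> float:
    """Mean of D_n from the mean of D_{n-1} and the new element x_n (n = len(D_n))."""
    return (n - 1) / n * prev_mean + x_new / n


# Mean of [2.0, 4.0] is 3.0; appending 9.0 gives the mean of [2.0, 4.0, 9.0].
print(update_mean(3.0, 9.0, 3))  # 5.0
```

Only the previous mean, the new element, and the new count are needed, so the earlier elements never have to be revisited.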

Then which of the candidate options gives the variance $\sigma^2_n$? ...

Using a Python test script, I have ruled out all the other options and confirmed numerically that the correct answer is:

$\sigma^2_n = \frac{n-1}{n}\sigma^2_{n-1} + \frac{1}{n}(x_n-\bar{x}_{n-1})(x_n - \bar{x}_n)$
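The question's own test script is not shown. As a hypothetical sketch of such a numerical check (the helper names `update_variance` and `incremental_stats` are illustrative, not from the question), the code below streams through random data, applies the mean and variance updates element by element, and compares each result with the standard library's `statistics.fmean` and `statistics.pvariance`, which compute the mean and population variance directly:

```python
import random
import statistics


def update_variance(prev_var: float, prev_mean: float, new_mean: float,
                    x_new: float, n: int) -> float:
    """Population variance of D_n from that of D_{n-1} (n = len(D_n)).

    sigma^2_n = (n-1)/n * sigma^2_{n-1} + (1/n) * (x_n - mean_{n-1}) * (x_n - mean_n)
    """
    return (n - 1) / n * prev_var + (x_new - prev_mean) * (x_new - new_mean) / n


def incremental_stats(data):
    """Return a list of (mean, population variance) after each element of data."""
    mean, var = 0.0, 0.0
    history = []
    for n, x in enumerate(data, start=1):
        new_mean = (n - 1) / n * mean + x / n
        var = update_variance(var, mean, new_mean, x, n)
        mean = new_mean
        history.append((mean, var))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.uniform(-10, 10) for _ in range(50)]
    for n, (mean, var) in enumerate(incremental_stats(data), start=1):
        # Compare with a direct (non-incremental) computation on the first n elements.
        assert abs(mean - statistics.fmean(data[:n])) < 1e-9
        assert abs(var - statistics.pvariance(data[:n])) < 1e-9
    print("incremental update matches statistics.pvariance")
```

For $n = 1$ the update gives $\sigma^2_1 = 0$ regardless of the initial mean, since the factor $x_1 - \bar{x}_1$ vanishes, so starting from `mean = var = 0.0` is safe. Comparing against `statistics.variance` (the sample variance) instead would make the check fail, which is consistent with the formula using the population convention.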

However, I need some help proving it analytically.
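A sketch of one standard way to derive the formula, assuming $\sigma^2$ is the population variance so that the sum of squared deviations is $S_k = \sum_{i=1}^{k}(x_i - \bar{x}_k)^2 = k\,\sigma^2_k$. The mean update formula gives two identities, which are then used to split $S_n$ into the old sum plus a correction:

```latex
% Assumption: S_k = \sum_{i=1}^{k} (x_i - \bar{x}_k)^2 = k\,\sigma^2_k (population variance).
\begin{align*}
\bar{x}_n - \bar{x}_{n-1} &= \tfrac{1}{n}\left(x_n - \bar{x}_{n-1}\right),
\qquad
x_n - \bar{x}_n = \tfrac{n-1}{n}\left(x_n - \bar{x}_{n-1}\right) \\[4pt]
\sum_{i=1}^{n-1}\left(x_i - \bar{x}_n\right)^2
  &= \sum_{i=1}^{n-1}\Big[\left(x_i - \bar{x}_{n-1}\right) + \left(\bar{x}_{n-1} - \bar{x}_n\right)\Big]^2 \\
  &= S_{n-1} + 2\left(\bar{x}_{n-1} - \bar{x}_n\right)\underbrace{\sum_{i=1}^{n-1}\left(x_i - \bar{x}_{n-1}\right)}_{=\,0}
     + (n-1)\left(\bar{x}_{n-1} - \bar{x}_n\right)^2 \\
  &= S_{n-1} + \tfrac{n-1}{n^2}\left(x_n - \bar{x}_{n-1}\right)^2 \\[4pt]
S_n &= \sum_{i=1}^{n-1}\left(x_i - \bar{x}_n\right)^2 + \left(x_n - \bar{x}_n\right)^2 \\
  &= S_{n-1} + \tfrac{n-1}{n^2}\left(x_n - \bar{x}_{n-1}\right)^2 + \tfrac{(n-1)^2}{n^2}\left(x_n - \bar{x}_{n-1}\right)^2 \\
  &= S_{n-1} + \tfrac{n-1}{n}\left(x_n - \bar{x}_{n-1}\right)^2 \\
  &= S_{n-1} + \left(x_n - \bar{x}_{n-1}\right)\left(x_n - \bar{x}_n\right)
\end{align*}
```

The last step uses $x_n - \bar{x}_n = \frac{n-1}{n}(x_n - \bar{x}_{n-1})$ again. Dividing by $n$ and substituting $S_n = n\,\sigma^2_n$ and $S_{n-1} = (n-1)\,\sigma^2_{n-1}$ recovers $\sigma^2_n = \frac{n-1}{n}\sigma^2_{n-1} + \frac{1}{n}(x_n-\bar{x}_{n-1})(x_n - \bar{x}_n)$.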

Let me know if you need more details. I really appreciate your help.

James
  • Does this answer your question? [Derivation of Running(Online) Variance's formula](https://math.stackexchange.com/questions/711135/derivation-of-runningonline-variances-formula). Found using [Approach0](https://approach0.xyz/search/?q=OR%20content%3A%24%5Csigma%5E2_n%20%3D%20%5Cfrac%7Bn-1%7D%7Bn%7D%5Csigma%5E2_%7Bn-1%7D%20%2B%20%5Cfrac%7B1%7D%7Bn%7D(x_n-%5Cbar%7Bx%7D_%7Bn-1%7D)(x_n%20-%20%5Cbar%7Bx%7D_n)%24&p=1). – John Omielan Apr 20 '22 at 05:36
  • FYI, a closely related question is [incremental computation of standard deviation](https://math.stackexchange.com/q/102978/602049). – John Omielan Apr 20 '22 at 05:38
  • Thank you, @john-omielan. Exactly, that's what I need. Thank you for the hint about Approach Zero as well. – James Apr 20 '22 at 06:36
  • You're welcome. Regarding my suggestion being what you need, please click the button that indicates the proposed duplicate question does answer yours. Also, FYI about Approach0, this [answer](https://math.meta.stackexchange.com/a/29267/602049) gives some information about using it. – John Omielan Apr 20 '22 at 06:44
