0

I have 2 sets, and for each one I only have this data: the running variance, the sum, the running mean and the count. How can I get the variance of the 2 sets merged together?

EDIT:

The values of the sets are updated each time with new occurrences; these occurrences are not stored.

The values of the sets are not equal.

I need to merge these 2 sets and get the variance of the merged set.

EDIT 2:

I think that what I need is the pooled variance, am I correct?

In Java it would be something like this:

Double variance = (((firstAggregate.count - 1) * firstAggregate.variance) + ((secondAggregate.count - 1) * secondAggregate.variance)) / ((firstAggregate.count + secondAggregate.count) - 2);
Bentipe
  • 103
  • 3

1 Answer

1

There's no need to merge the two sets and recompute the variance from scratch, which is time-consuming. You can compute the variance of each set separately and then combine the results into the total variance with an updating formula. The formula works with the sum $T$ and the sum of squared deviations $S$ of a set of $m$ values:

$T_{1,m} = \sum_{i=1}^{m} x_i$

$S_{1,m} = \sum_{i=1}^{m} \left(x_i - \frac{1}{m} T_{1,m}\right)^2$

These equations are discussed in the paper on a pairwise algorithm for computing sample variances.
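
As a sketch of the combining step, assume the first set holds $m$ values and the second holds $n$ values (indexed $m+1$ to $m+n$; this indexing is only a convention chosen here). The sum of squared deviations of the merged set can then be written from the per-set quantities alone:

$S_{1,m+n} = S_{1,m} + S_{m+1,m+n} + \frac{m}{n(m+n)} \left(\frac{n}{m} T_{1,m} - T_{m+1,m+n}\right)^2$

If the stored running variance is the sample variance (an assumption: divisor $m - 1$), then $S_{1,m}$ is recovered as $(m - 1)$ times that variance, and the merged variance is $S_{1,m+n} / (m + n - 1)$.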


Update

Parallel algorithm
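
A minimal Java sketch of the parallel (pairwise) merge may help here. It assumes each set is summarised by its count, mean and sample variance (divisor count - 1), matching the `count - 1` terms in the question. The `MergeVariance` and `Aggregate` names are hypothetical, introduced only for illustration.

```java
// Hypothetical helper: merges two summaries without access to the raw values.
final class MergeVariance {

    /** Summary of one set. Assumes 'variance' is the sample variance (n - 1 divisor). */
    static final class Aggregate {
        final long count;
        final double mean;
        final double variance;

        Aggregate(long count, double mean, double variance) {
            this.count = count;
            this.mean = mean;
            this.variance = variance;
        }
    }

    /** Combines two summaries using the parallel variance formula. */
    static Aggregate merge(Aggregate a, Aggregate b) {
        // An empty side contributes nothing, so return the other side unchanged.
        if (a.count == 0) return b;
        if (b.count == 0) return a;

        long n = a.count + b.count;
        double delta = b.mean - a.mean;

        // Recover each sum of squared deviations from the sample variance.
        double m2a = a.variance * (a.count - 1);
        double m2b = b.variance * (b.count - 1);

        // The extra term accounts for the distance between the two means.
        double m2 = m2a + m2b + delta * delta * ((double) a.count * b.count / n);

        double mean = a.mean + delta * b.count / n;
        double variance = m2 / (n - 1); // n >= 2 here because both sides are non-empty
        return new Aggregate(n, mean, variance);
    }
}
```

For example, merging the summary of {1, 2, 3} (count 3, mean 2, variance 1) with that of {4, 5, 6, 7} (count 4, mean 5.5, variance 5/3) gives mean 4 and variance 28/6, the same as computing the variance of {1, ..., 7} directly. The pooled-variance expression from the question would give 7/5 for the same input, because it leaves out the term for the difference between the two means.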

4.Pi.n
  • 156
  • 1
  • 8
  • 1
    Thank you @m-zayan, that pairwise algorithm is what Wikipedia calls the parallel algorithm, and it is what I need. Could you add it to your answer so it is more complete? :) https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm – Bentipe Jun 12 '20 at 09:29
  • You are welcome. Note the edit to T: T is the sum over all elements in the set, not the mean. I have added it, thanks. – 4.Pi.n Jun 12 '20 at 15:33