I want to add something to the previous answer, with which I completely agree.
It happens that I am working on implementing a statistical library in Java, and I use the values computed by R as a reference. A few days ago I was studying algorithms for computing the mean and variance, and I found that the C code which computes the mean in R (mean in R calls an internal function written in C) uses a simple technique to compensate for loss caused by rounding. And there I found exactly what you are looking for.
I will show simplified code, since the original C source uses macros and other unnecessarily complicated machinery:
double mean(double[] x) {
    double s = 0.0;
    int n = x.length;
    for (int i = 0; i < n; i++) s += x[i];
    s /= n;
    // second pass: accumulate deviations from the computed mean
    double t = 0.0;
    for (int i = 0; i < n; i++) t += x[i] - s;
    s += t / n;
    return s;
}
In the code above the variable t contains the sum of deviations about the mean. Interpreted strictly mathematically, that sum should be 0. But in computation the same statement should be restated as: t contains the sum of deviations about the mean as computed with finite precision.
The idea of the compensation is easiest to see with large values that have small variation. In that case s may lose precision (the low-order bits of each term are rounded away as the running sum grows), while computing t has a much better chance of avoiding that loss, since the values x[i] and the computed s are of comparable magnitude, so their differences are small and nearly exact.
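To make the effect visible, here is a small self-contained demo (a sketch; the names MeanDemo, naiveMean and correctedMean are mine, not R's) comparing the plain one-pass mean with the two-pass corrected version on large values with small variation. An exact reference mean is computed with BigDecimal, which represents each double exactly:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class MeanDemo {

    // Plain one-pass mean: sum everything, divide once.
    static double naiveMean(double[] x) {
        double s = 0.0;
        for (double v : x) s += v;
        return s / x.length;
    }

    // Two-pass mean with the correction described above.
    static double correctedMean(double[] x) {
        double s = 0.0;
        int n = x.length;
        for (int i = 0; i < n; i++) s += x[i];
        s /= n;
        // t accumulates deviations from the computed mean; since x[i]
        // and s are comparable in magnitude, these subtractions lose
        // little or no precision.
        double t = 0.0;
        for (int i = 0; i < n; i++) t += x[i] - s;
        return s + t / n;
    }

    public static void main(String[] args) {
        // Large values with small variation: 1e9 plus a small fractional part.
        int n = 1_000_000;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) x[i] = 1e9 + (i % 10) * 0.1;

        // Exact reference mean: every double is an exact rational,
        // so the BigDecimal sum is exact.
        BigDecimal sum = BigDecimal.ZERO;
        for (double v : x) sum = sum.add(new BigDecimal(v));
        double exact = sum.divide(BigDecimal.valueOf(n), new MathContext(50))
                          .doubleValue();

        System.out.println("naive error:     " + Math.abs(naiveMean(x) - exact));
        System.out.println("corrected error: " + Math.abs(correctedMean(x) - exact));
    }
}
```

On data like this, the one-pass sum is accumulated near 1e15, where a single ulp is larger than the fractional parts being added, while the second pass sums deviations of magnitude below 1, so the corrected result typically lands within a few ulps of the exact mean.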