0

I am looking for good (fast, memory efficient, ideally one-pass) algorithms to compute the sample mean and variance.

I have found a technical report by Chan, Golub, and LeVeque from 1983 with interesting algorithms. Are there more modern improvements?

EDIT: Added the crucial words "and variance"

  • My canonical reference would be the [*Numerical Recipes*](http://numerical.recipes/) (in C, or whatever). I wouldn't expect CV to be very useful for actual statistical algorithms. If you don't get good answers here, you may want to consider flagging your question and asking a moderator to migrate it to [Programmers.SE](http://programmers.stackexchange.com/). – Stephan Kolassa Apr 18 '16 at 16:39
  • 1
    Algorithms of this sort--for many more statistics than the mean--can be found by [searching our site](http://stats.stackexchange.com/search?q=online+mean+algorithm). (There are hundreds of hits. I'm not sure why @StephanKolassa thinks we are not a good site to ask such questions.) The duplicate concerns a more difficult version of your question, but it is answered directly in one of the replies. – whuber Apr 18 '16 at 16:53
  • 1
    The Chan, Golub, and LeVeque paper addresses numerical computation of sample variance, and is not really focused on sample mean. Which are you really interested in? You may consider http://scicomp.stackexchange.com/ as a possible site for your (to be clarified) question. – Mark L. Stone Apr 18 '16 at 16:59
  • In light of the edit, I have redirected this thread to a different duplicate (that addresses both the mean and the variance). Also see http://stats.stackexchange.com/questions/88837/single-pass-algorithm-for-kurtosis, as well as other hits in a site search for [online variance algorithm.](http://stats.stackexchange.com/search?q=online+variance+algorithm) – whuber Apr 18 '16 at 18:10
  • @whuber: Being "online" is not a requirement (in fact, I'd be happy not to deal with factors like (n-1)/n and 1/n until the final computation of the variance). –  Apr 19 '16 at 10:20
  • I'm curious about the distinction you are making. If a "fast, memory-efficient, one-pass" algorithm isn't "online," then how does it differ? The factors of course are irrelevant because computing them requires only $O(1)$ storage and $O(n)$ time. – whuber Apr 19 '16 at 12:29

0 Answers0