
Possible Duplicate:
What is a good algorithm for estimating the median of a huge read-once data set?

Imagine you have a large, multivariate dataset that resides on disk.

Are there any known methods to efficiently compute the median with a minimum number of passes through the data?
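If an exact median is required, one common pattern is to spend a few extra passes to keep memory bounded. The sketch below is one such approach, not a definitive implementation. It makes three passes. The first counts the values and finds their range. The second builds a histogram of equal-width bins. The third keeps only the values in the bin(s) holding the middle rank, then sorts them. The `read` callable and `n_bins` parameter are hypothetical: `read` is assumed to re-open one on-disk column and yield its values from the start on every call. For a multivariate dataset you would run it once per column.

```python
import math
from typing import Callable, Iterable, List


def exact_median(read: Callable[[], Iterable[float]], n_bins: int = 1024) -> float:
    """Exact median of one column in three sequential passes.

    `read` (hypothetical) re-opens the data and yields its values from the
    start each time it is called.
    """
    # Pass 1: count the values and find their range.
    n, lo, hi = 0, math.inf, -math.inf
    for x in read():
        n += 1
        lo = min(lo, x)
        hi = max(hi, x)
    if n == 0:
        raise ValueError("empty data")
    if lo == hi:
        return lo

    def bin_of(x: float) -> int:
        # Equal-width bins over [lo, hi]; x == hi is clamped into the last bin.
        return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)

    # Pass 2: histogram of bin counts (memory: n_bins integers).
    counts: List[int] = [0] * n_bins
    for x in read():
        counts[bin_of(x)] += 1

    # 0-based ranks of the middle value(s); they are equal when n is odd.
    k_lo, k_hi = (n - 1) // 2, n // 2

    def bin_for_rank(k: int) -> int:
        # Index of the bin that contains the value of global rank k.
        cum = 0
        for b, c in enumerate(counts):
            if cum + c > k:
                return b
            cum += c
        raise AssertionError("rank out of range")

    b_lo, b_hi = bin_for_rank(k_lo), bin_for_rank(k_hi)
    below = sum(counts[:b_lo])  # number of values ranked before bin b_lo

    # Pass 3: keep only the values in the median bin(s), then select by rank.
    kept = sorted(x for x in read() if b_lo <= bin_of(x) <= b_hi)
    return (kept[k_lo - below] + kept[k_hi - below]) / 2
```

Only `n_bins` counters plus the contents of the median bin(s) are held in memory. If the data are heavily clustered and that bin is still too large, the same counting step could be repeated inside it, at the cost of another pass.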

I've found a candidate for the variance/standard deviation in the Welford/Knuth algorithm, but what about the median?
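For reference, the Welford update mentioned above keeps only three numbers per column and needs a single pass. This is a minimal sketch. The class and method names are illustrative, not from any particular library. The median has no comparable exact constant-memory update: exact selection in one pass needs memory that grows with the number of values, which is why median methods either add passes or settle for an approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass mean/variance accumulator using Welford's update."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n
        # Multiply by the deviation from the *updated* mean; this avoids the
        # cancellation problems of the naive sum-of-squares formula.
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Sample variance (n - 1 denominator); NaN for fewer than 2 values.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan

    def stddev(self) -> float:
        return math.sqrt(self.variance())
```

For a multivariate dataset, one `RunningStats` per column can be updated while streaming each row from disk.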

Thanks

oDDsKooL
  • Have you seen this? http://stats.stackexchange.com/questions/346/what-is-a-good-algorithm-for-estimating-the-median-of-a-huge-read-once-data-set? – ocram Jan 04 '12 at 08:46
  • There are several other candidates that should be of interest as well, such as, [Is it possible to accumulate a set of statistics that describes a large number of samples such that I can then produce a boxplot?](http://stats.stackexchange.com/q/3372/1036) and potentially, [Algorithms to compute the running median?](http://stats.stackexchange.com/q/134/1036). – Andy W Jan 04 '12 at 13:05
  • @ocram: yes, a very good entry point for my question, thanks! – oDDsKooL Jan 05 '12 at 10:50
  • @Andy W, thanks too, those pointers are very good indeed! Too bad the system didn't manage to find them when I wrote the question... – oDDsKooL Jan 05 '12 at 10:52
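For the read-once setting of the linked duplicate, where a second pass is impossible, one named constant-memory approximation is Rousseeuw and Bassett's remedian. The values are collected into buffers of size `base`. Whenever a buffer fills, its median moves up one level and the buffer is reused. The sketch below is a minimal version under stated assumptions. The function name and the default `base=11` are illustrative choices. When `n` is not an exact power of `base`, the way it finishes is this sketch's own choice: it takes a lower weighted median of whatever is still buffered, with each entry weighted by the number of raw values it summarizes. The result is an approximation, and it is exact only in special cases.

```python
from statistics import median
from typing import Iterable, List, Tuple


def remedian(values: Iterable[float], base: int = 11) -> float:
    """Single-pass approximate median in O(base * log_base(n)) memory."""
    if base < 2:
        raise ValueError("base must be at least 2")
    buffers: List[List[float]] = [[]]
    for x in values:
        buffers[0].append(x)
        level = 0
        # Cascade: a full buffer is summarized by its median one level up.
        while len(buffers[level]) == base:
            m = median(buffers[level])
            buffers[level].clear()
            level += 1
            if level == len(buffers):
                buffers.append([])
            buffers[level].append(m)

    # An entry at level L stands for base**L raw values; finish with the
    # lower weighted median of everything still buffered.
    pairs: List[Tuple[float, int]] = sorted(
        (v, base ** level) for level, buf in enumerate(buffers) for v in buf
    )
    if not pairs:
        raise ValueError("empty data")
    total = sum(w for _, w in pairs)
    cum = 0
    for v, w in pairs:
        cum += w
        if 2 * cum >= total:
            return v
    return pairs[-1][0]  # not reached: cumulative weight always reaches total
```

Memory stays at roughly `base` values per level, and the number of levels grows only logarithmically with the data size. That makes the remedian usable on a stream that cannot be re-read. The cost is that the result is an estimate rather than the exact median.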
