
There are 3 sets of numbers. I have the 99th percentile (p99) and the cardinality of each set, but not the values in the sets themselves.

  • p99: 540, cardinality: 215
  • p99: 288, cardinality: 4
  • p99: 432, cardinality: 78

What is the most accurate way to estimate the combined p99 of those 3 sets without having access to the original data?

Right now I'm debating the following 2 options:

  • Simply use the p99 value of the set with the highest cardinality.
  • Use a weighted average of the p99 values, weighted by each set's cardinality.
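
For concreteness, here is a minimal sketch of the two options applied to the numbers above. The function names are made up for illustration, and the weights are assumed to be the set cardinalities. Neither result is a true p99 of the combined data; both are heuristics built from the summaries alone.

```python
# Summary statistics from the question: (p99, cardinality) for each set.
summaries = [(540, 215), (288, 4), (432, 78)]


def p99_of_largest_set(summaries):
    """Option 1: take the p99 of the set with the highest cardinality."""
    p99, _ = max(summaries, key=lambda s: s[1])
    return p99


def weighted_average_p99(summaries):
    """Option 2: average the p99 values, weighting each by its cardinality."""
    total = sum(n for _, n in summaries)
    return sum(p99 * n for p99, n in summaries) / total


option_1 = p99_of_largest_set(summaries)
option_2 = weighted_average_p99(summaries)
print(option_1, round(option_2, 2))  # prints: 540 508.24
```

The set of 215 values dominates the weighted average, so the two options end up fairly close here (540 versus roughly 508).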

Are there better, more accurate options?

SGr
  • Weighted average. – Ethan Bolker Apr 08 '16 at 19:08
  • Weighted average is probably preferable in this case. But do note, no matter what you do, you shouldn't expect it to be "accurate." As George Box famously stated, ["Essentially, all models are wrong, but some are useful"](https://en.wikipedia.org/wiki/All_models_are_wrong). – Clarinetist Apr 08 '16 at 19:16
  • How did you conclude anything about what the 99th percentile is based on a sample of only four? That could be done if, for example, you assumed the samples are from a normally distributed population, but you haven't mentioned anything about that. – Michael Hardy Apr 08 '16 at 19:25
  • In that sample, it would be the highest value, since there are fewer than 100 values. – SGr Apr 08 '16 at 19:30
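
The last comment holds under the nearest-rank definition of a percentile, where the p99 of n sorted values is the value at rank ceil(0.99 · n). That definition is an assumption here, since the question does not say how the p99 values were computed; interpolating definitions would give a value between the two largest instead. A small sketch, with a hypothetical helper name:

```python
import math


def nearest_rank(n, q=0.99):
    """1-based rank (ascending order) of the q-quantile under the nearest-rank rule."""
    return math.ceil(q * n)


# Cardinalities from the question.
for n in (215, 4, 78):
    rank = nearest_rank(n)
    # n - rank is how many values lie strictly above the p99 position.
    print(n, rank, n - rank)
```

Under this assumption, the p99 of the sets with 4 and 78 values is simply their maximum, while for the set with 215 values it is the third-largest value (rank 213 of 215).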

0 Answers