I have some samples from a stable real-world process. The distribution is multimodal and does not cleanly fit any of the "textbook" analytic distributions. I need to make very accurate estimates of the maximum value to expect in a number of samples that is much larger than what I have.

Two approaches that I tried and found not to work:

  1. Compute the IQR, multiply it by 4, and add it to the median. This is a robust-statistics analog of offsetting a control limit to get a Cpk of 1.33 using the mean and standard deviation of a normal distribution. It sets the limits far too wide.
  2. Take the maximum and minimum of the current sample. This underestimates the extreme value.
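
A minimal sketch of both heuristics, assuming a made-up bimodal process (the mixture in `draw` is purely illustrative and is not my real data). The first 500 draws stand in for the samples in hand, and the full 100,000 draws stand in for the much larger run. One thing worth noting: for a normal distribution the IQR is about 1.349σ, so median + 4·IQR corresponds to roughly 5.4σ rather than the 4σ behind a Cpk of 1.33, and a multimodal sample inflates the IQR further.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def draw():
    """One draw from a hypothetical bimodal process (illustrative assumption)."""
    if random.random() < 0.7:
        return random.gauss(10.0, 1.0)
    return random.gauss(15.0, 0.5)


big = [draw() for _ in range(100_000)]  # stand-in for the much larger run
small = big[:500]                       # stand-in for the samples in hand

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, med, q3 = statistics.quantiles(small, n=4)
iqr = q3 - q1

iqr_limit = med + 4.0 * iqr  # heuristic 1: median + 4 * IQR
small_max = max(small)       # heuristic 2: maximum of the current sample
big_max = max(big)           # what the larger run actually reached

print(f"median + 4*IQR : {iqr_limit:.2f}")
print(f"max of 500     : {small_max:.2f}")
print(f"max of 100000  : {big_max:.2f}")
```

Because the small sample is a subset of the large one, its maximum can never exceed the large-sample maximum, which is the underestimation in item 2. The IQR-based limit overshoots for the reason in item 1.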

UPDATE:

I got semi-consistent results when I fit the data with a Gaussian Mixture Model, choosing the number of components by AIC. In the inverse-CDF (probit) domain a Gaussian is a straight line, so I think this is just the equivalent of interpolating in that domain with 3 piecewise lines joined by fillets. However, it still does not seem strong enough, and it is not rigorous. I could fit a mixture of t-distributions, or some other family, if I believed it made a good basis for the underlying system. The wider tails might be useful for real-world data.
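
For illustration, here is a sketch of how a fitted mixture would be extrapolated to the maximum of n draws. If the draws are independent with CDF F, the maximum has CDF F(x)^n, so its median solves F(x) = 0.5^(1/n). The component weights, means and standard deviations below are hypothetical placeholders rather than fitted values, and `mixture_cdf` and `max_quantile` are names invented for this sketch.

```python
from statistics import NormalDist

# Hypothetical 3-component mixture: (weight, mean, standard deviation).
# Placeholder numbers only; they are NOT fitted to the data in the question.
COMPONENTS = [(0.5, 10.0, 1.0), (0.3, 15.0, 0.5), (0.2, 20.0, 2.0)]


def mixture_cdf(x, components=COMPONENTS):
    """CDF of the Gaussian mixture evaluated at x."""
    return sum(w * NormalDist(mu, sigma).cdf(x) for w, mu, sigma in components)


def max_quantile(n, p=0.5, components=COMPONENTS, tol=1e-9):
    """Return x with P(max of n i.i.d. draws <= x) = p, i.e. F(x) = p**(1/n)."""
    target = p ** (1.0 / n)
    # Bracket the root well outside every component, then bisect.
    lo = min(mu - 10.0 * sigma for _, mu, sigma in components)
    hi = max(mu + 10.0 * sigma for _, mu, sigma in components)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, components) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Median of the maximum for increasingly large sample counts.
for n in (1_000, 100_000, 10_000_000):
    print(n, round(max_quantile(n), 3))
```

For large n the answer is governed almost entirely by the far tail of the widest component, which is exactly the part of the fit that the data constrain least. Swapping the Gaussian components for heavier-tailed ones would change the extrapolated maximum even if the fit to the observed data looked just as good.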

EngrStudent
  • The "extrapolation" in the title says it all: *you are making up the answer arbitrarily* when you estimate extreme values from smaller datasets. All such estimates rely on assumptions about the shape of the tail; there is no getting around that. Methods like those you have discussed are just ways of hiding this fact, which explains why they do not work. Your only hope is to learn something about the tails either from theory or from other related datasets and incorporate that knowledge in your modeling. – whuber Mar 28 '13 at 21:07
