I have samples from a stable real-world process. Its distribution is polymodal and does not cleanly fit any of the "textbook" analytic distributions. I need very accurate estimates of the maximum value that would be observed in a sample much larger than the one I have.
Two approaches I considered and have shown not to work:
- Compute the IQR, multiply it by 4, and add it to the median. This is a robust-statistics analog of placing a control limit 4 standard deviations from the mean, which gives a Cpk of 1.33 for a normal distribution. It sets the limits far too wide.
- Use the maximum and minimum of the current sample. This underestimates the extreme values.
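To make the two failure modes concrete, here is a minimal sketch on synthetic standard-normal data. That is an assumption for illustration only; it is not the polymodal process described above. For a normal distribution the IQR is about 1.35 sigma, so median + 4 IQR sits near 5.4 sigma, while the median of the maximum of N independent draws is the x that solves Phi(x)^N = 0.5. The sample size of 200, the future size N = 100,000, and the function names `iqr_limit` and `median_max_normal` are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def iqr_limit(x, k=4.0):
    """Upper limit = median + k * IQR (the robust rule from the first bullet)."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return med + k * (q3 - q1)


def median_max_normal(n_future):
    """Median of the maximum of n_future iid standard-normal draws.

    P(max <= x) = Phi(x)**n, so the median solves Phi(x) = 0.5**(1/n).
    """
    return NormalDist().inv_cdf(0.5 ** (1.0 / n_future))


rng = np.random.default_rng(0)
x = rng.standard_normal(200)   # hypothetical stand-in for the observed samples
n_future = 100_000             # hypothetical size of the much larger future sample

print(f"median + 4*IQR        : {iqr_limit(x):.2f}")
print(f"sample maximum (n=200): {x.max():.2f}")
print(f"median max of {n_future} draws: {median_max_normal(n_future):.2f}")
```

In this Gaussian setting the IQR rule typically lands well above the median maximum of 100,000 draws, and the maximum of 200 draws lands well below it. This matches both observations in the list above.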
UPDATE:
I tried fitting the data with a Gaussian Mixture Model, using the number of components indicated by AIC, and got semi-consistent results. In the inverse-CDF (probit) domain a Gaussian is a straight line, so I think this is just equivalent to interpolating with 3 straight-line segments joined by smooth fillets. However, it still does not seem strong enough; it seems unrigorous. I could fit a mixture of t-distributions, or other families, if I believed they made a good basis for the underlying system. Their heavier tails might be useful for real-world data.
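A minimal sketch of the mixture approach follows. It uses plain NumPy EM rather than any particular library so every step is visible. The assumptions are as follows. The samples are independent and identically distributed. The number of components k is chosen by AIC = 2(3k - 1) - 2 log L, because a 1-D, k-component mixture has k means, k standard deviations and k - 1 free weights. "The maximum for N samples" is summarized by the median of the maximum, i.e. the x at which F(x)^N = 0.5. The synthetic three-cluster data, the value of N, and the helper names (`fit_gmm_1d`, `select_by_aic`, `mixture_sf`, `median_of_max`) are all hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def _log_terms(x, w, mu, sd):
    """log(w_j * N(x_i | mu_j, sd_j)) per point/component, and log-density per point."""
    z = (x[:, None] - mu) / sd
    log_p = np.log(w) - np.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * z**2
    m = log_p.max(axis=1, keepdims=True)
    log_dens = m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))  # log-sum-exp
    return log_p, log_dens


def fit_gmm_1d(x, k, n_iter=1000, tol=1e-9, seed=0):
    """EM fit of a k-component 1-D Gaussian mixture; returns (w, mu, sd, loglik).

    Single initialization only; several restarts are advisable in practice.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)
    mu = np.sort(rng.choice(x, size=k, replace=False))
    sd = np.full(k, x.std())
    floor = 1e-3 * x.std()  # stops a component collapsing onto a single point
    prev = -np.inf
    for _ in range(n_iter):
        log_p, log_dens = _log_terms(x, w, mu, sd)       # E-step
        loglik = log_dens.sum()
        if loglik - prev < tol:
            break
        prev = loglik
        resp = np.exp(log_p - log_dens[:, None])         # responsibilities
        nk = resp.sum(axis=0) + 1e-12                     # M-step
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk), floor)
    loglik = _log_terms(x, w, mu, sd)[1].sum()
    return w, mu, sd, loglik


def select_by_aic(x, k_max=5):
    """Fit k = 1..k_max and keep the fit with the lowest AIC = 2*(3k - 1) - 2*loglik."""
    best = None
    for k in range(1, k_max + 1):
        w, mu, sd, ll = fit_gmm_1d(x, k)
        aic = 2 * (3 * k - 1) - 2 * ll
        if best is None or aic < best[0]:
            best = (aic, k, w, mu, sd)
    return best


def mixture_sf(x, w, mu, sd):
    """P(X > x) for the fitted mixture, summed from upper-tail terms to keep precision."""
    return sum(wj * NormalDist().cdf((mj - x) / sj) for wj, mj, sj in zip(w, mu, sd))


def median_of_max(n_future, w, mu, sd):
    """Solve F(x)**n = 0.5, i.e. sf(x) = 1 - 0.5**(1/n), by bisection."""
    target = -math.expm1(math.log(0.5) / n_future)
    lo, hi = float(np.min(mu - 12 * sd)), float(np.max(mu + 12 * sd))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_sf(mid, w, mu, sd) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(1)
# Hypothetical polymodal stand-in for the real samples: three Gaussian clusters.
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 0.5, 150),
                    rng.normal(7.0, 0.8, 50)])
aic, k, w, mu, sd = select_by_aic(x)
print(f"AIC picked k={k}")
for n in (10**4, 10**6):
    print(f"median of max over {n} draws: {median_of_max(n, w, mu, sd):.2f}")
```

The estimate depends almost entirely on the fitted upper tail, so the choice of component family (Gaussian vs. heavier-tailed t) is likely to matter more than the exact k. Refitting on resampled data (a bootstrap) and watching how much `median_of_max` moves is one way to see how much the tail estimate can be trusted.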