Is there a way to fit a specified distribution if you are only given a few quantiles?

For example, if I told you I have a gamma distributed data set, and the empirical 20%, 30%, 50% and 90%-quantiles are, respectively:

      20%       30%       50%       90% 
0.3936833 0.4890963 0.6751703 1.3404074 

How would I go about estimating the parameters? Are there multiple ways to do that, or is there already a specific procedure?

Edit: I'm not specifically asking about the gamma distribution; that was just an example, because I worried I couldn't explain my question appropriately. My task is that I have some (2-4) given quantiles, and I want to estimate the (1-3) parameters of a few distributions as "closely" as possible. Sometimes there is an exact solution (or infinitely many), and sometimes there is none, right?
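To make the task concrete, quantile matching can be set up as a small optimization problem: choose the parameters that minimize the discrepancy between the theoretical quantiles and the given empirical ones. A minimal sketch in Python for the gamma example above (assuming SciPy is available; the squared loss used here is just one possible choice):

```python
import numpy as np
from scipy import optimize, stats

# Empirical quantiles from the question
probs = np.array([0.20, 0.30, 0.50, 0.90])
q_emp = np.array([0.3936833, 0.4890963, 0.6751703, 1.3404074])

def loss(params):
    """Sum of squared differences between model and empirical quantiles."""
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf  # keep the optimizer inside the valid parameter region
    q_model = stats.gamma.ppf(probs, a=shape, scale=scale)
    return float(np.sum((q_model - q_emp) ** 2))

res = optimize.minimize(loss, x0=[1.0, 1.0], method="Nelder-Mead")
shape_hat, scale_hat = res.x
```

With more quantiles than parameters an exact fit is generally impossible and this returns a least-squares compromise; with exactly as many quantiles as parameters one can instead try to solve the matching equations exactly (e.g. with `scipy.optimize.root`).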

Alexander Engelhardt
  • I voted to close this as a duplicate of http://stats.stackexchange.com/questions/6022, but then it occurred to me that there are possible interpretations of this question that make it different in an interesting way. As a purely mathematical question--if someone teasingly gives you a few quantiles of a mathematical distribution--this is without statistical interest and belongs on the math site. But if these quantiles are *measured* in a dataset, then generally they will not exactly correspond to the quantiles of any gamma distribution and we need to find the "best" fit in some sense. – whuber Jun 11 '12 at 12:19
  • So, after that long introductory comment, which situation are you in, Alexx? Should we send your question over to the math people for a theoretical answer, or are these quantiles derived from data? If the latter, then could you help us understand what a "good" (or a "best") solution would look like? E.g., should the fitted distribution match some of the quantiles better than some of the others when a perfect fit is not possible? – whuber Jun 11 '12 at 12:20
  • But actually the second answer (by @mpiktas) in the link you posted estimates the distribution even if your quantiles are not exact (derived from the data). – Dmitry Laptev Jun 11 '12 at 12:35
  • This is definitely a different question, as it is specific to the gamma rather than the lognormal. Using the data to fit a model based on quantiles or order statistics seems to me a reasonable thing to do, even though it is not efficient. If the situation is like the one I mentioned with the Gupta reference, where all the order statistics are available, you can get reasonable parameter estimates. If it is just solving two equations in two unknowns based on a couple of quantiles, then the estimates will be very poor, as pointed out by whuber and others in the other stackexchange question cited here. – Michael R. Chernick Jun 11 '12 at 12:37
  • @John The solution by mpiktas uses quadratic loss. I am suggesting that this loss function might not be appropriate generally. Michael, the question of fitting a distribution given percentiles is *essentially* the same regardless of its formula; e.g., the method given by mpiktas in the related question will work as well for the gamma as for the lognormal, *mutatis mutandis.* – whuber Jun 11 '12 at 12:41
  • whuber, thanks for your request for clarification. I forgot to add that I meant the empirical quantiles of a data set; the question has been edited. A "good" fit would just be one with as little discrepancy as possible, but I don't have a specific loss function in mind, so maybe we should just use the "default" squared loss or something? – Alexander Engelhardt Jun 11 '12 at 13:01
  • Using the quadratic loss function you get a good fit. The estimators I obtained using this are $(\hat\alpha,\hat\beta)=(3.097, 0.244)$. –  Jun 11 '12 at 13:14
  • @Procrastinator Quadratic loss is usually not a good choice. Consider the error structure of the percentiles (under iid sampling): for gammas, there is much greater variation at the upper tail than in the middle. Moreover, *errors are correlated.* Thus least squares, although it *works,* may be far from optimal when relatively extreme percentiles are involved. – whuber Jun 11 '12 at 13:22
  • @whuber I agree, I just mentioned the empirical fit I observed not about optimality. What loss function would you recommend to get a better result? Do you know of any theoretical result about the relationship between the choice of the loss function and the rate of convergence? –  Jun 11 '12 at 13:25
  • Rate of convergence of what, @Proc? I think the key issue is getting an appropriate fit. The percentiles themselves have well-known distributions conditional on the underlying distribution itself, so one might approach this with a crude initial estimate of parameters (e.g., least squares!) followed by a generalized least squares re-estimate of the parameters. When there are more percentiles than parameters to estimate, and those percentiles cover a wide range (e.g., they're not 90-91-92-93), it seems likely the LS and GLS solutions will be easy to obtain and numerically stable. – whuber Jun 11 '12 at 13:30
  • @whuber I meant rate of convergence of the estimators (if possible under a certain choice of the loss function). What do you mean by 'optimal'? –  Jun 11 '12 at 13:33
  • Oh, this wonderful smell of oil and metal... a new wheel being reinvented. Econometricians proposed the generalized method of moments three decades ago; see http://www.citeulike.org/user/ctacmo/article/1155588 (attribution to the work of Ferguson in the 1950s is made in the paper, I believe). This methodology would be part of any asymptotics class if math statisticians were not so snobbish about the possibility of wonderful methods emerging in other disciplines; econometricians teach their students GMM as an all-encompassing principle rather than the likelihood. – StasK Jun 11 '12 at 13:40
  • @Proc The question of optimality is one I earlier addressed to the OP, who demurred, leaving it up to us to consider what makes sense in general. Originally, I imagined a situation in which the purpose of the fitting might be to make an estimate, in which case "optimal" can take on its usual meanings for a statistical estimation problem. Thus, for instance, an optimal fit when the ultimate estimator is a 90th percentile would heavily weight the upper percentiles in the data, whereas an optimal fit when the mean is the estimand would likely weight the data very differently. – whuber Jun 11 '12 at 13:40
  • @Stas What does this problem have to do with GMM? I don't see *any* moments in evidence! – whuber Jun 11 '12 at 13:41
  • "Moments" is a bad name they got stuck with, admittedly. The method in fact works with estimating equations, and I hope you do see some in this example, @whuber. To rephrase, the GMM theory covers anything that can be done with the quadratic loss for estimating equations, including higher order asymptotics and weird dependencies between observations or equations. – StasK Jun 11 '12 at 14:16
  • @StasK - Gee, I had exposure to GMM in my two semester math-stats class taught by Bickel many years ago :) And one can't claim that that particular "grandson" of Neyman through Lehmann isn't a math statistician... – jbowman Jun 11 '12 at 14:21
  • Thank you for the clarification, @Stas. I was not aware of the generality of GMM. – whuber Jun 11 '12 at 15:01
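The least-squares-then-GLS idea raised in the comments can be written down directly. Under i.i.d. sampling, a sample quantile at level $p$ has asymptotic variance $p(1-p)/\bigl(n\,f(q_p)^2\bigr)$, and two sample quantiles at levels $p \le r$ have asymptotic covariance $p(1-r)/\bigl(n\,f(q_p)f(q_r)\bigr)$; since the unknown $n$ multiplies every entry of the covariance matrix, it cancels out of the GLS criterion. A hedged Python/SciPy sketch of this idea (one possible implementation of the suggestion, not anyone's actual code):

```python
import numpy as np
from scipy import optimize, stats

probs = np.array([0.20, 0.30, 0.50, 0.90])
q_emp = np.array([0.3936833, 0.4890963, 0.6751703, 1.3404074])

def quantile_cov(shape, scale):
    """Asymptotic covariance of the sample quantiles, up to a common 1/n factor."""
    q = stats.gamma.ppf(probs, a=shape, scale=scale)
    f = stats.gamma.pdf(q, a=shape, scale=scale)
    # Cov(q_p, q_r) proportional to min(p, r) * (1 - max(p, r)) / (f(q_p) f(q_r))
    p_lo = np.minimum.outer(probs, probs)
    p_hi = np.maximum.outer(probs, probs)
    return p_lo * (1.0 - p_hi) / np.outer(f, f)

def gls_loss(params):
    """Generalized least squares: residuals weighted by the inverse covariance."""
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    r = q_emp - stats.gamma.ppf(probs, a=shape, scale=scale)
    cov = quantile_cov(shape, scale)
    return float(r @ np.linalg.solve(cov, r))

# Start from crude (e.g. ordinary least squares) estimates, then re-fit with GLS
res = optimize.minimize(gls_loss, x0=[3.0, 0.25], method="Nelder-Mead")
shape_gls, scale_gls = res.x
```

Relative to plain least squares, this automatically down-weights the noisy upper-tail quantile (where $f$ is small, so the variance is large) and accounts for the correlation between quantiles.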

1 Answer

I don't know what was in the other post, but I have a response. One can look at the order statistics, which represent specific quantiles of the distribution: the $k$'th order statistic, $X_{(k)}$, is an estimate of the $100 \cdot k/n$'th quantile of the distribution. There is a famous 1960 paper in Technometrics by Shanti Gupta that shows how to estimate the shape parameter of a gamma distribution using the order statistics. See this link: http://www.jstor.org/discover/10.2307/1266548
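As a quick simulated check of that correspondence (an illustration only, not part of the Gupta paper): in a large i.i.d. sample, the $k$'th smallest value sits close to the $k/n$ quantile of the sampled distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
shape, scale, n = 2.0, 1.0, 10_000

# Sorted sample: sample[k - 1] is the k-th order statistic
sample = np.sort(rng.gamma(shape, scale, size=n))

k = 9_000  # k/n = 0.90
x_k = sample[k - 1]                                   # k-th order statistic
q_true = stats.gamma.ppf(0.90, a=shape, scale=scale)  # theoretical 90% quantile
```

Here `x_k` and `q_true` agree to within sampling error, which shrinks as $n$ grows.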

Michael R. Chernick
  • I TeXed one part of your answer (leaving the content identical) but I'm a little confused and think there may be a typo or something. Re: "One can look at the order statistics which represent specific quantiles of the distribution.....". Do you mean quantiles of the empirical distribution? Also, the $k$'th order statistic usually refers to the $k$'th smallest value, not the $k/n$'th quantile of the empirical distribution, right? Can you clarify (sorry if I'm being dense)? – Macro Jun 11 '12 at 13:11
  • If $n$ is the sample size, the $k$'th order statistic represents an estimate of the $100 \cdot k/n$ percentile of the distribution being sampled. – Michael R. Chernick Jun 11 '12 at 13:15
  • @MichaelChernick, I've slightly edited your answer to make that clear - hopefully this looks ok. – Macro Jun 11 '12 at 13:21