Find Q1 and Q3 from median and IQR

Question

A study gives the following:

$n = 67$

mean = 73

sd = 68

median = 55

IQR = 66

Is it possible from this information to get the actual Q1 and Q3 values? I used $n$, the mean, and the SD to get a 95% CI. Should that be roughly similar?

You would have to make an assumption about the distribution. If the distribution is symmetric, for example, you can just divide the IQR by 2 and add and subtract that value from the median to get Q1 and Q3. If the distribution is skewed, then the distance from Q1 to Q2 might not be the same as the distance from Q2 to Q3. It also depends on if you want to calculate what the observed Q1 and Q3 values of the data would be ("they didn't report it, so I'll calculate it") or if you want to say something about the population. What is your goal? — Dave, Feb 11 '20 at 14:56
Thank you. I'm assembling a range, and two other studies report a median and specific values of Q1 and Q3 (but no mean and SD), so I'm trying to find a way to be consistent across them. It's pretty skewed data, though, so I don't think there's a mathematical workaround. Fortunately, I know the author :) — chidi, Feb 11 '20 at 16:47
This is one reason why I like to give IQR as two numbers (first and third quartiles) rather than a single number (and similarly for the full range). I also insist on this when I review articles. — Peter Flom, Feb 11 '20 at 18:25
To be fair, the use of range and generation of a single number is correct. The vast majority of introductory statistical texts use the range as a difference between two numbers. Take the IQR for example, the Q3 minus Q1 is the IQR; similarly, the more general "range" is max minus min. If you want min and max or specific quantiles, call them as they are, but don't call them a "range" as this is more traditionally defined as a scalar. This is further evidenced by the fact that range and IQR are meant as measures of variability (like SD). — LSC, Feb 11 '20 at 21:16
If you do ask the author, let me know if my answer came close to the actual quartiles! — Matt F., Feb 12 '20 at 17:01

Peter Flom · Answer 1 · 2020-02-11T20:52:01.003

1

As @Dave noted in a comment, you would have to make some assumptions about the distribution. Given the mean and the median being so different, it's likely that there is substantial skew - and you confirm this in a comment.

Various assumptions might be reasonable.

With median = 55 and IQR = 66 (and no other information or assumptions), a symmetric distribution would give quartiles of 22 and 88. Without the symmetry assumption, you could have anything from -10 and 56 to 54 and 120. But you have additional information: the mean and SD will limit the possibilities. You can probably also figure out some things from the nature of the variable (e.g., is it always positive?) and try various distributions.
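As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function name `quartile_bounds` is made up for this example, and it assumes non-strict inequalities (Q1 ≤ median ≤ Q3), so its extreme pairs differ by one unit from the figures quoted in the answer.

```python
def quartile_bounds(median, iqr):
    """Quartile pairs implied by a median and an IQR alone (hypothetical helper)."""
    # Symmetric case: the quartiles sit half an IQR on either side of the median.
    symmetric = (median - iqr / 2, median + iqr / 2)
    # In general Q1 <= median <= Q3 and Q3 - Q1 = iqr,
    # so Q1 can be anywhere in [median - iqr, median].
    lowest = (median - iqr, median)
    highest = (median, median + iqr)
    return symmetric, lowest, highest


symmetric, lowest, highest = quartile_bounds(55, 66)
print("symmetric:", symmetric)
print("lowest possible pair:", lowest)
print("highest possible pair:", highest)
```

These are only the limits implied by the median and IQR; the mean and SD would narrow them further.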

edited Feb 11 '20 at 20:52

answered Feb 11 '20 at 18:30

Peter Flom


Q1 could equal 55 and Q3 would then equal 121. It's not clear that Q1 could get as low as -10: with a little effort I found solutions only down to 18. (The problem is that very low values of Q1 would require some very large data to make the mean 73, but then the SD would be too large.) Thus, it is conceivable that even these limited statistics, with no distributional assumptions at all, constrain the possible values of the quartiles. – whuber Feb 11 '20 at 20:37

Sorry, I meant that you could limit the Q1 and Q3 to those ranges only using the info on median and IQR. Adding the info on the mean and sd would limit it more. – Peter Flom Feb 11 '20 at 20:51
Would Chebyshev's inequality be able to give us a bound? – Dave Feb 11 '20 at 21:09
@Dave Because the SD is greater than the IQR, Chebyshev isn't going to provide any usable information. If the SD were smaller then it would indeed constrain the IQR. – whuber Feb 11 '20 at 21:33

score 1 · Answer 2 · answered Feb 16 '20 at 16:20

You should have given some context: what (real-life) variable $x$ do your data represent? Some questions you probably know the answers to:

What is the possible range for $x$? That is, is $x$ nonnegative? or a count? ...
Can we suppose independence?

Nevertheless, some observations:

The mean is larger than the median, and a 95% confidence interval for the mean based on a normal model gives about $(56.4, 89.6)$; the observed median lies just outside it. So the data cast doubt on symmetry and point to a right-skewed distribution.
The observed mean and standard deviation are close, which points to an exponential (or, more generally, gamma) distribution.
One can also get a rather close fit with a lognormal distribution; I find that $\mu=4, \sigma=0.778$ is close. One could also try normal or skew-normal distributions. Once you decide to try some distributional family as a model, you can use the given descriptive statistics to find moment-type estimators, and given those estimators it is easy to calculate the quartiles.
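The observations above can be sketched in a few lines of Python. This is only one possible reading of "moment-type estimators": it matches the lognormal to the reported mean and SD alone (an assumption; the answer's $\mu=4, \sigma=0.778$ may have used the median as well), and it uses a normal critical value for the interval, which comes out slightly narrower than a t-based one.

```python
import math
from statistics import NormalDist

n, mean, sd = 67, 73.0, 68.0

# Normal-approximation 95% CI for the mean (a t-based interval is a bit wider).
z975 = NormalDist().inv_cdf(0.975)
half_width = z975 * sd / math.sqrt(n)
ci = (mean - half_width, mean + half_width)

# Method-of-moments lognormal fit: choose mu, sigma to match mean and SD.
# For a lognormal, mean = exp(mu + sigma^2/2) and (sd/mean)^2 = exp(sigma^2) - 1.
sigma2 = math.log(1 + (sd / mean) ** 2)
sigma = math.sqrt(sigma2)
mu = math.log(mean) - sigma2 / 2

# Quartiles of the fitted lognormal: exp(mu + sigma * z_p).
z75 = NormalDist().inv_cdf(0.75)
q1 = math.exp(mu - sigma * z75)
q3 = math.exp(mu + sigma * z75)
fitted_median = math.exp(mu)

print(f"95% CI for the mean: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"lognormal fit: mu={mu:.3f}, sigma={sigma:.3f}")
print(f"Q1={q1:.1f}, median={fitted_median:.1f}, Q3={q3:.1f}, IQR={q3 - q1:.1f}")
```

Comparing the fitted median and IQR with the reported 55 and 66 gives a quick check on how well the assumed family fits.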

Can we say something more? Maybe by comparing some such models? I doubt that normal or skew-normal models can give a good fit, so let us try the gamma and lognormal models. We can simulate data from such models and use ABC methods (approximate Bayesian computation) to compare them. Some details are here: How to do estimation, when only summary statistics are available?
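A minimal rejection-ABC sketch of that idea follows. It is not the procedure from the linked thread. The uniform prior ranges, the 15% relative tolerance, the quartile convention, and the names `summaries`, `simulate`, and `abc_counts` are all assumptions chosen for illustration.

```python
import math
import random
import statistics

# Summary statistics reported in the question.
OBSERVED = {"mean": 73.0, "sd": 68.0, "median": 55.0, "iqr": 66.0}
N = 67


def summaries(data):
    """Mean, sample SD, median and IQR of one dataset."""
    m = sum(data) / len(data)
    sd = math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))
    # The quartile convention used by the study is unknown; this is an assumption.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {"mean": m, "sd": sd, "median": q2, "iqr": q3 - q1}


def simulate(model, rng):
    """Draw parameters from hypothetical uniform priors, then simulate N values."""
    if model == "gamma":
        shape = rng.uniform(0.5, 3.0)
        scale = rng.uniform(20.0, 150.0)
        return [rng.gammavariate(shape, scale) for _ in range(N)]
    mu = rng.uniform(3.0, 5.0)
    sigma = rng.uniform(0.3, 1.5)
    return [rng.lognormvariate(mu, sigma) for _ in range(N)]


def abc_counts(draws=5000, tol=0.15, seed=1):
    """Count simulations per model whose summaries all fall within tol of OBSERVED."""
    rng = random.Random(seed)
    accepted = {"gamma": 0, "lognormal": 0}
    for _ in range(draws):
        model = rng.choice(["gamma", "lognormal"])
        sim = summaries(simulate(model, rng))
        if all(abs(sim[k] - OBSERVED[k]) <= tol * OBSERVED[k] for k in OBSERVED):
            accepted[model] += 1
    return accepted


counts = abc_counts()
print(counts)
```

With equal prior weight on the two models, the ratio of accepted counts gives a crude indication of which family reproduces the reported summaries more often. It is sensitive to the priors and tolerance, so treat it as exploratory.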

Matt F. · Answer 3 · 2020-02-11T21:57:34.083

One distribution that fits these parameters is a mixture of:

50% a lognormal distribution with $\mu=\ln 55,\ \sigma=.89$
25% a point mass at $31.38$
25% a point mass at $97.38$

So the quartiles could be at those point masses, though there are many other possibilities also.

I found this by solving some equations; here is the explanation:

The median of the lognormal is $55$, so the median of the mixture is also $55$.
The mean of the lognormal is $81.58$, so the mean of the mixture is $$(25\%)31.38+(50\%)81.58+(25\%)97.38=73.$$
The point masses are at roughly the $26^{th}$ and $74^{th}$ percentiles of the lognormal, so they are at the $13^{th}-38^{th}$ and $62^{nd}-87^{th}$ percentiles of the mixture. In particular, they are the quartiles of the mixture and the IQR is $66$.
The second moment of the lognormal is $14643$, so the variance of the mixture is $$(25\%)(31.38-73)^2+(50\%)(14643 - 2(73)(81.58)+73^2)+(25\%)(97.38-73)^2=68^2.$$

The final mixture is reasonably easy to understand, and you could tinker with it to get smaller point masses, an $n=67$ dataset with the same properties, or other possibilities.
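For anyone who wants to check the mixture numerically, here is a small sketch using only the Python standard library. It takes $\sigma = 0.89$ as printed above, so the intermediate values differ slightly from the quoted 81.58 and 14643, which presumably came from an unrounded $\sigma$. The helper names are made up for this example.

```python
import math
from statistics import NormalDist

mu, sigma = math.log(55), 0.89   # lognormal component, sigma as rounded in the answer
low, high = 31.38, 97.38         # the two point masses
w_ln, w_pt = 0.50, 0.25          # mixture weights

# Lognormal moments: E[X] = exp(mu + sigma^2/2), E[X^2] = exp(2*mu + 2*sigma^2).
ln_mean = math.exp(mu + sigma ** 2 / 2)
ln_second = math.exp(2 * mu + 2 * sigma ** 2)

mix_mean = w_pt * low + w_ln * ln_mean + w_pt * high
mix_second = w_pt * low ** 2 + w_ln * ln_second + w_pt * high ** 2
mix_sd = math.sqrt(mix_second - mix_mean ** 2)


def lognormal_cdf(x):
    """CDF of the lognormal component at x."""
    return NormalDist(mu, sigma).cdf(math.log(x))


def percentile_range(point, mass_below):
    """Range of mixture CDF values covered by the point mass at `point`."""
    start = mass_below + w_ln * lognormal_cdf(point)
    return start, start + w_pt


low_range = percentile_range(low, 0.0)     # no point mass lies below `low`
high_range = percentile_range(high, w_pt)  # the mass at `low` lies below `high`

print(f"mixture mean = {mix_mean:.2f}, sd = {mix_sd:.2f}")
print(f"point mass at {low} covers CDF range {low_range[0]:.3f} to {low_range[1]:.3f}")
print(f"point mass at {high} covers CDF range {high_range[0]:.3f} to {high_range[1]:.3f}")
print(f"IQR = {high - low:.2f}")
```

If 0.25 falls inside the first range and 0.75 inside the second, the point masses are the quartiles of the mixture, as claimed.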
