Understanding The Algorithms Behind Quantile() in R

Question

So I was trying to compute the quantiles from a dataset in R:

c(3,5,7,8,12,13,14,18,21)

using the quantile() function, and I realised that on its own it was returning unexpected values for Q1 and Q3 (7 and 14, respectively, when I expected 6 and 16). So I went into the documentation and saw that you can pass the function a type argument from 1 to 9, each corresponding to a different algorithm. By process of elimination, type 6 yields the expected quartiles for this specific dataset, but because my maths is weak I don't actually understand the algorithms being used, or why type 6 in particular worked on my data. Is there a good explanation of when to use each type for someone really new to stats?
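Types 6 and 7 differ only in where they place the p-th quantile among the sorted values. The Python sketch below uses Python instead of R, and `hf_quantile` is a hypothetical helper name. It implements the two position rules as documented in R's `?quantile` (after Hyndman and Fan). Type 6 uses position h = (n + 1)p, type 7 uses h = (n - 1)p + 1, and both interpolate linearly between neighbouring order statistics. As a cross-check, it also calls the standard library's `statistics.quantiles`. Its `"exclusive"` and `"inclusive"` methods appear to follow the same two rules, but this only checks them on this one dataset.

```python
import math
import statistics

def hf_quantile(data, p, kind):
    """Sample quantile via the type 6 or type 7 position rule (hypothetical helper).

    Positions h are 1-based, as in R's ?quantile.
    """
    x = sorted(data)
    n = len(x)
    if kind == 6:
        h = (n + 1) * p        # type 6: position p(n + 1)
    elif kind == 7:
        h = (n - 1) * p + 1    # type 7: R's default
    else:
        raise ValueError("only types 6 and 7 are sketched here")
    h = min(max(h, 1), n)      # clamp the position to the range of the data
    lo = math.floor(h)
    hi = min(lo + 1, n)
    # linear interpolation between order statistics x_(lo) and x_(hi)
    return x[lo - 1] + (h - lo) * (x[hi - 1] - x[lo - 1])

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
for kind in (6, 7):
    print(kind, [hf_quantile(data, p, kind) for p in (0.25, 0.75)])
# 6 [6.0, 16.0]
# 7 [7.0, 14.0]

# Standard-library cross-check: the three quartile cut points
print(statistics.quantiles(data, n=4, method="exclusive"))  # [6.0, 12.0, 16.0]
print(statistics.quantiles(data, n=4, method="inclusive"))  # [7.0, 12.0, 14.0]
```

With n = 9, both rules put the median at position 5. They differ only at the quartiles. Type 6 lands halfway between two order statistics (positions 2.5 and 7.5), while type 7 lands exactly on one (positions 3 and 7).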

Related: [Relation between Quintiles and the Arithmetic Mean](http://stats.stackexchange.com/q/178578/7290). — gung - Reinstate Monica, Jan 09 '16 at 20:35
To be really specific, I'm unclear how type 7 (the R default) calculates differently from type 6, such that type 7 arrives at a presumably wrong answer for this specific case and type 6 at the presumably right one, and whether there's any intuitive way to choose the right algorithm. — CKM, Jan 09 '16 at 20:43
Some useful threads can be found by [searching our site](http://stats.stackexchange.com/search?q=quantile+algorithm+[r]). The one at http://stats.stackexchange.com/questions/13399 seems like it might be particularly helpful. — whuber, Jan 09 '16 at 23:40
None of the nine definitions are "wrong" or "right" and type 6 *doesn't* actually give "expected quantiles"; it gives the value which has on average the expected percentile rank. Indeed, if anything, type 7 comes closer to the expected quantile in the normal case. What definition of sample quantile do you actually want? Are you trying to match some textbook definition? You may get some benefit from the Hyndman and Fan paper (see the R help) and some of its references. — Glen_b, Jan 10 '16 at 02:49
I think the problem is in the way we're thinking about the discrepancy. It's an online lecture combining intro stats and R; the instructor explained that for a small dataset you can find the quartiles by splitting the data at the median and taking the median of each half. So since the median is 12, Q1 is the median of c(3,5,7,8), which is (5+7)/2 = 6, and so forth. I assumed this was right just based on the instructor's credentials, but I can see that this isn't a good way to tackle it! — CKM, Jan 10 '16 at 07:17
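The lecture's method splits the sorted data at its middle value and takes the median of each half. The sample mean is 101/9 ≈ 11.2, so the split point 12 is the median. Below is a minimal Python sketch of that method, with `median_of_halves` as a hypothetical helper name. It assumes that when n is odd, the overall median is left out of both halves, which is what the (5+7)/2 example implies.

```python
import statistics

def median_of_halves(data):
    """Quartiles as medians of the lower and upper halves (hypothetical helper).

    Assumes the overall median is excluded from both halves when n is odd.
    """
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]          # values below the middle position
    upper = x[n - half:]      # values above the middle position
    return statistics.median(lower), statistics.median(x), statistics.median(upper)

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
print(median_of_halves(data))  # (6.0, 12, 16.0)
```

For this dataset, the result coincides with R's type 6. That explains why type 6 looked "right" here. The sketch does not show that the two conventions agree for every sample size.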

score 2 · Accepted Answer · edited Oct 05 '20 at 10:14

Why should the answers be $6$ and $16$? The $p^\text{th}$ quantile $q_p$ is the smallest value for which the proportion of the sample less than or equal to $q_p$ is at least $p$, and the proportion greater than or equal to $q_p$ is at least $1 - p$. If we take $p = 1/4$ and look at your sample $\{3, 5, 7, 8, 12, 13, 14, 18, 21 \}$, $7$ matches this definition but $6$ doesn't: only $2/9 < 1/4$ of the sample is less than or equal to $6$. The same is true for the $p = 3/4$ quantile: $14$ qualifies, but $16$ doesn't.
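The condition can be checked directly. Below is a minimal Python sketch with hypothetical helper names `is_p_quantile` and `smallest_p_quantile`. For a candidate q, it tests whether at least a fraction p of the sample is ≤ q and at least a fraction 1 - p is ≥ q, then picks the smallest sample value that passes. Exact fractions avoid floating-point edge cases at the boundaries. As far as I can tell from the definitions in R's `?quantile`, this smallest value is what type 1 (inverse of the empirical distribution function) returns. That is an interpretation, not something the answer states.

```python
from fractions import Fraction

def is_p_quantile(data, q, p):
    """True if at least a fraction p of data is <= q and at least 1 - p is >= q."""
    n = len(data)
    at_or_below = Fraction(sum(v <= q for v in data), n)
    at_or_above = Fraction(sum(v >= q for v in data), n)
    return at_or_below >= p and at_or_above >= 1 - p

def smallest_p_quantile(data, p):
    """Smallest sample value satisfying is_p_quantile (hypothetical helper)."""
    return min(v for v in data if is_p_quantile(data, v, p))

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
p1, p3 = Fraction(1, 4), Fraction(3, 4)
print(smallest_p_quantile(data, p1), smallest_p_quantile(data, p3))  # 7 14
print(is_p_quantile(data, 6, p1), is_p_quantile(data, 16, p3))       # False False
```

The value 6 fails the first condition (only 2/9 of the sample is ≤ 6). The value 16 fails the second (only 2/9 of the sample is ≥ 16).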
