Quartiles of ungrouped data

Question

For a set of 48 randomly generated points using runif(), I use the command quantile(x,0.25) to calculate the first quartile in R.

I don't understand what formula R uses to calculate quartiles of ungrouped data points. The value I am getting is not the median of the data from the 1st to the 23rd position.

Did you try ?quantile yet? There are a bunch of ways to calculate quantiles. — Dave, Jan 21 '21 at 20:32
@Dave For ungrouped data what is the formula used in R or Excel? — umm_what, Jan 21 '21 at 20:35
@Dave Yes, I tried. Using the command quantile(x,0.25) in R. — umm_what, Jan 21 '21 at 20:37
Type ?quantile with the question mark to read the various ways that function can calculate quantiles. — Dave, Jan 21 '21 at 21:13
It sounds like you might have in mind the approach discussed at https://stats.stackexchange.com/questions/134229. That concerns a textbook that makes exactly the same mistake: (a) lower quartile of 48 values can be determined as the middle value of the lowest 25 values, not the lowest 23. (Maybe you are using the same bad text?) — whuber, Jan 21 '21 at 22:02
@whuber Hi. Thanks for your comment. It's really helpful. And for 48 points, you mean median of first 24 data points right for first quartile? As (48+1)/2 is 24.5. So L is 24 and U is 25 in this case. — umm_what, Jan 22 '21 at 07:52
Yes, if "24" and "25" are counting from the same origin. (Tukey counts inwards from the nearest extreme.) — whuber, Jan 22 '21 at 14:09

score 0 · Accepted Answer · answered Jan 21 '21 at 21:59

R offers several ways to find the lower quartile of $n = 48$ observations. Bear in mind that there is no single agreed-upon way to compute sample percentiles, and the choice matters most for small samples or data with many ties.

R's quantile function implements nine methods, selected with the type argument, and its default is only one of them. They are described in the R documentation (?quantile), so you can choose the one you want to use.

Suppose you have $n = 48$ normal observations, rounded to two decimal places:

set.seed(121)
x = round(rnorm(48, 100, 15),2)
sort(x)
 [1]  73.77  77.96  78.89  82.62  82.92  83.51
 [7]  86.94  87.08  87.89  88.67  89.29  90.88
[13]  91.51  91.93  92.97  93.99  95.98  96.17
[19]  96.84  98.78  99.14  99.38  99.77 101.09
[25] 101.63 101.92 103.16 104.46 104.52 105.56
[31] 106.35 107.10 107.53 108.17 109.56 109.65
[37] 109.98 110.11 110.72 110.81 111.73 113.61
[43] 113.70 115.01 118.24 120.66 121.04 124.23

Just looking at the sorted list, you might guess that the lower quartile lies somewhere between 90.88 and 91.51, the 12th and 13th sorted values. Halfway between is 91.195, but R chooses 91.3525, which is closer to 91.51: its default method interpolates three quarters of the way from the 12th value to the 13th.
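
The arithmetic behind 91.3525 can be reproduced by hand. The following is a minimal sketch, assuming the default rule places the p-th quantile at position 1 + (n - 1)p in the sorted data (the type=7 definition in ?quantile); the variable names are only illustrative.

```r
n   <- 48
p   <- 0.25
x12 <- 90.88   # 12th sorted value, from the list above
x13 <- 91.51   # 13th sorted value, from the list above

h    <- 1 + (n - 1) * p          # 12.75, so the quartile sits between values 12 and 13
frac <- h - floor(h)             # 0.75 of the way from the 12th to the 13th value
q1   <- x12 + frac * (x13 - x12) # linear interpolation
q1                               # 91.3525
```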

Here are three ways in R to get the 25th percentile or lower quartile:

summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  73.77   91.35  101.36  100.36  109.73  124.23 

quantile(x)
      0%      25%      50%      75%     100% 
 73.7700  91.3525 101.3600 109.7325 124.2300 

quantile(x, .25)
    25% 
91.3525

The default method in R is type=7; other types give slightly different answers:

quantile(x, .25, type=5)
   25% 
 91.195 
quantile(x, .25, type=6)
    25% 
91.0375 
quantile(x, .25, type=7)
    25% 
91.3525   # Default
quantile(x, .25, type=8)
    25% 
91.1425
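
The four answers differ only in where each type places the quartile between the 12th and 13th sorted values. The sketch below is one way to check this by hand, assuming the position formulas given for types 5 to 8 in ?quantile; the names xs, h, by_hand and built_in are made up for this illustration.

```r
# Sorted sample, copied from the sort(x) output above
xs <- c( 73.77,  77.96,  78.89,  82.62,  82.92,  83.51,
         86.94,  87.08,  87.89,  88.67,  89.29,  90.88,
         91.51,  91.93,  92.97,  93.99,  95.98,  96.17,
         96.84,  98.78,  99.14,  99.38,  99.77, 101.09,
        101.63, 101.92, 103.16, 104.46, 104.52, 105.56,
        106.35, 107.10, 107.53, 108.17, 109.56, 109.65,
        109.98, 110.11, 110.72, 110.81, 111.73, 113.61,
        113.70, 115.01, 118.24, 120.66, 121.04, 124.23)
n <- length(xs)
p <- 0.25

# Fractional position of the p-th quantile in the sorted data, per type
h <- c(type5 = n * p + 0.5,
       type6 = (n + 1) * p,
       type7 = (n - 1) * p + 1,
       type8 = (n + 1/3) * p + 1/3)

# Linear interpolation between the two neighbouring sorted values
lo      <- floor(h)
by_hand <- xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])

# The same quartile from quantile(), for comparison
built_in <- sapply(5:8, function(t) quantile(xs, p, type = t, names = FALSE))

round(rbind(h, by_hand, built_in), 4)
```

The positions come out as 12.5, 12.25, 12.75 and about 12.42, so every type interpolates between 90.88 and 91.51 and only the weight changes.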

All of the types have advocates who have different uses for quantiles in various fields of application. If you are a student, use whatever method of finding quantiles is standard in your class.
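
If your class uses the rule discussed in the comments (the lower quartile is the median of the lower half of the data), base R can reproduce that as well. This is a sketch assuming that rule means the median of the lowest 24 of the 48 values; fivenum() returns Tukey's five-number summary, whose second element is the lower hinge.

```r
# Sorted sample, copied from the sort(x) output above
xs <- c( 73.77,  77.96,  78.89,  82.62,  82.92,  83.51,
         86.94,  87.08,  87.89,  88.67,  89.29,  90.88,
         91.51,  91.93,  92.97,  93.99,  95.98,  96.17,
         96.84,  98.78,  99.14,  99.38,  99.77, 101.09,
        101.63, 101.92, 103.16, 104.46, 104.52, 105.56,
        106.35, 107.10, 107.53, 108.17, 109.56, 109.65,
        109.98, 110.11, 110.72, 110.81, 111.73, 113.61,
        113.70, 115.01, 118.24, 120.66, 121.04, 124.23)

half <- length(xs) / 2                     # 24 values in the lower half
lower_half_median <- median(xs[1:half])    # mean of the 12th and 13th values
lower_hinge       <- fivenum(xs)[2]        # Tukey's lower hinge

c(lower_half_median = lower_half_median,
  lower_hinge       = lower_hinge,
  default_type7     = quantile(xs, 0.25, names = FALSE))
```

For this sample the first two values are both 91.195, which differs from the default 91.3525 and explains why a hand calculation may not match quantile(x, 0.25).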

Quantiles are often used in practice with very large datasets, for which the types give very nearly the same results. So this 'quirkiness' in the methods of finding quantiles is usually more of a curiosity than a difficulty.
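
A quick simulation can illustrate the point about large datasets. This is only a sketch: the sample size of 100,000, the seed and the name big are arbitrary choices for the illustration, not part of the example above.

```r
set.seed(1)                      # arbitrary seed, chosen for this sketch
big <- rnorm(1e5, 100, 15)       # same distribution as above, much larger n

# Lower quartile under types 5 to 8
q <- sapply(5:8, function(t) quantile(big, 0.25, type = t, names = FALSE))
names(q) <- paste0("type", 5:8)

q
diff(range(q))                   # spread between the largest and smallest answer
```

With this many observations, neighbouring sorted values are very close together, so the spread across types should be tiny compared with the differences seen for $n = 48$.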
