For a set of 48 randomly generated points using runif(), I use the command quantile(x,0.25) to calculate the first quartile in R.

I don't understand which formula R uses to calculate quartiles of ungrouped data points. The value I get is not the median of the data from the 1st to the 23rd position.

MarianD
umm_what
  • Did you try ?quantile yet? There are a bunch of ways to calculate quantiles. – Dave Jan 21 '21 at 20:32
  • @Dave For ungrouped data what is the formula used in R or Excel? – umm_what Jan 21 '21 at 20:35
  • @Dave Yes, I tried. Using the command quantile(x,0.25) in R. – umm_what Jan 21 '21 at 20:37
  • Type ?quantile with the question mark to read the various ways that function can calculate quantiles. – Dave Jan 21 '21 at 21:13
  • @Dave Oh okay. I will try that. Thanks a lot. – umm_what Jan 21 '21 at 21:14
  • It sounds like you might have in mind the approach discussed at https://stats.stackexchange.com/questions/134229. That concerns a textbook that makes exactly the same mistake: (a) lower quartile of 48 values can be determined as the middle value of the lowest 25 values, not the lowest 23. (Maybe you are using the same bad text?) – whuber Jan 21 '21 at 22:02
  • @whuber Hi. Thanks for your comment. It's really helpful. And for 48 points, you mean median of first 24 data points right for first quartile? As (48+1)/2 is 24.5. So L is 24 and U is 25 in this case. – umm_what Jan 22 '21 at 07:52
  • Yes, if "24" and "25" are counting from the same origin. (Tukey counts inwards from the nearest extreme.) – whuber Jan 22 '21 at 14:09

1 Answer

R offers several ways to find the lower quartile of $n = 48$ observations. Be aware, however, that it is not always obvious how to define a sample percentile, especially for small samples or samples with many ties.

R's default is only one of several available methods. The R documentation (type ?quantile) describes all of them, and you can choose the one you want with the type argument.

Suppose you have $n = 48$ normal observations, rounded to two decimal places:

set.seed(121)
x = round(rnorm(48, 100, 15),2)
sort(x)
 [1]  73.77  77.96  78.89  82.62  82.92  83.51
 [7]  86.94  87.08  87.89  88.67  89.29  90.88
[13]  91.51  91.93  92.97  93.99  95.98  96.17
[19]  96.84  98.78  99.14  99.38  99.77 101.09
[25] 101.63 101.92 103.16 104.46 104.52 105.56
[31] 106.35 107.10 107.53 108.17 109.56 109.65
[37] 109.98 110.11 110.72 110.81 111.73 113.61
[43] 113.70 115.01 118.24 120.66 121.04 124.23

Just looking at the sorted list, you might guess that the lower quartile lies somewhere between the 12th and 13th values, 90.88 and 91.51. Halfway between them is 91.195, but R's default method gives 91.3525, which is closer to 91.51.
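As a rough sketch of where 91.3525 comes from: according to ?quantile, the default method (type 7) places the quantile at position $(n-1)p + 1$ in the sorted sample and interpolates linearly between the two neighbouring values. The names x12, x13, h and q7 below are illustrative only and are not part of R.

```r
n <- 48
p <- 0.25

# 12th and 13th values of the sorted sample shown above
x12 <- 90.88
x13 <- 91.51

# Fractional position used by the default (type 7) method
h <- (n - 1) * p + 1            # 12.75

# Linear interpolation between the 12th and 13th sorted values
q7 <- x12 + (h - 12) * (x13 - x12)
q7                              # 91.3525
```

The position 12.75 lies three quarters of the way from the 12th value to the 13th, which is why the result sits closer to 91.51.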

Here are three ways in R to get the 25th percentile or lower quartile:

summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  73.77   91.35  101.36  100.36  109.73  124.23 

quantile(x)
      0%      25%      50%      75%     100% 
 73.7700  91.3525 101.3600 109.7325 124.2300 

quantile(x, .25)
    25% 
91.3525 

The default method in R is type=7, but other types give slightly different answers, as shown below:

quantile(x, .25, type=5)
   25% 
 91.195 
quantile(x, .25, type=6)
    25% 
91.0375 
quantile(x, .25, type=7)   # Default
    25% 
91.3525 
quantile(x, .25, type=8)
    25% 
91.1425 
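The four results above can be reproduced by hand. The sketch below assumes the position rule given in ?quantile for the continuous types: the quantile sits at fractional position $np + m$ in the sorted sample, with $m = 1/2$ (type 5), $m = p$ (type 6), $m = 1 - p$ (type 7) and $m = (p+1)/3$ (type 8). The function manual_quantile is a hypothetical helper written for this illustration, not part of R, and it ignores the edge cases near $p = 0$ and $p = 1$ that quantile() handles.

```r
set.seed(121)
x <- round(rnorm(48, 100, 15), 2)   # same sample as above

# Hypothetical helper: interpolated quantile for types 5 to 8 only.
manual_quantile <- function(x, p, type) {
  xs <- sort(x)
  n <- length(xs)
  m <- switch(as.character(type),
              "5" = 0.5,
              "6" = p,
              "7" = 1 - p,
              "8" = (p + 1) / 3)
  h <- n * p + m        # fractional position in the sorted sample
  j <- floor(h)         # index of the lower neighbour
  g <- h - j            # interpolation weight for the upper neighbour
  (1 - g) * xs[j] + g * xs[j + 1]
}

types   <- 5:8
manual  <- sapply(types, function(t) manual_quantile(x, 0.25, t))
builtin <- sapply(types, function(t) unname(quantile(x, 0.25, type = t)))

# Side-by-side comparison of the hand computation and quantile()
rbind(types, manual, builtin)
```

For $n = 48$ and $p = 0.25$ the four positions are 12.5, 12.25, 12.75 and about 12.417, so every type interpolates between the same two sorted values and only the weight changes.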

Each type has its advocates, who use quantiles for different purposes in various fields of application. If you are a student, use whatever method of finding quantiles is standard in your class.
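If your class uses the common textbook rule "the lower quartile is the median of the lower half of the data", you can compute that directly. The sketch below is only an illustration for this sample size: with 48 observations the lower half is the lowest 24 values, and the textbook rule agrees with Tukey's lower hinge from fivenum() and with type=5. The variable names are illustrative.

```r
set.seed(121)
x <- round(rnorm(48, 100, 15), 2)   # same sample as above

# Textbook rule: median of the lower half (lowest 24 of 48 values)
lower_half  <- sort(x)[1:(length(x) / 2)]
q1_textbook <- median(lower_half)

# Tukey's lower hinge is the second element of the five-number summary
q1_hinge <- fivenum(x)[2]

# Type 5 places the quartile at position n*p + 0.5 = 12.5
q1_type5 <- unname(quantile(x, 0.25, type = 5))

c(textbook = q1_textbook, hinge = q1_hinge, type5 = q1_type5)
```

For other sample sizes, in particular when $n$ is not a multiple of 4, these three rules need not coincide.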

Quantiles are often used with very large datasets, for which the types tend to give very nearly the same results. In practice, this 'quirkiness' in methods of finding quantiles is more of a curiosity than a difficulty.
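A quick way to see this is to repeat the comparison on a much larger sample. The seed and the sample size below are arbitrary choices made for this illustration.

```r
set.seed(2021)                       # arbitrary seed for this illustration
big <- rnorm(1e5, 100, 15)           # 100,000 normal observations

# Lower quartile under types 5 to 8
q_by_type <- sapply(5:8, function(t) unname(quantile(big, 0.25, type = t)))
names(q_by_type) <- paste0("type", 5:8)
q_by_type

# Largest disagreement among the four types
spread <- diff(range(q_by_type))
spread
```

With this many observations, neighbouring sorted values near the quartile are extremely close together, so the choice of interpolation weight hardly matters.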

BruceET