Computing percentile rank in R

Question

How can I add a new variable to a data frame that holds the percentile rank of one of its existing variables? I can do this easily in Excel, but I want to do it in R.

Thanks

chl · Accepted Answer · 2011-06-15T11:13:21.743


Given a vector of raw data values, a simple function might look like this:

```r
perc.rank <- function(x, xo)  length(x[x <= xo])/length(x)*100
```

where xo is the value for which we want the percentile rank, given the vector x, as suggested on R-bloggers.
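Because this version takes one value at a time, building a new column with it means looping over the data, for example with `sapply()`. A minimal sketch on a small made-up vector (the values in `my.df` are illustrative only):

```r
# Scalar percentile rank: share of values <= xo, as a percentage
perc.rank <- function(x, xo) length(x[x <= xo]) / length(x) * 100

# Illustrative data with a tie at 20
my.df <- data.frame(x = c(10, 20, 20, 30))

# Call perc.rank() once per observation to build the new column
my.df$xr <- sapply(my.df$x, function(v) perc.rank(my.df$x, v))
print(my.df$xr)  # 25 75 75 100
```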

However, it can easily be vectorized (note that this version returns a proportion between 0 and 1 rather than a percentage):

```r
perc.rank <- function(x) trunc(rank(x))/length(x)
```

This has the advantage that you don't have to pass in each value separately. Here is an example of use:

```r
my.df <- data.frame(x = rnorm(200))
my.df <- within(my.df, xr <- perc.rank(x))
```
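When there are ties, the vectorized version does not reproduce the counting definition above: `trunc(rank(x))` truncates the average rank of a tied group, whereas counting `x <= xo` gives the largest rank in the group. A sketch comparing it with two base-R alternatives that do match the counting definition (on a 0 to 1 scale), `rank(x, ties.method = "max")` and the empirical CDF from `ecdf()`; the variable names are made up for this example:

```r
x <- c(10, 20, 20, 30)  # small example with a tie

# Vectorized version from the answer: average rank, truncated
pr.trunc <- trunc(rank(x)) / length(x)

# Largest rank within each tie group = proportion of values <= x[i]
pr.max <- rank(x, ties.method = "max") / length(x)

# ecdf(x) returns a function F with F(v) = proportion of values <= v
pr.ecdf <- ecdf(x)(x)

print(rbind(pr.trunc, pr.max, pr.ecdf))
```

Which tie convention is appropriate depends on the definition you need; all three agree whenever `x` has no ties.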

edited Jun 15 '11 at 11:13

answered Jun 15 '11 at 10:30

chl

1. Your function does not mimic Excel's `percentrank`-function, which is good (+1) since the latter gives "strange" results (see my [comparison](https://gist.github.com/1026879)). 2. I wouldn't name the data frame `df`, because `df` is an R function (the density of the F distribution, see `?df`). – Bernd Weiss Jun 15 '11 at 11:04
@Bernd Thanks. (1) There are some built-in functions for computing PR in various psychometrics packages. I think I grabbed this one from the `CTT` package a while ago. I didn't check against Excel because I don't have/use it. About (2) I seem to always forget about this! Let's go with `my.*` (Perl way) :-) – chl Jun 15 '11 at 11:21
@chl why is the `trunc` required? It seems rank will always return an integer anyway. – Tyler Rinker May 10 '18 at 18:38
@Tyler Nope. When there are ties, `rank()` by default gives each tied value the average of their ranks (cf. `ties.method = c("average",...)`). – chl May 11 '18 at 13:15
Beware that NA values should be removed! This can be done by adding `x = x[!is.na(x)]` – Antoine Mar 21 '21 at 16:55
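Dropping NAs with `x = x[!is.na(x)]` shortens the vector, so the result can no longer be assigned back as a column of the original data frame. A sketch of an alternative that leaves NA in place, using `rank()`'s `na.last = "keep"` option and dividing by the number of non-missing values (`perc.rank.na` is a hypothetical name):

```r
# Percentile rank (0 to 1) that leaves NA entries as NA
perc.rank.na <- function(x) {
  # na.last = "keep" gives NA a rank of NA instead of ranking it last
  trunc(rank(x, na.last = "keep")) / sum(!is.na(x))
}

my.df <- data.frame(x = c(3, NA, 1, 2))
my.df <- within(my.df, xr <- perc.rank.na(x))
print(my.df)
```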

Nick Sabbe · Answer 2 · 2011-06-15T11:50:44.800

If your original data.frame is called dfr and the variable of interest is called myvar, you can use `dfr$myrank <- rank(dfr$myvar)` for ordinary ranks, or `dfr$myrank <- rank(dfr$myvar)/length(dfr$myvar)` for percentile ranks.

Oh well. If you really want it the Excel way (this may not be the simplest solution, but I had some fun using functions that were new to me and avoiding loops):

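```r
percentilerank <- function(x) {
  # run-length encode the sorted data: unique values and their counts
  rx <- rle(sort(x))
  # number of observations strictly smaller than each unique value
  smaller <- cumsum(c(0, rx$lengths))[seq(length(rx$lengths))]
  # number of observations strictly larger than each unique value
  larger <- rev(cumsum(c(0, rev(rx$lengths))))[-1]
  # percentile rank of each unique value
  rxpr <- smaller/(smaller + larger)
  # map each original value back to the rank of its unique value
  rxpr[match(x, rx$values)]
}
```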

Now you can use `dfr$myrank <- percentilerank(dfr$myvar)`.
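The `rle()` bookkeeping computes, for each value, the number of observations strictly below it and strictly above it, and returns below / (below + above). The same ratio can be sketched with `rank()`'s `ties.method` options, which may also make it easier to see what the function computes. `percentilerank2` is a hypothetical name; like the original, it returns NaN (0/0) when all values are equal.

```r
# Same ratio via rank(): for each x[i],
# (# values strictly below) / (# strictly below + # strictly above)
percentilerank2 <- function(x) {
  n <- sum(!is.na(x))
  # lowest rank in a tie group minus 1 = count strictly below
  smaller <- rank(x, ties.method = "min", na.last = "keep") - 1
  # n minus highest rank in a tie group = count strictly above
  larger <- n - rank(x, ties.method = "max", na.last = "keep")
  smaller / (smaller + larger)
}

x <- c(10, 20, 20, 30)
print(percentilerank2(x))  # returns 0, 0.5, 0.5, 1
```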

HTH.

edited Jun 15 '11 at 11:50

answered Jun 15 '11 at 10:06

Nick Sabbe

1 - (rank/size) gives you the same as Excel's percentilerank – user333 Jun 15 '11 at 11:24
I got this from [office.microsoft.com](http://office.microsoft.com/en-us/excel-help/percentrank-HP005209212.aspx) – Nick Sabbe Jun 15 '11 at 11:51
An anonymous (attempted) editor tried to add the following comment: "Nice function but sometimes, unfortunately, the RLE may return vector of `length < length(dfr$myvar)`". – gung - Reinstate Monica Aug 26 '13 at 16:58
Can you explain or link to the theory of this method? – mavavilj May 18 '21 at 18:47

Farshad · Answer 3 · 2016-01-13T01:09:37.740

A problem with the accepted answer is that it does not work properly when the data contain NAs.

In this case, another possibility (inspired by the function from chl♦) is:

```r
perc.rank <- function(x) trunc(rank(x, na.last = NA))/sum(!is.na(x))
quant <- function(x, p.ile) {
  x <- x[!is.na(x)]  # drop NAs so positions line up with perc.rank(x)
  x[which.min(abs(perc.rank(x) - p.ile/100))]
}
```

Here, x is the vector of values and p.ile is the target percentile rank on a 0 to 100 scale. For example, the value at the 2.5th percentile rank of column 3 of an (arbitrary) matrix coef.mat may be calculated by:

```r
quant(coef.mat[,3], 2.5)
[1] 0.00025
```

or as a single function:

```r
quant <- function(x, p.ile) {
  x <- na.omit(x)
  perc.rank <- trunc(rank(x))/length(x)  # percentile ranks (0 to 1) of the non-NA values
  x[which.min(abs(perc.rank - p.ile/100))]
}
```
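For the reverse direction (from a percentile to a data value), base R's `quantile()` with `type = 1` returns the smallest observation whose empirical CDF is at least the requested probability, and `na.rm = TRUE` handles missing values. Its selection rule is not identical to the nearest-percentile search in `quant()`, so treat it as an alternative rather than an exact replacement. A sketch on made-up data:

```r
x <- c(5, NA, 1, 3, 2, 4)  # illustrative data with one missing value

# type = 1: inverse of the empirical CDF; probs is on a 0 to 1 scale
q40  <- unname(quantile(x, probs = 0.40, type = 1, na.rm = TRUE))
q100 <- unname(quantile(x, probs = 1, type = 1, na.rm = TRUE))
print(c(q40, q100))  # 2 5
```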