10

I have the vector

x <- c(1,2,3,4,5,5,5,6,6,6,6,
       7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
       7,7,7,7,7,7,7,7,8,8,8,8,9,9,9,10)

(my actual vector has a length of >10,000) and I would like to find the intervals where the 90% of the density lies. Is quantile(x, probs=c(0.05,0.95), type=5) the most appropriate or is there any other way?

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
ECII
  • 1,791
  • 2
  • 17
  • 25
  • Your question is a little vague about "the intervals where..." - there could be multiple intervals. Are you interested in solely the inner 90%, i.e. symmetrically trimming on each side? After all, from the minimum to 90%ile, 90% of the data is captured, similarly for 10%ile to the max value. – Iterator Nov 16 '11 at 12:53
  • Are you looking for a shortest interval, a symmetric interval (equal probability out each end), or something else? – Glen_b Sep 03 '16 at 03:14

3 Answers3

19

As pointed out above, there are many different ways to define an interval that includes 90% of the density. One that hasn't been pointed out yet is the highest [posterior] density interval (wikipedia), which is defined as "the shortest interval for which the difference in the empirical cumulative density function values of the endpoints is the nominal probability".

library(coda)
HPDinterval(as.mcmc(x), prob=0.9)
Ben Bolker
  • 34,308
  • 2
  • 93
  • 126
6

Your way seems sensible, especially with the discrete data in the example,

quantile(x,probs=c(0.05,0.95), type=5)
 5% 95% 
2.8 9.0

but another way would be to use a computed density kernel:

dx <- density(x)
dn <- cumsum(dx$y)/sum(dx$y)
li <- which(dn>=0.05)[1]
ui <- which(dn>=0.95)[1]
dx$x[c(li,ui)]
[1] 2.787912 9.163246
James
  • 2,106
  • 18
  • 21
3

It certainly seems like the most straightforward approach. The function is quite fast. I use it all the time on samples that are hundreds of times larger that the one you are using, and the stability of the estimates should be good at your sample size.

There are functions in other packages that provide more complete sets of descriptive statistics. The one I use is Hmisc::describe, but there are several other packages with describe functions.

DWin
  • 7,005
  • 17
  • 32