Find probability density intervals

Question

I have the vector

x <- c(1,2,3,4,5,5,5,6,6,6,6,
       7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
       7,7,7,7,7,7,7,7,8,8,8,8,9,9,9,10)

(my actual vector has a length of >10,000) and I would like to find the intervals where the 90% of the density lies. Is quantile(x, probs=c(0.05,0.95), type=5) the most appropriate or is there any other way?

Your question is a little vague about "the intervals where..." - there could be multiple intervals. Are you interested in solely the inner 90%, i.e. symmetrically trimming on each side? After all, from the minimum to 90%ile, 90% of the data is captured, similarly for 10%ile to the max value. — Iterator, Nov 16 '11 at 12:53
Are you looking for a shortest interval, a symmetric interval (equal probability out each end), or something else? — Glen_b, Sep 03 '16 at 03:14

score 19 · Accepted Answer · answered Nov 16 '11 at 13:56

As pointed out above, there are many different ways to define an interval that includes 90% of the density. One that hasn't been pointed out yet is the highest [posterior] density interval (wikipedia), which is defined as "the shortest interval for which the difference in the empirical cumulative density function values of the endpoints is the nominal probability".

library(coda)
HPDinterval(as.mcmc(x), prob=0.9)

score 6 · Answer 2 · answered Nov 16 '11 at 12:41

Your way seems sensible, especially with the discrete data in the example,

quantile(x,probs=c(0.05,0.95), type=5)
 5% 95% 
2.8 9.0

but another way would be to use a computed density kernel:

dx <- density(x)
dn <- cumsum(dx$y)/sum(dx$y)
li <- which(dn>=0.05)[1]
ui <- which(dn>=0.95)[1]
dx$x[c(li,ui)]
[1] 2.787912 9.163246

score 3 · Answer 3 · answered Nov 16 '11 at 12:40

It certainly seems like the most straightforward approach. The function is quite fast. I use it all the time on samples that are hundreds of times larger that the one you are using, and the stability of the estimates should be good at your sample size.

There are functions in other packages that provide more complete sets of descriptive statistics. The one I use is Hmisc::describe, but there are several other packages with describe functions.

Find probability density intervals

3 Answers3

Linked