I have been working through Doing Bayesian Data Analysis by John K. Kruschke. In this book he has created a function for calculating the minimum and maximum values of the Highest Density Interval (HDI) of a mcmc chain, for use in making decisions about whether a the true parameter value of a given effect is non-zero. I deconstructed this function and am confused about the logic of a few steps.
Instead of running an entire JAGS model I have created a vector that will serve in the place of a true mcmc chain
v <- rnorm(50000, 0.6, .1)
The next steps I am clear about
vSort <- sort(v) # sort from lowest estimate to highest
ciInd <- .95*length(vSort) # calculate 95% of the total number of estimates
int5 <- length(vSort) - ciInd # 5% of the total number of estimates
Now these are the steps I do not understand. We generate a vector of values as long as each side of the HDI. The 1st value generated by the loop is the value at the 47501st position in the vector of sorted estimates minus the value at the 1st position. The 2nd loop generates a value that is the value at 47502nd position minus the value at the 2nd position etc.
vecHDI <- sapply(1:int5, function (i) vSort[i + ciInd] - vSort[i])
To derive the minimum value of the HDI we find the position of the lowest value in the vector we just created (which represents the lowest difference between estimates 47500 positions apart) and substitute that into the original sorted vector
(HDImin <- vSort[which.min(vecHDI)])
To find the maximum HDI value derived we simply add 47500 to the position of the minimum value.
(HDImax <- vSort[which.min(vecHDI) + ciInd])
What I am unclear about is why we go through the process of generating the vector of differences to derive the reference for the position of the miminimum HDI? Why do we not just use the 2501st position in the sorted vector of parameter estimates (i.e. vSort[int5 + 1]
)?