It is likely that I am using incorrect terminology here, but I am trying to compute a "mean" of an interval-censored random variable.
Here is an example where:
1. a random sample from the standard normal distribution is discretized to create an interval censored random sample;
2. the marginal probability distribution of the sample is computed;
3. the midpoints of the intervals are computed;
4. the weighted mean of the interval midpoints is computed.
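In symbols, the four steps above amount to the following estimator. The notation is introduced here for illustration and is not part of the original post: $K$ is the number of intervals, $l_k$ and $u_k$ are the lower and upper limits of interval $k$, $n_k$ is the number of observations falling in it, and $n$ is the sample size.

$$
\hat{\mu}_{\text{mid}} = \sum_{k=1}^{K} \hat{p}_k \, \frac{l_k + u_k}{2}, \qquad \hat{p}_k = \frac{n_k}{n}
$$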
A simple simulation shows that this leads to a biased estimate of the mean of the underlying variable, and that bias depends on:
1. the number of intervals into which the variable is discretized; and
2. the sample size.
replIntMean = replicate(100, {
    X = rnorm(1000)
    # discretize the variable at its sample deciles
    XD = cut(X, quantile(X, probs = seq(0, 1, 0.1)), include.lowest = TRUE)
    # compute the marginal probability table
    probX = prop.table(table(XD))
    # extract the lower and upper limits of each interval from the factor labels
    liUL = regmatches(levels(XD),
                      gregexpr("([\\+-]*[0-9]+\\.[0-9]+)", levels(XD)))
    # compute the weighted mean of the interval midpoints
    sum(probX * sapply(liUL, function(x) mean(as.numeric(x))))
},
simplify = "array")
# plot the replicated estimates, the true mean (blue) and their average (red)
plot(replIntMean, type = "l")
abline(h = 0, col = "blue")
abline(h = mean(replIntMean), col = "red")
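The simulation above fixes the sample size at 1000 and the number of intervals at 10. To see how the estimate behaves as either one changes, something like the following rough sketch could be used. The helper `midpointMean`, the grid of values, and the 200 replications are all assumptions made for illustration. The sketch also takes the interval limits directly from the break points rather than parsing the factor labels, which avoids the rounding that `cut` applies to its labels, so its results will not match the code above digit for digit.

```r
# Hypothetical helper (not from the original post): midpoint-weighted mean
# of a standard normal sample of size n, cut at its own quantiles into k intervals.
midpointMean = function(n, k) {
    X = rnorm(n)
    # break points at the sample quantiles, including the minimum and maximum
    breaks = unname(quantile(X, probs = seq(0, 1, length.out = k + 1)))
    XD = cut(X, breaks, include.lowest = TRUE)
    # observed proportion in each interval
    probX = as.numeric(prop.table(table(XD)))
    # interval midpoints taken from the unrounded break points
    mids = (head(breaks, -1) + tail(breaks, -1)) / 2
    sum(probX * mids)
}

set.seed(1)
# assumed grid of sample sizes and numbers of intervals
grid = expand.grid(n = c(100, 1000, 10000), k = c(5, 10, 50))
sims = mapply(function(n, k) {
    est = replicate(200, midpointMean(n, k))
    # average and spread of the estimate over the replications
    c(meanEst = mean(est), sdEst = sd(est))
}, grid$n, grid$k)
result = cbind(grid, t(sims))
print(result)
```

Each row of `result` gives the average and the standard deviation of the midpoint-weighted mean for one combination of `n` and `k`, to be compared with the true mean of 0.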
I am wondering if there is any guidance in the literature on the right way to compute measures of central tendency for interval-coded variables, and any discussion of their relative properties and interpretations.
Please note that the application is not survival analysis, which is what most of the references I have found deal with. I am also aware that the simplest measure of central tendency here is the modal class.