Estimate distribution from truncated data

Question

My data looks like this

n <- 1000
x <- runif(n, min=1, max=100)
y <- rnorm(n, mean=2+5*x, sd=90)
treshold <- 3
idx <- which((y/x)>treshold)

dat_trunc <- data.frame(x=x[idx],y=y[idx])
dat_full <- data.frame(x=x,y=y)

clr <- rep("lightgray", n)
clr[idx] <- "black"
plot(x,y, col=clr)

The gray data points are truncated.

Now I wish to recover the parameters of the distribution using the incomplete data.

Any hints?

EDIT: simplified the problem

Do you know the truncation threshold and want to recover the mean and standard deviation? Or do you need to estimate the threshold as well? — Elizabeth Santorella, May 27 '16 at 11:01
Check out "maximum likelihood estimators for a truncated distribution": http://stats.stackexchange.com/questions/48897/maximum-likelihood-estimators-for-a-truncated-distribution — Elizabeth Santorella, May 27 '16 at 11:05

score 0 · Accepted Answer · answered May 27 '16 at 11:11

Just for reference, I found a solution using a MLE approach:

fit_trunc <- lm(y~x, data=dat_trunc)
myfun <- function(z, a, b, sigma) 1 - pnorm(z*treshold, mean=a+b*z, sd=sigma)
ll <- function(par, dat) {
    a <- par[[1]]
    b <- par[[2]]
    sigma <- par[[3]]
    x0 <- min(dat[,"x"])
    x1 <- max(dat[, "x"])
    denom <- try(integrate(myfun, lower=x0, upper=x1, a=a, b=b, sigma=sigma)[["value"]]/(x1-x0))
    if(inherits(denom, "try-error")) browser()
    ret <- -(sum(dnorm(dat[,"y"], mean=a+b*dat[,"x"], sd=sigma, log=TRUE))
        - (nrow(dat)*log(denom)))
    return(ret)
}
fit_mle <- optim(par=c(coef(fit_trunc), sd(residuals(fit_trunc))), fn=ll, dat = dat_trunc)

The fit is not that bad:

fit_mle$par
# -0.2637355   5.1904350  83.5444319

Not sure if the denominator is calculated correctly, and what I need to do if x is not uniformly distributed.

Estimate distribution from truncated data

1 Answers1