How can one construct a likelihood function to fit a probability distribution when some data are below the detection limit?

Question

I'm trying to use maximum likelihood (MLE) to estimate the parameters of a gamma distribution from some data, but a good number of the samples are below the minimum detection limit (MDL) of the measurement method. These show up as zeroes in the dataset. I came up with a piecewise likelihood function in which every 0 is assigned the mean probability density over [0, MDL], i.e. CDF(MDL)/MDL, while every value above the MDL has its likelihood calculated as usual from the density.

Questions:

Is there anything WRONG with this approach? Are there better approaches?
Are there any non-obvious situations where it will fail to reliably estimate the parameters?
Has this been done in the literature before? If so, does somebody have a citation I can use?

Reprex below...

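library(stats4)  # provides mle()

x_actual <- rgamma(100, 1, 1)   # shape = 1, rate = 1
MDL <- .25
x_with_0s <- x_actual
x_with_0s[x_actual < MDL] <- 0  # values below the MDL are recorded as 0

# Negative log-likelihood (what mle() minimizes). Reads the data from the
# global variable x; theta is the *rate* (third positional argument of dgamma).
likfunc <- function(k, theta){
  mean_lik_below_MDL <- pgamma(MDL, k, theta)/MDL
  liks <- log(dgamma(x, k, theta))
  liks[x==0] <- log(mean_lik_below_MDL)
  negloglik <- -sum(liks)
  return(negloglik)
}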

This seemingly returns pretty good parameter estimates:

> x <- x_actual

> mle(likfunc, start=list(k=.5, theta=.5), method = "L-BFGS-B", lower = c(0,0))

Call:
mle(minuslogl = likfunc, start = list(k = 0.5, theta = 0.5), 
    method = "L-BFGS-B", lower = c(0, 0))

Coefficients:
       k    theta 
1.082315 1.077583 

> x <- x_with_0s

> mle(likfunc, start=list(k=.5, theta=.5), method = "L-BFGS-B", lower = c(0,0))

Call:
mle(minuslogl = likfunc, start = list(k = 0.5, theta = 0.5), 
    method = "L-BFGS-B", lower = c(0, 0))

Coefficients:
       k    theta 
1.031814 1.029893 
>

One thing I don't like is that the density implied by this likelihood is discontinuous at the MDL, but I don't know whether this is really an issue.

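x_grid <- seq(0, 5, .01)
# Density implied by the piecewise likelihood: flat at F(MDL)/MDL below the MDL,
# the ordinary gamma density above it (renamed so it does not mask grDevices::pdf)
piecewise_pdf <- function(x, k, theta){
  mean_lik_below_MDL <- pgamma(MDL, k, theta)/MDL
  liks <- dgamma(x, k, theta)
  liks[x<MDL] <- mean_lik_below_MDL
  return(liks)
}
plot(piecewise_pdf(x_grid, 1, 1) ~ x_grid)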
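Because the piecewise density puts total mass $F(\text{MDL})$ on $[0, \text{MDL})$ and the usual $1-F(\text{MDL})$ above it, the jump at the MDL does not stop it from being a proper density. A quick numerical check, as a sketch assuming the illustrative values k = 1 and theta = 1 (rate) and a hypothetical helper named `implied_density`:

```r
MDL <- 0.25
k <- 1; theta <- 1   # illustrative shape and rate, matching the simulation above

# Density implied by the piecewise likelihood (hypothetical helper name):
# flat at F(MDL)/MDL on [0, MDL), the ordinary gamma density from MDL upwards
implied_density <- function(x, k, theta) {
  ifelse(x < MDL, pgamma(MDL, k, theta) / MDL, dgamma(x, k, theta))
}

# Integrate each piece separately so integrate() never straddles the jump at MDL
below <- integrate(implied_density, 0, MDL, k = k, theta = theta)$value
above <- integrate(implied_density, MDL, Inf, k = k, theta = theta)$value

c(below = below, F_MDL = pgamma(MDL, k, theta), total = below + above)
```

The mass below the MDL equals $F(\text{MDL})$ and the total is 1, so the discontinuity by itself is not a problem for the likelihood.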

Related: https://stats.stackexchange.com/questions/354671/fitting-distributions-on-censored-data. — JimB, Mar 06 '20 at 04:18

Jose Pliego · Answer 1 · 2020-03-05T00:08:18.270

Your data are left-censored: for these measurements you only know that the value is below the MDL. Following the survival-analysis treatment of left-censored data, you would replace all the zeros with the MDL (and flag them as censored), and because all you know about them is that $X < MDL$, each one contributes $F(MDL)$ to the likelihood.

The code could be something like

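library(stats4)  # provides mle()

set.seed(1996)

x_actual <- rgamma(100, 1,1)
MDL <- .25
x_censored <- x_actual
x_censored[x_actual<MDL] <- MDL   # censored values are recorded at the MDL

dt <- data.frame(x_obs = x_censored)
dt$censored <- as.integer(x_actual < MDL)  # 1 = left-censored, 0 = observed

# Negative log-likelihood; reads the data frame from the global variable x.
# Observed values contribute log f(x); censored values contribute log F(MDL).
likfunc <- function(k, theta){
  liks <- log(dgamma(x$x_obs, k, theta))
  liks[x$censored == 1] <- log(pgamma(x$x_obs[x$censored == 1], k, theta))
  loglik <- -sum(liks)
  return(loglik)
}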

This yields the estimates

> x <- data.frame(x_obs = x_actual, censored = rep(0, length(x_actual)))
> 
> mle(likfunc, start=list(k=.5, theta=.5), method = "L-BFGS-B", lower = c(0,0))

Call:
mle(minuslogl = likfunc, start = list(k = 0.5, theta = 0.5), 
    method = "L-BFGS-B", lower = c(0, 0))

Coefficients:
       k    theta 
1.069465 1.275040 
> 
> x <- dt
> 
> mle(likfunc, start=list(k=.5, theta=.5), method = "L-BFGS-B", lower = c(0,0))

Call:
mle(minuslogl = likfunc, start = list(k = 0.5, theta = 0.5), 
    method = "L-BFGS-B", lower = c(0, 0))

Coefficients:
       k    theta 
1.045510 1.247177

Klein's Survival Analysis explains this contribution to the likelihood function.

Survival analysis typically concerns *right-censored* data, although in various ways it can be pressed into service for analyzing left-censored data. But it does not specifically deal with truncated data--nor is the present question about truncated data. It's also not about replacing the censored data with zeros. A closer consideration of the question is advisable. — whuber, Mar 04 '20 at 23:21
You are right, I apologize. I've edited my answer to show how to incorporate left-censored data in the likelihood function. I think that this approach with Survival Analysis has the theoretical foundations you need. — Jose Pliego, Mar 05 '20 at 00:11

JimB · Answer 2 · 2020-03-06T01:51:14.833

What you want to use for the likelihood is the following. Suppose $f(x)$ is the density function and $F(x)$ is the cumulative distribution function. If a sample of size $n$ with threshold $x_0$ has $n_0$ values below the threshold and $n-n_0$ values at or above it, the likelihood is

$$L=F(x_0)^{n_0} \times \prod_{i=1}^{n-n_0} f(x_i)$$

where $x_1, x_2,\ldots,x_{n-n_0}$ are the values at or above the threshold. As @Ben-ReinstateMonica pointed out in a comment, it is usually the log of the likelihood that is maximized, because it is numerically more stable:

$$\ln{L}=n_0 \ln{F(x_0)}+\sum_{i=1}^{n-n_0}\ln f(x_i)$$

Note that the R function mle in the stats4 package expects the negative log-likelihood $-\ln{L}$ (its argument is named minuslogl) and minimizes it, rather than maximizing $\ln{L}$, which can cause a bit of confusion. (I'm guessing the reason for that is that mle uses the optim function, which by default minimizes its objective function.)

That can be implemented for your example as

x_actual <- rgamma(100, 1, 1)
MDL <- 0.25
nCensored <- sum(x_actual < MDL)
xNotCensored <- x_actual[x_actual >= MDL]

library(stats4)

likfunc <- function(k, theta){
  -nCensored*log(pgamma(MDL, k, theta)) - sum(log(dgamma(xNotCensored, k, theta)))
}

(results <- mle(likfunc, start=list(k=.5, theta=.5), method = "L-BFGS-B", lower = c(0,0)))

Call:
mle(minuslogl = likfunc, start = list(k = 0.5, theta = 0.5), 
    method = "L-BFGS-B", lower = c(0, 0))

Coefficients:
        k     theta 
1.0068535 0.9319662
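To make the minimize/maximize distinction concrete, here is a sketch that fits the same censored likelihood twice: once by maximizing $\ln L$ directly with `optim` and `control = list(fnscale = -1)`, and once through `mle`, which is handed $-\ln L$. The seed, the helper name `loglik`, and the small positive lower bounds (used instead of 0 so the log-density stays finite at the boundary) are illustrative assumptions, not part of the answer above.

```r
library(stats4)  # provides mle()

set.seed(1)                     # illustrative seed
x_actual <- rgamma(100, 1, 1)   # shape = 1, rate = 1
MDL <- 0.25
nCensored <- sum(x_actual < MDL)
xNotCensored <- x_actual[x_actual >= MDL]

# ln L for the censored sample; par = c(k, theta), theta is the rate
loglik <- function(par) {
  nCensored * pgamma(MDL, par[1], par[2], log.p = TRUE) +
    sum(dgamma(xNotCensored, par[1], par[2], log = TRUE))
}

# optim() minimizes by default; fnscale = -1 makes it maximize ln L instead
fit_max <- optim(c(0.5, 0.5), loglik, method = "L-BFGS-B",
                 lower = c(1e-6, 1e-6), control = list(fnscale = -1))

# mle() expects the *negative* log-likelihood (its argument is minuslogl)
fit_mle <- mle(function(k, theta) -loglik(c(k, theta)),
               start = list(k = 0.5, theta = 0.5),
               method = "L-BFGS-B", lower = c(1e-6, 1e-6))

rbind(optim_max = fit_max$par, mle = coef(fit_mle))
```

Both routes optimize the same objective up to sign, so the two rows of estimates should agree.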

Saving the results in an object gets you the other essential piece for an estimate: a measure of precision (here in the form of a standard error).

summary(results)
Maximum likelihood estimation

Call:
mle(minuslogl = likfunc, start = list(k = 0.5, theta = 0.5), 
    method = "L-BFGS-B", lower = c(0, 0))

Coefficients:
       Estimate Std. Error
k     0.8912489  0.1402007
theta 0.9177082  0.1721334

-2 log L: 269.0074

You can also get the estimated covariance matrix and correlation between the maximum likelihood estimators of k and theta with the following:

# Covariance matrix (vcov(results) returns the same matrix)
results@vcov
               k      theta
k     0.01965624 0.01991023
theta 0.01991023 0.02962990

# Correlation between estimators
results@vcov[1,2]/sqrt(results@vcov[1,1]*results@vcov[2,2])
[1] 0.8250135
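The covariance matrix that mle reports is the inverse of the Hessian of $-\ln L$ at the optimum (the observed information). As a sketch of what happens under the hood, the same quantities can be computed by hand with `optim(..., hessian = TRUE)`; the seed, the helper name `negloglik`, and the small positive lower bounds are illustrative assumptions.

```r
set.seed(2)                       # illustrative seed
x_actual <- rgamma(100, 1, 1)     # shape = 1, rate = 1
MDL <- 0.25
nCensored <- sum(x_actual < MDL)
xNotCensored <- x_actual[x_actual >= MDL]

# Negative censored-data log-likelihood, par = c(k, theta) with theta the rate
negloglik <- function(par) {
  -(nCensored * pgamma(MDL, par[1], par[2], log.p = TRUE) +
      sum(dgamma(xNotCensored, par[1], par[2], log = TRUE)))
}

fit <- optim(c(0.5, 0.5), negloglik, method = "L-BFGS-B",
             lower = c(1e-6, 1e-6), hessian = TRUE)

# Inverse of the observed information = estimated covariance of the MLEs
vc <- solve(fit$hessian)
dimnames(vc) <- list(c("k", "theta"), c("k", "theta"))
se <- sqrt(diag(vc))   # standard errors
vc
se
cov2cor(vc)            # correlation between the estimators
```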

In general, suppose you had "complete" censoring in the sense that all of the data are binned into $b$ bins with boundaries $x_0,x_1,\ldots,x_b$ and associated frequencies $n_1,n_2,\ldots,n_b$. The likelihood function is

$$L=\prod_{i=1}^{b} (F(x_i)-F(x_{i-1}))^{n_i}$$

with the log of the likelihood being

$$\ln L=\sum_{i=1}^{b} n_i \ln(F(x_i)-F(x_{i-1}))$$

The maximum likelihood estimates would be the values of the parameters that maximize $\ln L$.
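A minimal sketch of the binned log-likelihood for a gamma model, assuming simulated data from a gamma with shape 2 and rate 1 and hypothetical bin boundaries. Instead of L-BFGS-B it optimizes over log(k) and log(theta) with optim's default Nelder-Mead, which keeps both parameters positive without box constraints and tolerates the infinite values that can occur when a bin probability underflows to 0.

```r
set.seed(3)                              # illustrative seed
x <- rgamma(200, shape = 2, rate = 1)    # simulated data, then binned
breaks <- c(0, 0.5, 1, 2, 4, Inf)        # hypothetical boundaries x_0, ..., x_b
counts <- as.vector(table(cut(x, breaks, right = FALSE)))  # n_1, ..., n_b

# -ln L = -sum_i n_i * ln(F(x_i) - F(x_{i-1})), F = gamma CDF with rate theta
binned_negloglik <- function(par) {
  probs <- diff(pgamma(breaks, par[1], par[2]))
  -sum(counts * log(probs))
}

# Optimize on the log scale so k and theta stay positive
fit <- optim(c(0, 0), function(lp) binned_negloglik(exp(lp)))
est <- exp(fit$par)
names(est) <- c("k", "theta")
est
```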

Good answer (+1). May I suggest you add an additional equation to show the corresponding form of the log-likelihood $\ell = \ln L$, since this function is often the one that is maximised in the MLE procedure (for numerical stability reasons). — Ben, Mar 05 '20 at 00:35
@Ben-ReinstateMonica. Thanks. Good idea. I've added that and a blurb about including a measure of precision. — JimB, Mar 06 '20 at 00:36
OP here: So it would appear to me, then, that the only difference between my approach and the two answers here is that, for the truncated data, I assign a probability of F(x0)/x0 (a probability density, specifically the mean probability density of x < x0), whereas you give the probability as the probability mass F(x0). In your example, how can you multiply both probability masses (for x < x0) and probability densities (for x > x0) into a coherent likelihood for all x? — user278411, Mar 06 '20 at 00:54
This doesn't invalidate your argument but $F(x_0)/x_0$ is not the mean probability density given that $x — JimB, Mar 06 '20 at 01:38
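On the question of how the two versions relate: since $\ln(F(x_0)/x_0) = \ln F(x_0) - \ln x_0$, the piecewise log-likelihood in the question equals the censored log-likelihood $\ln L$ minus the constant $n_0 \ln x_0$, which does not depend on the parameters. Both versions therefore have the same maximizer (and the same Hessian, hence the same standard errors). A sketch that checks this numerically, with an illustrative seed and hypothetical function names:

```r
library(stats4)  # provides mle()

set.seed(4)                        # illustrative seed
x_actual <- rgamma(100, 1, 1)      # shape = 1, rate = 1
MDL <- 0.25
x <- x_actual
x[x_actual < MDL] <- 0             # below-MDL values recorded as 0
n0 <- sum(x == 0)

# Piecewise version from the question: each 0 contributes the density F(MDL)/MDL
nll_piecewise <- function(k, theta) {
  -(n0 * log(pgamma(MDL, k, theta) / MDL) +
      sum(dgamma(x[x > 0], k, theta, log = TRUE)))
}
# Censored version from the answers: each 0 contributes the probability F(MDL)
nll_censored <- function(k, theta) {
  -(n0 * pgamma(MDL, k, theta, log.p = TRUE) +
      sum(dgamma(x[x > 0], k, theta, log = TRUE)))
}

# The difference is n0 * log(MDL) at any (k, theta)
nll_piecewise(0.7, 1.3) - nll_censored(0.7, 1.3)
n0 * log(MDL)

fit_pw <- mle(nll_piecewise, start = list(k = 0.5, theta = 0.5),
              method = "L-BFGS-B", lower = c(1e-6, 1e-6))
fit_cens <- mle(nll_censored, start = list(k = 0.5, theta = 0.5),
                method = "L-BFGS-B", lower = c(1e-6, 1e-6))
rbind(piecewise = coef(fit_pw), censored = coef(fit_cens))
```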
