
I have data consisting of positive numbers that are capped above at C (if a sample would have exceeded C, the data-generating process simply returns C).

I know which of the samples were capped and which ones were not.

I am happy to assume that these data points are lognormally distributed, and I would like to estimate the parameters of the uncapped lognormal distribution. How do I use my data, including the capped observations, to obtain the MLE? If the lognormal is too hard for some reason, I am open to using a normal or another distribution.

whuber
ryu576
  • You can just write a loglikelihood function and optimize it numerically. – kjetil b halvorsen Oct 14 '15 at 02:48
  • 1. Take logs. 2. Fit a truncated normal (truncated at $\log(C)$). The parameters are now ML for your lognormal. – Glen_b Oct 14 '15 at 06:37
  • There's an algorithm outlined [here](http://stats.stackexchange.com/a/48909/805) that's suitable for the normal case and my previous comment gets you the lognormal from that. – Glen_b Oct 14 '15 at 06:46
  • @Glen_b That algorithm is for right truncated data, which is not the same as right censored data. – Jarle Tufto Jul 29 '17 at 22:12
  • @Jarle you're correct. I was responding to the title (which previously said "truncated"), but the description is indeed of a censored problem. I should have typed *censored* in my first comment, and the algorithm at the link doesn't apply either, as you say. Thanks for pointing this out. – Glen_b Jul 30 '17 at 01:02
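For reference, here is the standard form of the log-likelihood these comments point to. It is a sketch and is not taken from the thread; the notation is assumed: $x_i = \log y_i$, $c = \log C$, $\delta_i = 1$ for an uncapped observation and $\delta_i = 0$ for a capped one, and $\phi$, $\Phi$ are the standard normal density and CDF. The right-censored lognormal log-likelihood in the location $\mu$ and scale $\sigma$ is

$$
\ell(\mu,\sigma) = \sum_{i:\,\delta_i=1}\left[\log\phi\!\left(\frac{x_i-\mu}{\sigma}\right) - \log\sigma - x_i\right] + \sum_{i:\,\delta_i=0}\log\!\left[1-\Phi\!\left(\frac{c-\mu}{\sigma}\right)\right].
$$

Each capped observation contributes only the probability $P(Y > C)$ that it exceeded the cap. The $-x_i$ term is the Jacobian of $y = e^x$ and does not involve $\mu$ or $\sigma$, so it can be dropped. What remains is a censored-normal likelihood for the logged data. For right-*truncated* data, by contrast, the capped observations would be absent. Each remaining observation would instead contribute $\log f(y_i) - \log\Phi\!\left(\frac{c-\mu}{\sigma}\right)$, which is the case the algorithm at the link handles.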

1 Answer


Since you know that the event $y>C$ occurred for some observations, you have right-censored data. If such events had instead gone unrecorded, the data would have been right-truncated. Fitting an intercept-only lognormal survival regression model, as follows, gives you the MLEs of the location and scale parameters of the lognormal:

```
> library(survival)
> y <- rlnorm(1000, 0, 1) # Simulated data with location 0 and scale 1
> delta <- y<2 # right censoring at C = 2
> y[!delta] <- 2
> survreg(Surv(y, delta) ~ 1, dist="lognormal")
Call:
survreg(formula = Surv(y, delta) ~ 1, dist = "lognormal")

Coefficients:
 (Intercept) 
-0.008081718 

Scale= 0.9564241 

Loglik(model)= -978.1   Loglik(intercept only)= -978.1
n= 1000 
```
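The same estimates can also be obtained by writing the censored log-likelihood by hand and maximizing it numerically. The sketch below is an R example built on assumptions: it uses the simulated setup from the code above (cap C = 2, true location 0, scale 1). The seed, the function name `negloglik` and the choice of `optim` with BFGS are illustrative, not part of the original answer. Uncapped points contribute the log density, and capped points contribute the log survival probability at C:

```r
library(survival)

set.seed(1)                 # arbitrary seed, only for reproducibility
C <- 2                      # the cap (right-censoring point)
y <- rlnorm(1000, meanlog = 0, sdlog = 1)
delta <- y < C              # TRUE = observed exactly, FALSE = capped
y[!delta] <- C

# Hypothetical helper: negative censored log-likelihood.
# par[1] is the location mu; par[2] is log(sigma), so sigma stays positive.
negloglik <- function(par) {
  mu <- par[1]
  sigma <- exp(par[2])
  # Uncapped observations: log of the lognormal density
  ll_obs <- dlnorm(y[delta], meanlog = mu, sdlog = sigma, log = TRUE)
  # Capped observations: log P(Y > C), identical for every capped point
  ll_cens <- plnorm(C, meanlog = mu, sdlog = sigma,
                    lower.tail = FALSE, log.p = TRUE)
  -(sum(ll_obs) + sum(!delta) * ll_cens)
}

fit <- optim(c(0, 0), negloglik, method = "BFGS")
mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])

# Compare with the survreg fit of the same data
sr <- survreg(Surv(y, delta) ~ 1, dist = "lognormal")
print(c(optim_mu = mu_hat, survreg_mu = unname(coef(sr)),
        optim_sigma = sigma_hat, survreg_sigma = sr$scale))
```

Both approaches maximize the same likelihood, so they should agree up to the optimizer's tolerance. The direct version is mainly useful when you want a distribution or parametrization that `survreg` does not offer.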
Jarle Tufto