If $Y_{pre}$ and $Y_{post}$ are, for instance, lab analytes where the assay cannot detect values below a lower limit of detection, you can use an EM algorithm to impute the censored values under some distributional assumption.
To illustrate with a univariate case, consider first the M step. To calculate the likelihood, note that a left-censored observation contributes the CDF evaluated at the detection limit, rather than the density. For instance:
set.seed(123)
n <- 100
ypre <- rlnorm(n = n, meanlog = -3, sdlog = 3)
ypre[ypre < 1] <- 1  # left-censor at a detection limit of 1

negloglik <- function(param, x, cens) {
  # Censored observations contribute log F(limit); observed ones log f(x)
  -sum(pnorm(x[cens], mean = param[1], sd = param[2], log.p = TRUE)) -
    sum(dnorm(x[!cens], mean = param[1], sd = param[2], log = TRUE))
}

nlm(f = negloglik, p = c(0, 1), x = log(ypre), cens = ypre == 1)
This gives (with a number of warnings, since nlm is not the most robust optimiser for this problem):
$estimate
[1] -2.509127 2.642844
We can then carry out the E-step, replacing each censored value with its conditional expectation given that it lies below the detection limit:
paramest <- nlm(f = negloglik, p = c(0, 1), x = log(ypre), cens = ypre == 1)$estimate
lod <- log(1)  # detection limit on the log scale
# P(X < lod) under the fitted normal (not the empirical censoring fraction)
pcens <- pnorm(lod, mean = paramest[1], sd = paramest[2])
# E[X | X < lod] = integral_{-Inf}^{lod} x f(x) dx / P(X < lod)
expval <- integrate(function(x) x * dnorm(x, mean = paramest[1], sd = paramest[2]),
                    lower = -Inf, upper = lod)$value / pcens
ypre_e <- ifelse(ypre == 1, exp(expval), ypre)
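As a sanity check, this conditional expectation also has a closed form for the normal: $E[X \mid X < c] = \mu - \sigma \, \phi(a)/\Phi(a)$ with $a = (c - \mu)/\sigma$ (the inverse Mills ratio). A minimal sketch, reusing the fitted paramest from above (the helper name cond_mean_below is mine):

```r
# Mean of a normal truncated above at c (e.g. the detection limit on the log scale):
# E[X | X < c] = mu - sigma * dnorm(a) / pnorm(a), with a = (c - mu) / sigma
cond_mean_below <- function(c, mu, sigma) {
  a <- (c - mu) / sigma
  mu - sigma * dnorm(a) / pnorm(a)
}
cond_mean_below(0, paramest[1], paramest[2])
```

This should agree (up to numerical error) with the integrate() result, and avoids numerical quadrature entirely.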
Now this is considerably more complicated with multivariate analyses. If we are able to assume normality for the $Y$s (or some transformation of them), we can use linear regression as a maximum likelihood procedure. We can factor the joint likelihood of $Y_{pre}$ and $Y_{post}$ (given treatment $X$) into the marginal likelihood of $Y_{pre}$ and the conditional likelihood of $Y_{post} \mid X, Y_{pre}$. I think we can thus develop the algorithm:
1. Initialize parameter estimates for $Y_{pre}$, the regression parameters for the mean $Y_{post}$ response adjusting for $Y_{pre}$ and treatment, and the residual standard error of $Y_{post}$.
2. Calculate the expected value of the censored $Y_{pre}$ values, as above.
3. Replace censored $Y_{post}$ values with their conditional means under the current regression estimates and current residual variance.
4. Update the regression estimates and residual variance by maximum likelihood on the completed data.
5. Repeat steps 3 and 4 until convergence.
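The steps above might be sketched roughly as follows. This is illustrative only: the names em_fit, ypost, trt, and lod are mine, normality is assumed on the log scale, and imputing conditional means (rather than carrying the full conditional moments into the M-step) makes this an approximate EM rather than an exact one.

```r
# Illustrative sketch of the iterative scheme above (hypothetical names).
# ypre, ypost: log-scale responses, left-censored at lod; trt: treatment indicator.
em_fit <- function(ypre, ypost, trt, lod = 0, tol = 1e-6, maxit = 100) {
  x1 <- ypre; x2 <- ypost
  cens1 <- x1 <= lod; cens2 <- x2 <= lod
  # Step 1: initialize by fitting with censored values held at the limit
  fit <- lm(x2 ~ x1 + trt)
  for (i in seq_len(maxit)) {
    old <- coef(fit)
    # Step 2: impute censored Ypre via its marginal truncated-normal mean
    # (crude moments from the observed values; a full treatment would use
    # the censored MLE, as in the nlm fit earlier)
    mu1 <- mean(x1[!cens1]); s1 <- sd(x1[!cens1])
    a1 <- (lod - mu1) / s1
    x1[cens1] <- mu1 - s1 * dnorm(a1) / pnorm(a1)
    # Step 3: impute censored Ypost given the current regression fit
    mu2 <- predict(fit, newdata = data.frame(x1 = x1, trt = trt))
    s2 <- summary(fit)$sigma
    a2 <- (lod - mu2[cens2]) / s2
    x2[cens2] <- mu2[cens2] - s2 * dnorm(a2) / pnorm(a2)
    # Step 4: update the regression on the completed data
    fit <- lm(x2 ~ x1 + trt)
    # Step 5: stop once the coefficients stabilise
    if (max(abs(coef(fit) - old)) < tol) break
  }
  fit
}
```

Convergence here is judged on the regression coefficients; one could equally monitor the observed-data log-likelihood, which is the more standard EM stopping rule.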