I am trying to calculate the log-likelihood of some time series data given parameter sets estimated in BUGS. I can not figure out how to handle some missing values at random points in time.
For the complete data situation, such as $Y=(0.1,0.3,0.5,0.4,0.2,0.1)$, (real data is much longer) I have fitted a time series model assuming errors are normally distributed. For example, my BUGS code is something like:
for(t in 2:6){
y[t] ~ dnorm(y.mean[t], tau)
y.mean[t] <- phi0 + phi1*y[t-1]
}
i.e., the data is assumed to follow a normal distribution: $y_t \sim N(\phi_0+\phi_1 y_{t-1}, \sigma^2), 2<t<6$, where $\sigma$ is the standard deviation to the tolerance tau
in the BUGS code. In R I can derive the log-likelihood of data,
$l(y_t|\phi_0,\phi_1,\sigma,y_{t-1})=\sum_{t=2}^{t=6}P(Y_t=y_t)$
where $P(Y_t=y_t)$ is a normal probability density function, given a single MCMC sample of parameters (for example, $\phi_0=0.25$, $\phi_1=0.55$ and $\sigma=0.35$) as such:
> y <-c(0.1,0.3,0.5,0.4,0.2,0.1)
> phi0 <- 0.25
> phi1 <- 0.55
> sigma <- 0.35
>
> ymean <- phi0+phi1*y[1:5]
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE))
> ll
[1] -0.01241878
However, I am stuck when it comes to performing the correct calculation of the log-likelihood when there are missing data, say $Y=(0.1,0.3,0.5,NA,0.2,0.1)$ and $NA$ is missing? I believe that y[4]
has to dropped in the R code/likelihood calculation. I am not sure how (or if) to estimate ymean[5]
, which is dependent on a missing $y_4$? BUGS of course provides a MCMC sample(s) for this missing data point, but should I use it, or do I keep the R code as is, adjusting for NA
in the ymean[5]
with na.rm=TRUE
when summing over probability density functions:
> y[4]<-NA
> ymean<-phi0+phi1*y[1:5]
> ymean
[1] 0.305 0.415 0.525 NA 0.360
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE), na.rm=TRUE)
> ll
[1] 0.08714057