Calculating the likelihood of time series data when there are missing data

Question

I am trying to calculate the log-likelihood of some time series data given parameter sets estimated in BUGS. I cannot figure out how to handle some missing values at random points in time.

For the complete data situation, such as $Y=(0.1,0.3,0.5,0.4,0.2,0.1)$, (real data is much longer) I have fitted a time series model assuming errors are normally distributed. For example, my BUGS code is something like:

for(t in 2:6){
    y[t] ~ dnorm(y.mean[t], tau)
    y.mean[t] <- phi0 + phi1*y[t-1]
}

i.e., the data are assumed to follow a normal distribution: $y_t \sim N(\phi_0+\phi_1 y_{t-1}, \sigma^2)$, $2 \le t \le 6$, where $\sigma = 1/\sqrt{\tau}$ is the standard deviation corresponding to the precision tau in the BUGS code. In R I can derive the log-likelihood of the data,

$l(y_t|\phi_0,\phi_1,\sigma,y_{t-1})=\sum_{t=2}^{6}\log P(Y_t=y_t)$

where $P(Y_t=y_t)$ is a normal probability density function, given a single MCMC sample of parameters (for example, $\phi_0=0.25$, $\phi_1=0.55$ and $\sigma=0.35$), as follows:

> y <-c(0.1,0.3,0.5,0.4,0.2,0.1)
> phi0 <- 0.25
> phi1 <- 0.55
> sigma <- 0.35
> 
> ymean <- phi0+phi1*y[1:5]
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE))
> ll
[1] -0.01241878

However, I am stuck on how to calculate the log-likelihood correctly when there are missing data, say $Y=(0.1,0.3,0.5,NA,0.2,0.1)$, where $NA$ marks a missing value. I believe that y[4] has to be dropped in the R code/likelihood calculation. I am not sure how (or if) to estimate ymean[5], which depends on the missing $y_4$. BUGS of course provides MCMC samples for this missing data point, but should I use them, or do I keep the R code as is, handling the NA in ymean[5] with na.rm=TRUE when summing over the log densities:

> y[4]<-NA
> ymean<-phi0+phi1*y[1:5]
> ymean
[1] 0.305 0.415 0.525    NA 0.360
> ll <- sum(dnorm(y[2:6], mean = ymean , sd = sigma, log = TRUE), na.rm=TRUE)
> ll
[1] 0.08714057
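
To make explicit which terms na.rm=TRUE removes here, the per-time-point contributions can be inspected before summing. This is only a minimal sketch reusing the values above; the name `ll_terms` is introduced purely for illustration.

```r
y <- c(0.1, 0.3, 0.5, NA, 0.2, 0.1)
phi0 <- 0.25
phi1 <- 0.55
sigma <- 0.35

# One log-density per time point t = 2, ..., 6
ymean <- phi0 + phi1 * y[1:5]
ll_terms <- dnorm(y[2:6], mean = ymean, sd = sigma, log = TRUE)
names(ll_terms) <- paste0("t=", 2:6)

# NA at t=4 (the response y[4] is missing) and at t=5 (its predictor y[4] is missing)
is.na(ll_terms)

# na.rm = TRUE therefore sums only the t=2, t=3 and t=6 contributions
ll_omit <- sum(ll_terms, na.rm = TRUE)
```

So a single missing observation removes two terms from the sum, not one.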

score 3 · Accepted Answer · edited Apr 13 '17 at 12:44

Why do you need to compute the log-likelihood? BUGS computes MCMC samples (i.e., the posterior distribution) of all the parameters you want, and that should be all you need for Bayesian inference.

If the response variable has missing values, they bring no new information to the model; you only gain predictions for the corresponding values of the explanatory variables. So you can omit them from the computation. But if you need those missing values predicted by BUGS, I would safely include them in the computation. For more information, look here and here.
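
The two options above can be compared numerically. The following is a rough sketch rather than a definitive recipe: `y4_draw` is a hypothetical value standing in for one MCMC draw of the missing $y_4$ from BUGS (no such draw is given in the question), `loglik` is a helper name made up for this example, and the parameter values are the illustrative ones from the question.

```r
y <- c(0.1, 0.3, 0.5, NA, 0.2, 0.1)
phi0 <- 0.25
phi1 <- 0.55
sigma <- 0.35

# Log-likelihood of y[2:n] given y[1] for the model in the question;
# terms that are NA (missing response or missing predictor) are dropped
loglik <- function(y, phi0, phi1, sigma) {
  n <- length(y)
  ymean <- phi0 + phi1 * y[-n]
  sum(dnorm(y[-1], mean = ymean, sd = sigma, log = TRUE), na.rm = TRUE)
}

# Option 1: omit every term that involves the missing y[4]
ll_omit <- loglik(y, phi0, phi1, sigma)

# Option 2: fill in y[4] with an imputed draw (hypothetical value, an assumption)
y4_draw <- 0.45
y_filled <- replace(y, is.na(y), y4_draw)
ll_filled <- loglik(y_filled, phi0, phi1, sigma)

c(omit = ll_omit, filled = ll_filled)
```

Option 2 conditions on the imputed value, so its result changes from draw to draw, whereas option 1 depends only on the observed values and the parameters.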

answered Jan 16 '13 at 23:24

Tomas

I need to compute the log-likelihood to estimate the normalizing constant (using the bridge sampler). – guyabel Jan 17 '13 at 09:13
OK, then the second paragraph is the answer to your question - if you have missing data, these can of course not affect your calculation, so you can omit them. – Tomas Jan 17 '13 at 09:16
yep, thanks @Thomas. I am still a bit unclear about how to calculate ymean[5], which depends on a missing value. Would I run the R code as is (see edit at end of question above)? – guyabel Jan 17 '13 at 09:23
Sorry @gjabel, I misunderstood your question! Then, I would safely include the values predicted by BUGS in the computation, if you need those values. See my edit. – Tomas Jan 17 '13 at 09:32
Thanks again, sorry to push you, but what is the justification for including the posterior values of $y_4$? I thought the likelihood function was equal to the probability of the observed data, given the parameter values...$l(Y|\phi_0,\phi_1,\sigma)$. Do the posterior values of $y_4$ generated by BUGS qualify as parameters...$l(Y|\phi_0,\phi_1,\sigma,y_4)$? – guyabel Jan 17 '13 at 09:51
@gjabel, I say only: "if you NEED the missing value in computation, use the predicted one, since it will not affect your model/computation". If you don't need it, just omit it. That's all. As for your question, y4 is still part of Y... – Tomas Jan 17 '13 at 10:04
