Gelman & Hill (2006) say:

In Bugs, missing outcomes in a regression can be handled easily by simply including the data vector, NA’s and all. Bugs explicitly models the outcome variable, and so it is trivial to use this model to, in effect, impute missing values at each iteration.

This sounds like an easy way to use JAGS to do prediction. But do the observations with the missing outcomes also affect parameter estimates? If so, is there an easy way to keep these observations in the dataset that JAGS sees, but to not have them affect the parameter estimates? I was thinking about the cut function, but that's only available in BUGS, not JAGS.

Jack Tanner

1 Answer

Yes, this is really easy to do in BUGS or JAGS! It is actually a pleasure to use!

But do the observations with the missing outcomes also affect parameter estimates?

Of course not. The parameters are affected only by the observed outcomes. The missing outcomes (NAs) do not affect anything; it is actually the other way around: the missing outcomes are derived from the parameters. An NA contributes no data to the likelihood, so it is simply sampled as one more unknown. Note that each missing outcome also gets its own posterior distribution. That makes it very easy to compute derived quantities, e.g. a sum of the outcome over some indices: such quantities not only cope with the missing values, they also immediately come with their own posterior distribution. That's what is so sexy about BUGS & JAGS!
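If you want to check this without firing up JAGS, here is a minimal Python sketch of the same idea. It is not JAGS code and not taken from Gelman & Hill: the toy model (a regression through the origin with noise sd fixed at 1), the prior sd of 10, the simulated data and the function names `slope_posterior` and `gibbs_with_missing` are all assumptions made up for this illustration. The sampler imputes the missing outcomes at every iteration, the way BUGS/JAGS treat NA outcomes, and the slope draws are then compared with the exact posterior computed from the observed rows only.

```python
import random
import statistics


def slope_posterior(xs, ys, prior_sd=10.0):
    """Exact posterior (mean, sd) of b when y_i ~ Normal(b * x_i, 1)
    and b ~ Normal(0, prior_sd**2). Toy model, assumed for illustration."""
    precision = 1.0 / prior_sd ** 2 + sum(x * x for x in xs)
    mean = sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, precision ** -0.5


def gibbs_with_missing(xs, ys, prior_sd=10.0, n_iter=20000, burn_in=2000, seed=1):
    """Gibbs sampler for the same model; outcomes given as None are
    treated as unknowns and re-imputed at every iteration."""
    rng = random.Random(seed)
    missing = [i for i, y in enumerate(ys) if y is None]
    filled = [0.0 if y is None else y for y in ys]
    b = 0.0
    b_draws = []
    y_draws = {i: [] for i in missing}
    for it in range(n_iter):
        # Step 1: draw each missing outcome from its distribution given b.
        for i in missing:
            filled[i] = rng.gauss(b * xs[i], 1.0)
        # Step 2: draw b given ALL outcomes, observed and imputed alike.
        mean, sd = slope_posterior(xs, filled, prior_sd)
        b = rng.gauss(mean, sd)
        if it >= burn_in:
            b_draws.append(b)
            for i in missing:
                y_draws[i].append(filled[i])
    return b_draws, y_draws


# Simulated data (true slope 2.0 is an arbitrary choice for the demo).
data_rng = random.Random(0)
x_obs = [0.5 * k for k in range(1, 21)]
y_obs = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in x_obs]
x_new = [3.0, 12.0]  # predictor values whose outcomes are "NA"

# Posterior of the slope from the observed rows only.
exact_mean, exact_sd = slope_posterior(x_obs, y_obs)

# Posterior of the slope when the NA rows are kept in the data.
b_draws, y_draws = gibbs_with_missing(x_obs + x_new, y_obs + [None, None])
gibbs_mean = statistics.fmean(b_draws)
gibbs_sd = statistics.stdev(b_draws)

print("observed rows only:", exact_mean, exact_sd)
print("with NA rows kept: ", gibbs_mean, gibbs_sd)
for i, draws in y_draws.items():
    print("predicted y at x =", xs_i := (x_obs + x_new)[i], ":", statistics.fmean(draws))
```

The two lines for the slope should agree up to Monte Carlo error, and `y_draws` holds the posterior predictive draws for the NA rows. The imputed values do feed back into the update of `b` at each iteration, but averaged over the chain they carry no information beyond what the observed rows already provide.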

Have fun!

Tomas
  • Sorry, I'm not convinced that missing outcomes don't affect parameter estimates. Jackman seems to say the opposite: http://jackman.stanford.edu/blog/?p=38 – Jack Tanner Feb 03 '12 at 04:39
  • @JackTanner, think about it for a while. How can a missing value affect anything? When the algorithm starts, the missing values are imputed from the parameter estimates (which are derived from the observed outcomes). Then (maybe, I'm not sure) the information from the imputed missing outcomes can bounce back to the parameters, but it doesn't matter: it is just the original information, already present in the parameters, bounced back to them. The REAL information that affects anything comes only from the REAL outcomes. If you don't trust me, run a simulation, compare the results and post them here. – Tomas Feb 03 '12 at 09:33
  • Regarding your link, he is apparently not sure about it himself: he puts "problem" in quotes, and he says "it would be interesting to compare it". I say there will be no significant difference. If you want to test it, go ahead. – Tomas Feb 03 '12 at 09:36
  • I agree; no significant difference. I use this approach for constructing posterior predictive distributions; just put the predictive values of the right hand side variables in along with the past values, and NAs for the target variable "observations" corresponding to the predictive values. – jbowman Feb 03 '12 at 19:40
  • @jbowman, yes, good note! Not an obvious idea to do predictions this way! – Tomas Feb 03 '12 at 21:55