8

I'm trying to use a zero-inflated gamma model (or a gamma 'hurdle' model). The model is a mixture of logistic regression and generalized linear modeling. I can do this analysis in two steps: 1) do a logistic regression against presence/absence data and then 2) Use a generalized linear model with a gamma distribution on the positive values. I'm trying to set the model up where, if y = 0, then E(y) = p, but if y > 0, then E(y) is gamma distributed.

I'm trying to set this up in BUGS/JAGS, because I've seen these models worked before for poisson-distributions. I tried to adapt the code from that link into a gamma distribution, but I can't quite seem to get it to work. I know the code won't work because my data have zeroes, and the likelihood function can't be evaluated with zeroes. However, even when restricted to positive values, I get errors for invalid parent values. Even so, I'm not even sure the model is specified correctly.

Here is the model:

# For the ones trick
C <- 10000

# for every observation
for(i in 1:N){
    # log-likelihood of the observation from the gamma likelihood
    LogPos[i] <- (shape[i] - 1)*log(y[i]) - y[i]/scale[i] - N*shape[i]*log(scale[i]) - N*loggam(scale[i])
    #likelihood
    Lpos[i] <- exp(LogPos[i])

    # redefine the shape and rate parameters as a function of the mean and sd
    shape[i] <- pow(mu[i], 2) / pow(sd, 2)
    scale[i] <- mu[i] / pow(sd, 2)

    # mu is a function of MTD: use the inverse link
    #mu[i] <- 1/eta[i]
    mu[i] <- beta0 + beta1*MTD[i]


    # zero-inflated part, where w[i] is the probability of being zero
    w[i] <- exp(zeta[i]) / (1 + exp(zeta[i]))
    zeta[i] <- gamma0 + gamma1*MPD[i]

    # ones trick
    p[i] <- Lpos[i] / C
    ones[i] ~ dbern(p[i])

    # Full likelihood
    Lik[i] <- Lpos[i]*(1 - w[i]) + equals(y[i], 0)*w[i]
  } 

# PRIORS
beta0 ~ dnorm(0, 0.001)
beta1 ~ dnorm(0, 0.001)

gamma0 ~ dnorm(0, 0.001)
gamma1 ~ dnorm(0, 0.001)

sd ~ dunif(0, 100)

Has anyone set a model up like this or have any advice on how to set it up correctly?

UPDATE

I've tried a new set of code that's similar, but slight different. I still have not gotten it to work

model{

  # For the ones trick
  C <- 10000

  # for every observation
  for(i in 1:N){

    # make a dummy variable that is 0 if y is < 0.0001 and 1 if y > 0.0001. This is essentially a presence
    # absence dummy variable
    z[i] <- step(y[i] - 0.0001)

    # define the logistic regression model, where w is the probability of occurance.
    # use the logistic transformation exp(z)/(1 + exp(z)), where z is a linear function
    w[i] <- exp(zeta[i]) / (1 + exp(zeta[i]))
    zeta[i] <- gamma0 + gamma1*MPD[i]

    # define the gamma regression model for the mean. use the log link the ensure positive, non-zero mu
    mu[i] <- exp(eta[i])
    eta[i] <- beta0 + beta1*MTD[i]

    # redefine the mu and sd of the continuous part into the shape and scale parameters
    shape[i] <- pow(mu[i], 2) / pow(sd, 2)
    scale[i] <- mu[i] / pow(sd, 2)

    # for readability, define the log-likelihood of the gamma here
    logGamma[i] <- (shape[i] - 1)*log(y[i]) - y[i]/scale[i] - shape[i]*log(scale[i]) - loggam(scale[i])

    # define the total likelihood, where the likelihood is (1 - w) if y < 0.0001 (z = 0) or
    # the likelihood is w * gammalik if y >= 0.0001 (z = 1). So if z = 1, then the first bit must be
    # 0 and the second bit 1. Use 1 - z, which is 0 if y > 0.0001 and 1 if y < 0.0001
    logLik[i] <- (1 - z[i]) * log(1 - w[i]) + z[i] * ( log(w[i]) + logGamma[i] )

    # Use the ones trick
    p[i] <- logLik[i] / C
    ones[i] ~ dbern(p[i])
  } 

  # PRIORS
  beta0 ~ dnorm(0, 0.001)
  beta1 ~ dnorm(0, 0.001)

  gamma0 ~ dnorm(0, 0.001)
  gamma1 ~ dnorm(0, 0.001)

  sd ~ dgamma(1, 2)

}

UPDATE 2

I've gotten it to run by deleting the definition of logGamma[i] and putting it directly into the likelihood function, which now reads:

logLik[i] <- (1 - z[i]) * log(1 - w[i]) + z[i] * ( log(w[i]) + (shape[i] - 1)*log(y[i]) - y[i]/scale[i] - shape[i]*log(scale[i]) - loggam(scale[i]) )

The problem was that JAGS was trying to evaluate the log likelihood at all observations first, resulting in NA's for the 0's. In the new way, it only evaluates it if z = 1 (I think). I'm still having trouble getting the parameter estimates to line up. For example, the gamma's are almost identical to a logistic regression of the same form (hooray!). But the betas are pretty far off from a gamma GLM of the positive values. I don't know if this is normal or not, but I suspect there are still problems with my model specification.

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Nate
  • 768
  • 5
  • 10
  • Three observations. First, you don't want N in `N*shape[i]` or `N*loggam(...)` in the `LogPos[i]` expression, because this is the log likelihood for a single observation, and the `N` is appropriate only if it's for all `N` observations. Second, with `sd` having a range of $[0,100]$, you're going to have problems. I'd suggest, if you must go uniform, bounding it a) away from zero and b) over a much tighter range, see if the posterior piles up at one end or the other of the range, and adjust if necessary. Third, I think (at a quick glance) you want `p[i] – jbowman Sep 29 '13 at 21:35
  • Yep you're right. Thanks for the comments. I made the changes, including giving sd a gamma prior, and still get invalid parent values. I'm not sure what's going on. I know my code isn't right, I'm just a bit stuck – Nate Sep 29 '13 at 23:45
  • A few more thoughts: I assume you have `ones` defined somewhere not in the visible code? And is `C` large enough? (It looks large enough to me, but that's a model-specific thing.) I also note that it's possible for `mu` to equal zero or be negative, which would be bad, because of how `scale` is defined, and I wouldn't be surprised if the starting values for `beta0` and `beta1` were 0 since it's the prior mean. – jbowman Sep 30 '13 at 00:00
  • I've been messing with some of this already. I provided initial starts that aren't 0 and changed it to use a log-link function for mu, where mu[i] – Nate Sep 30 '13 at 01:00
  • That shape parameter on the `sd~dgamma(1,2)` maybe should be a little higher just to keep the density away from zero... I also think you have to put `z[i] – jbowman Sep 30 '13 at 01:42

1 Answers1

4

I figured out the answer, with help from Martyn Plummer. My code uses the inverse link for the gamma model (and no inverse of the predictors). Also, this code requires the 'glm' module for JAGS.

model{

  # For the ones trick
  C <- 10000

  # for every observation
  for(i in 1:N){

    # define the logistic regression model, where w is the probability of occurance.
    # use the logistic transformation exp(z)/(1 + exp(z)), where z is a linear function
    logit(w[i]) <- zeta[i]
    zeta[i] <- gamma0 + gamma1*MPD[i] + gamma2*MTD[i] + gamma3*int[i] + gamma4*MPD[i]*int[i] + gamma5*MTD[i]*int[i]

    # define the gamma regression model for the mean. use the log link the ensure positive, non-zero mu
    mu[i] <- pow(eta[i], -1)
    eta[i] <- beta0 + beta1*MPD[i] + beta2*MTD[i] + beta3*int[i] + beta4*MPD[i]*int[i] + beta5*MTD[i]*int[i]

    # redefine the mu and sd of the continuous part into the shape and scale parameters
    shape[i] <- pow(mu[i], 2) / pow(sd, 2)
    rate[i] <- mu[i] / pow(sd, 2)

    # for readability, define the log-likelihood of the gamma here
    logGamma[i] <- log(dgamma(y[i], shape[i], rate[i]))

    # define the total likelihood, where the likelihood is (1 - w) if y < 0.0001 (z = 0) or
    # the likelihood is w * gammalik if y >= 0.0001 (z = 1). So if z = 1, then the first bit must be
    # 0 and the second bit 1. Use 1 - z, which is 0 if y > 0.0001 and 1 if y < 0.0001
    logLik[i] <- (1 - z[i]) * log(1 - w[i]) + z[i] * ( log(w[i]) + logGamma[i] )

    Lik[i] <- exp(logLik[i])

    # Use the ones trick
    p[i] <- Lik[i] / C
    ones[i] ~ dbern(p[i])
  }

  # PRIORS
  beta0 ~ dnorm(0, 0.0001)
  beta1 ~ dnorm(0, 0.0001)
  beta2 ~ dnorm(0, 0.0001)
  beta3 ~ dnorm(0, 0.0001)
  beta4 ~ dnorm(0, 0.0001)
  beta5 ~ dnorm(0, 0.0001)

  gamma0 ~ dnorm(0, 0.0001)
  gamma1 ~ dnorm(0, 0.0001)
  gamma2 ~ dnorm(0, 0.0001)
  gamma3 ~ dnorm(0, 0.0001)
  gamma4 ~ dnorm(0, 0.0001)
  gamma5 ~ dnorm(0, 0.0001)

  sd ~ dgamma(2, 2)

}
Nate
  • 768
  • 5
  • 10