I'm trying to run a zero-inflated regression for a continuous response variable in R. I'm aware of a gamlss implementation, but I'd really like to try out this algorithm by Dale McLerran that is conceptually a bit more straightforward. Unfortunately, the code is in SAS and I'm not sure how to re-write it for something like nlme.
The code is as follows:
proc nlmixed data=mydata;
parms b0_f=0 b1_f=0
b0_h=0 b1_h=0
log_theta=0;
eta_f = b0_f + b1_f*x1 ;
p_yEQ0 = 1 / (1 + exp(-eta_f));
eta_h = b0_h + b1_h*x1;
mu = exp(eta_h);
theta = exp(log_theta);
r = mu/theta;
if y=0 then
ll = log(p_yEQ0);
else
ll = log(1 - p_yEQ0)
- lgamma(theta) + (theta-1)*log(y) - theta*log(r) - y/r;
model y ~ general(ll);
predict (1 - p_yEQ0)*mu out=expect_zig;
predict r out=shape;
estimate "scale" theta;
run;
From: http://listserv.uga.edu/cgi-bin/wa?A2=ind0805A&L=sas-l&P=R20779
ADD:
Note: There are no mixed effects present here - only fixed.
The advantage to this fitting is that (even though the coefficients are the same as if you separately fit a logistic regression to P(y=0) and a gamma error regression with log link to E(y | y>0)) you can estimate the combined function E(y) which includes the zeroes. One can predict this value in SAS (with a CI) using the line predict (1 - p_yEQ0)*mu
.
Further, one is able to write custom contrast statements to test the significance of predictor variables on E(y). For example, here is another version of the SAS code I have used:
proc nlmixed data=TestZIG;
parms b0_f=0 b1_f=0 b2_f=0 b3_f=0
b0_h=0 b1_h=0 b2_h=0 b3_h=0
log_theta=0;
if gifts = 1 then x1=1; else x1 =0;
if gifts = 2 then x2=1; else x2 =0;
if gifts = 3 then x3=1; else x3 =0;
eta_f = b0_f + b1_f*x1 + b2_f*x2 + b3_f*x3;
p_yEQ0 = 1 / (1 + exp(-eta_f));
eta_h = b0_h + b1_h*x1 + b2_h*x2 + b3_h*x3;
mu = exp(eta_h);
theta = exp(log_theta);
r = mu/theta;
if amount=0 then
ll = log(p_yEQ0);
else
ll = log(1 - p_yEQ0)
- lgamma(theta) + (theta-1)*log(amount) - theta*log(r) - amount/r;
model amount ~ general(ll);
predict (1 - p_yEQ0)*mu out=expect_zig;
estimate "scale" theta;
run;
Then to estimate "gift1" versus "gift2" (b1 versus b2) we can write this estimate statement:
estimate "gift1 versus gift 2"
(1-(1 / (1 + exp(-b0_f -b1_f))))*(exp(b0_h + b1_h)) - (1-(1 / (1 + exp(-b0_f -b2_f))))*(exp(b0_h + b2_h)) ;
Can R do this?