2

I'm trying to replicate Table 13.8 from Fitzmaurice, Laird, & Ware (2011) using R for teaching purposes. This is a GEE count model of the number of bacteria on 30 patients at two waves. In their SAS-estimated results, they get a correlation between the two waves of 0.797. My "geepack" results in R, however, give me a 0 correlation. (And I get a 0 correlation when I try other count data.)

Any ideas as to how I can get a result more like their SAS results? Effectively I'm just looking for a positive estimated correlation.

My coefficients are similar in scale to theirs. My full script is posted here, with the COUNT EXAMPLE halfway down being of interest. Here's what I get when I plug the relevant lines of code into R. Notice the "alpha" param:

 library(geepack)
 library(reshape)
 library(nlme)
 library(car)
 library(MuMIn)
 leprosy<-read.table("http://spia.uga.edu/faculty_pages/monogan/teaching/pd/leprosy.txt", header=TRUE, sep="")
 leprosy$id<-c(1:nrow(leprosy))
     leprosy$drug<-relevel(leprosy$drug,"C")
     leprosy$antibiotic<-1-as.numeric(leprosy$drug=="C")
     m.leprosy<-melt.data.frame(data=leprosy, measure.vars=c("pre","post"), id=c("id","drug","antibiotic"))
     m.leprosy$time<-as.numeric(m.leprosy$variable=="post")
     m.leprosy$a<-as.numeric(m.leprosy$time==1 & m.leprosy$drug=="A")
 m.leprosy$b<-as.numeric(m.leprosy$time==1 & m.leprosy$drug=="B")
     m.leprosy$treat<-as.numeric(m.leprosy$time==1 & m.leprosy$antibiotic==1)
 mod.3<-geeglm(value~time+a+b, id=id, waves=time, family=poisson(link="log"), data=m.leprosy,corstr="exchangeable")
 summary(mod.3)

Call:
geeglm(formula = value ~ time + a + b, family = poisson(link = "log"), 
    data = m.leprosy, id = id, waves = time, corstr = "exchangeable")

 Coefficients:
            Estimate Std.err   Wald Pr(>|W|)    
(Intercept)   2.3734  0.0801 877.10   <2e-16 ***
time          0.1362  0.1919   0.50   0.4778    
a            -0.8419  0.3155   7.12   0.0076 ** 
b            -0.7013  0.3493   4.03   0.0447 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Estimated Scale Parameters:
            Estimate Std.err
(Intercept)      3.2   0.454

Correlation: Structure = exchangeable  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha        0       0
Number of clusters:   60   Maximum cluster size: 1 

I tend to believe Fitzmaurice, Laird, & Ware's results of a positive correlation. Any help would be very much appreciated!

1 Answers1

1

The data are not ordered according to id and time

head(m.leprosy,2)

id drug antibiotic variable value time a b treat
1  1    A          1      pre    11    0 0 0     0
2  2    B          1      pre     6    0 0 0     0

The fit

mod.3<-geeglm(value~time+ a + b, id=id, waves=time, family=poisson(link="log"), 

data = m.leprosy,corstr = "exchangeable")

will be wrong (see the vignette to the package)

For the application of geeglm the data must be ordered

mOrdered<-with(m.leprosy,m.leprosy[order(id,time),])

mod.3Ordered<-geeglm(value ~ time + a + b, id = id, waves = time, 
    family = poisson(link = "log"), data=mOrdered, corstr = "exchangeable")


Call:
geeglm(formula = value ~ time + a + b, family = poisson(link = "log"), 
data = mOrdered, id = id, waves = time, corstr = "exchangeable")

Coefficients:
(Intercept)           a           b 
 2.3728     -0.5654     -0.4981 

Degrees of Freedom: 60 Total (i.e. Null);  57 Residual

Scale Link:                   identity
Estimated Scale Parameters:  [1] 3.212

Correlation:  Structure = exchangeable    Link = identity 
Estimated Correlation Parameters:
 alpha 
0.7383 

Number of clusters:   30   Maximum cluster size: 2 

I should be noted that Stata (version 14) gives the answer in agreement with geeglm

quiet xtset id
xtgee value time a b, link(log) family(poisson) corr(exchangeable) 
vce(robust)

estat wcorrelation

Estimated within-id correlation matrix R:

      |        c1         c2 
------+----------------------
   r1 |         1            
   r2 |   .738365          1 
  • Also note that the output from the (original) `geeglm` fit also shows that the data is in the wrong format. The last line says `Number of clusters: 60 Maximum cluster size: 1’. – Karl Ove Hufthammer Feb 21 '16 at 21:26