
I would like to know how the treatment of weights differs between svyglm and glm.

I am using the twang package in R to create propensity scores, which are then used as weights, as follows (this code comes from the twang documentation):

library(twang)
library(survey)
set.seed(1)

data(lalonde)

ps.lalonde <- ps(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75,
 data = lalonde)

lalonde$w <- get.weights(ps.lalonde, stop.method="es.mean")
design.ps <- svydesign(ids=~1, weights=~w, data=lalonde)

glm1 <- svyglm(re78 ~ treat, design=design.ps)

summary(glm1)

...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6685.2      374.4  17.853   <2e-16 ***
treat         -432.4      753.0  -0.574    0.566    

Compare this to:

glm11 <- glm(re78 ~ treat, weights = w, data = lalonde)
summary(glm11)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6685.2      362.5  18.441   <2e-16 ***
treat         -432.4      586.1  -0.738    0.461  

So the parameter estimates are the same, but the standard errors for the treatment effect are quite different.

How does the treatment of weights differ between svyglm and glm?

Joe King

2 Answers


The survey package computes standard errors that account for the loss of precision introduced by sampling weights. The weights argument in glm simply rescales each observation's contribution to the least-squares fit, so the resulting standard errors aren't correct for sampling weights. Here's a passage from Lumley (2010):

In a model-based analysis it would be necessary to specify the random part of the model correctly to get correct standard errors, but all our standard error estimates are design-based and so are valid regardless of the model. It is worth noting that the “sandwich”, or “model-robust”, or “heteroskedasticity-consistent” standard errors sometimes used in model-based regression analysis are almost identical to the design-based standard errors we will use; the main difference being in the handling of stratification.

So, without strata in your design, you will likely find that the sandwich package gives you identical or near-identical SE estimates:

library(sandwich)
vc <- vcovHC(glm11, type = "HC0")  # heteroskedasticity-consistent ("sandwich") covariance matrix
lmtest::coeftest(glm11, vc)        # re-test the glm coefficients using the robust SEs

In my test they didn't come out exactly equal with either "HC0" or "HC1", but they were very close. Note that svyglm is now reporting a z value instead of a t value as well.
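
As a rough side-by-side check (a sketch that assumes glm1 and glm11 from the question are still in the workspace), you can line up the design-based SEs from svyglm against the sandwich and naive model-based SEs:

library(sandwich)

se.svyglm <- sqrt(diag(vcov(glm1)))                   # design-based (svyglm)
se.hc0    <- sqrt(diag(vcovHC(glm11, type = "HC0")))  # sandwich, HC0
se.hc1    <- sqrt(diag(vcovHC(glm11, type = "HC1")))  # sandwich, HC1
se.naive  <- sqrt(diag(vcov(glm11)))                  # model-based (glm default)

cbind(svyglm = se.svyglm, HC0 = se.hc0, HC1 = se.hc1, naive = se.naive)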

Mark White
commscho

There are lots of different sorts of weights, and they get kind of confusing. You have to be pretty careful, when you're using different functions or software, that you're using the kind of weights you think you're using.

The svyglm function uses survey (sampling) weights - these weight the importance of each case to make the sample representative (to each other, after twang). I'm not sure exactly what weights does in glm() - I think they are precision weights, representing the accuracy of the measures. (If you're using the binomial family, they have a different meaning: the number of trials behind each observed proportion.)
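
To illustrate that binomial-family case (a minimal sketch with made-up data; trials, successes and prop are just hypothetical names): when the response is a proportion, glm() takes weights to be the number of trials behind each proportion, not sampling weights.

set.seed(2)
trials    <- c(10, 20, 50, 5)                     # Bernoulli trials per row (toy data)
successes <- rbinom(4, size = trials, prob = 0.3)
prop      <- successes / trials                   # observed proportion of successes

# two equivalent ways of fitting the same binomial model
fit.prop   <- glm(prop ~ 1, family = binomial, weights = trials)
fit.counts <- glm(cbind(successes, trials - successes) ~ 1, family = binomial)

coef(fit.prop)    # identical point estimates,
coef(fit.counts)  # because "weights" here means "number of trials"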

The survey weights (in svyglm) are the weights that you want, to give you the correct standard errors.

(There are also frequency weights, analytic weights, and importance weights).
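
And to contrast precision weights (which is essentially how glm() treats weights for the gaussian family) with frequency weights, here is another small sketch with made-up data (y, x and w are hypothetical): giving a row weight 2 is not the same as having observed that row twice.

y <- c(1, 2, 3, 5)
x <- c(0, 1, 2, 3)
w <- c(1, 1, 1, 2)

fit.precision <- glm(y ~ x, weights = w)      # weight 2 = "this row is twice as precise"
fit.frequency <- glm(rep(y, w) ~ rep(x, w))   # actually observing the last row twice

coef(fit.precision); coef(fit.frequency)      # same point estimates...
sqrt(diag(vcov(fit.precision)))               # ...but different standard errors,
sqrt(diag(vcov(fit.frequency)))               # because n and the residual df differ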

Jeremy Miles
  • (+1) Thank you. Do you know an accessible reference for the survey weights, other than the docs for `svyglm`? – Joe King Apr 24 '13 at 21:52
  • 1
    I like Lumley's book: http://www.amazon.com/Complex-Surveys-Analysis-Series-Methodology/dp/0470284307/ref=sr_1_1?ie=UTF8&qid=1366914092&sr=8-1&keywords=R+survey – Jeremy Miles Apr 25 '13 at 18:21
  • 1
    Thanks for the reference., By accessible I meant something available online, sorry. I don't have easy access to good libraries.... – Joe King Apr 28 '13 at 18:23
  • Hmmm... I don't recall coming across anything, but I'll see what I can find. – Jeremy Miles Apr 29 '13 at 02:27