
Let's consider the following data frame:

d <- data.frame(Sex     = factor(rep(c("Male", "Female"), times = 2), levels = c("Male", "Female")),
                Race    = factor(rep(c("White", "Black"), each = 2),  levels = c("White", "Black")),
                y       = c(1, 3, 5, 7),
                weights = c(0.01, 0.03, 0.02, 0.01))
> d
     Sex  Race y weights
1   Male White 1    0.01
2 Female White 3    0.03
3   Male Black 5    0.02
4 Female Black 7    0.01

Let's assume that the weights are inverse propensity scores. If we calculate the weighted means of y for the Sex categories, we get:

mean(Male):   3.67
mean(Female): 4.00
# i.e., the difference is about 0.33
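
These weighted means can be computed directly, e.g. with base R (a minimal sketch):

# weighted mean of y within each level of Sex, using the inverse
# propensity scores in `weights` as the weights
sapply(split(d, d$Sex), function(g) weighted.mean(g$y, g$weights))
#     Male   Female 
# 3.666667 4.000000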

However, if we fit a GLM on the above data and plug the weights in, we obtain the following coefficient estimates:

Call:
glm(formula = y ~ Sex + Race, data = d, weights = weights)

Deviance Residuals: 
1  2  3  4  
0  0  0  0  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)        1          0     Inf   <2e-16 ***
SexFemale          2          0     Inf   <2e-16 ***
RaceBlack          4          0     Inf   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0)

    Null deviance: 0.22857  on 3  degrees of freedom
Residual deviance: 0.00000  on 1  degrees of freedom
AIC: -Inf

Number of Fisher Scoring iterations: 1
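
As a check (a minimal sketch), the fit can be reproduced and verified to be exact, which is why the residuals and standard errors above are all zero:

fit <- glm(y ~ Sex + Race, data = d, weights = weights)
# the coefficients reproduce y exactly:
# 1 (intercept) + 2*[Female] + 4*[Black] gives 1, 3, 5, 7
all.equal(unname(fitted(fit)), d$y)
# [1] TRUE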

The estimated coefficient for SexFemale is 2. Note that if we exclude weights = weights, we still obtain the same coefficient estimates (though other parts of the summary output, such as the deviance, differ). Now I'm wondering: why does the weighted-mean difference (0.33) differ from the GLM coefficient (2)? What can I say about the mean difference in this situation? Should I base my evaluation on the GLM estimates or on the weighted means?
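
For comparison, the unadjusted weighted model suggested in the first comment below recovers the weighted-mean difference exactly (a sketch; the SexFemale coefficient equals the 0.33 difference computed above):

coef(glm(y ~ Sex, data = d, weights = weights))
# (Intercept)   SexFemale 
#   3.6666667   0.3333333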

msmazh
  • Your computation of weighted means does not control for race. Your GLM does. What does it look like if you just estimate glm(formula = y ~ Sex, data = d, weights = weights)? – The Laconic Mar 25 '19 at 00:30
  • @TheLaconic Thanks. Will post the result of the model you requested shortly. However, if the reason is not controlling for race, why are the coefficients and the means the same when we don't have weights? Please see this post: https://stats.stackexchange.com/questions/120030/interpretation-of-betas-when-there-are-multiple-categorical-variables – msmazh Mar 25 '19 at 01:58
  • Coincidence. That model fits your made-up data perfectly, whether you weight it or not. – The Laconic Mar 25 '19 at 11:48

0 Answers