1

I first found this really nice Stata video on fractional regression (the dependent variable is a proportion including 0 and 1). I am especially interested in how he applies the margin approach to measure the effect of a dummy variable (2:32 - 2:58).

I also found this really nice blog on fractional regression, and its counterparts in R, which shows that fractional regression is simply a glm with family quasibinomial. So far, so great!

Now the only thing that this blog does not explain, is how to apply the margins function from Stata, to the R approaches listed. I am especially interested in the quasibinomial.

I found this margins package for R, but it does not really mention either dummies, factors, ordinal or categorical variables. If someone could show me how to utilise this package, to get the marginal effect of a dummy/ordinal variable that would be amazing.

Tom
  • 209
  • 4
  • 17
  • What exactly do you want to find out using marginal effects? The `emmeans` package is popular and quite comprehensive to calculate and compare marginal effects. But without more specific information, it's difficult to say more. – COOLSerdash Apr 15 '21 at 08:02
  • @COOLSerdash Thank you for your comment. I am not super sure how to make it clearer than the video fragment. Essentially, I am just looking for a clear way to explain the interpretation of a dummy variable or ordinal variable in a `glm` regression. I am going through the `margins` package at the moment, and I think maybe it can be achieved by evaluating the dummy at 1 and 0 with `at` and similarly for an ordinal variable. Maybe I was a little bit too quick with posting this question. But if I figure it out, I will add an answer myself. If I cannot, I will check `emmeans` as well. – Tom Apr 15 '21 at 08:11

2 Answers2

1

You can use the margins package to evaluate the effect of a dummy or ordinal variable.

Dummy variable

library(haven) # foreign::read.dta does not work for some reason.
d <- read_dta("http://www.stata-press.com/data/r16/401k.dta")

# code from https://m-clark.github.io/posts/2019-08-20-fractional-regression/
model_quasi = glm(
  prate ~ mrate + ltotemp + age + sole,
  data = d,
  family = quasibinomial
)
(m <- margins(model_quasi))
summary(m)

  factor     AME     SE        z      p   lower   upper
     age  0.0044 0.0004  12.2908 0.0000  0.0037  0.0051
 ltotemp -0.0264 0.0017 -15.5742 0.0000 -0.0297 -0.0230
   mrate  0.1473 0.0086  17.0834 0.0000  0.1304  0.1642
    sole  0.0211 0.0061   3.4310 0.0006  0.0090  0.0331

If we now compare the coefficient of the R output to the coefficient of the Stata output, we see that it is almost identical:

enter image description here

Stata code:

use https://www.stata-press.com/data/r16/401k
fracreg logit prate mrate c.ltotemp c.age i.sole
margins r.sole

Ordinal variable

d$cutage <- cut(d$age, c(0,10,20,30, 100), right=FALSE)
model_quasi = glm(
    prate ~ mrate + ltotemp + cutage + sole,
    data = d,
    family = quasibinomial
)
(m <- margins(model_quasi))
summary(m)

         factor     AME     SE        z      p   lower   upper
  cutage[10,20)  0.0582 0.0068   8.5954 0.0000  0.0449  0.0715
  cutage[20,30)  0.0922 0.0082  11.2253 0.0000  0.0761  0.1083
 cutage[30,100)  0.0861 0.0103   8.3495 0.0000  0.0659  0.1064
        ltotemp -0.0258 0.0017 -15.3155 0.0000 -0.0291 -0.0225
          mrate  0.1490 0.0086  17.2847 0.0000  0.1321  0.1658
           sole  0.0206 0.0061   3.3488 0.0008  0.0085  0.0326

In Stata:

egen cutage = cut(age), at(0,10,20,30,100)
fracreg logit prate mrate c.ltotemp i.cutage i.sole

margin r.cutage

Contrasts of predictive margins                 Number of obs     =      4,075
Model VCE    : Robust

Expression   : Conditional mean of prate, predict()

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
      cutage |
  (10 vs 0)  |          1       85.73     0.0000
  (20 vs 0)  |          1      163.36     0.0000
  (30 vs 0)  |          1       69.09     0.0000
      Joint  |          3      226.53     0.0000
------------------------------------------------

--------------------------------------------------------------
             |            Delta-method
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      cutage |
  (10 vs 0)  |    .058218   .0062878      .0458941    .0705418
  (20 vs 0)  |   .0922323   .0072163      .0780886     .106376
  (30 vs 0)  |   .0861435   .0103634      .0658316    .1064554
--------------------------------------------------------------
Tom
  • 209
  • 4
  • 17
1

To illustrate how to calculate the marginal effect of a categorical variable, I'll use the sprogram data that is mentioned in the video from Stata. Here is the output from Stata including the marginal effect of summer:

fracreg logit prate i.summer pdonations freemeals

------------------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      summer |
        yes  |    .579492   .0517463    11.20   0.000     .4780711     .680913
  pdonations |   .0415092   .0108759     3.82   0.000     .0201928    .0628257
   freemeals |  -.5200508   .0912252    -5.70   0.000    -.6988489   -.3412526
       _cons |   1.203022   .0719935    16.71   0.000     1.061917    1.344126
------------------------------------------------------------------------------

margins r.summer

--------------------------------------------------------------
             |            Delta-method
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      summer |
(yes vs no)  |   .0930492   .0089097      .0755864    .1105119
--------------------------------------------------------------

Here is the replication in R:

library(margins)
library(sandwich)
library(lmtest)

dat <- read.csv("sprogram.csv", sep=";")
# The model
mod <- glm(prate~summer + freemeals + pdonations, data = dat, family = quasibinomial())

To replicate Stata's standard errors (almost), we need to use robust standard errors:

print(coeftest(mod, vcov = vcovHC(mod, type = "HC1")), digits = 6)

              Estimate Std. Error  z value   Pr(>|z|)    
(Intercept)  1.2030216  0.0721018 16.68504 < 2.22e-16 ***
summeryes    0.5794920  0.0518241 11.18190 < 2.22e-16 ***
freemeals   -0.5200508  0.0913621 -5.69219 1.2542e-08 ***
pdonations   0.0415092  0.0108923  3.81089 0.00013847 ***

Finally, we use margins together with robust standard errors for the marginal effect of summer:

res <- margins(mod, variable = "summer", type = "response", vcov = vcovHC(mod, type = "HC1"))

summary(res)

    factor    AME     SE       z      p  lower  upper
 summeryes 0.0930 0.0089 10.4488 0.0000 0.0756 0.1105

The results are virutally identical.

COOLSerdash
  • 25,317
  • 8
  • 73
  • 123