Using Stata with a weighted survey design, I ran the following model, where logwage is the log of wage. I took the log because wage was not normally distributed. The data also include worker demographics such as race/ethnicity, gender, prior education, and whether or not the worker participated in voluntary training (binary variable: yes = 1, no = 0).

svy: etregress logwage i.race gender, treat(training = i.education gender) 

--------------------------------------------------------------------------------------------------
                                 |             Linearized
                                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------------+----------------------------------------------------------------
logwage                          |
                            race |
                African American |   .3891554   .0031105    12.20   0.000     .2000000    .8474752
                 Asian American  |   .1487310   .0002843    04.11   0.000     .027113     .8765290
                                 |
                          gender |
                         female  |  -.0230411    .010445    -6.85   0.000    -.115341   -.0107295
                                 |
                      1.training |   .3703371   .0451778    10.61   0.000     .2018037    .4186134
---------------------------------+----------------------------------------------------------------
training                         |
                     i.education |
                     Highschool  |  -.0715731   .0490565     1.28   0.098    -.1106579    .1291781
                        College  |   .1271380   .0401052     3.95   0.003     .0329516    .2107563
                    Grad School  |   .8522143   .0085337     8.99   0.000     .8271381    .9573284
                                 |
                          gender |
                         female  |   .0127444   .0100058     5.33   0.041     .0100558    .0866312
                           _cons |  -1.260083   .0327235   -26.12   0.000    -1.531405   -1.098524
---------------------------------+----------------------------------------------------------------
                         /athrho |   .0051552    .031410     0.17   0.827    -.0722533    .0810246
                        /lnsigma |  -1.872551   .0166818   -73.50   0.000    -1.928624   -1.278064
---------------------------------+----------------------------------------------------------------
                             rho |   .0084120   .0421116                     -.0649947    .0888529
                           sigma |   .4000831   .0038170                      .1925127    .5067780
                          lambda |   .0012673   .0226365                     -.0324029 

After this, I calculated margins (as directed by Stata's marginal analysis page here):

margins

Predictive margins


Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   4.810383   .0072197   666.28   0.000      4.79622    4.824546
------------------------------------------------------------------------------

and

margins i.gender 

Predictive margins

Expression   : Linear prediction, predict()

--------------------------------------------------------------------------------------------
                           |            Delta-method
                           |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
                    Gender |
                   Female  |   4.305098   .0097962   439.47   0.000     4.285881    4.324314
                     Male  |   4.523071   .0077528   583.41   0.000     4.507863     4.53828

How does this 4.305098 relate to logwage? It says that, on average, females have 4.305098 of what?

iPlexipen

1 Answer

The 4.30 number is the expected log wage if everyone in your data were a woman (that is, with female set to 1 for every observation and the other covariates left at their observed values).
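Written out, that predictive margin is a weighted average of linear predictions with gender forced to female. This is only a sketch in my own notation, not Stata's: $w_i$ stands for the survey weight and $\widehat{xb}_i$ for the linear prediction from the outcome equation.

```latex
\widehat{m}_{\text{female}}
  = \frac{\sum_i w_i \, \widehat{xb}_i(\text{gender}_i = \text{female})}{\sum_i w_i}
  = 4.305098
```

Because this is an average of logs, exponentiating it ($e^{4.305098} \approx 74$, in whatever units wage is measured) gives something closer to a geometric mean than to the expected wage.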

Let me show you how to calculate the marginal effect of a binary variable on $y$ from an etregress model of $\ln(y)$.

First we fit the same toy model from your previous question:

. webuse nhanes2f, clear

. qui svyset psuid [pweight=finalwgt], strata(stratid)

. qui svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female) // coefl

Then we calculate the effect of female on lead in percent:

. nlcom pct_eff:(100*(exp(_b[loglead:1.female])-1))

     pct_eff:  (100*(exp(_b[loglead:1.female])-1))

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pct_eff |  -30.64646   .7382346   -41.51   0.000    -32.09337   -29.19955
------------------------------------------------------------------------------

This says that, on average, women have about 30.6% less lead in their blood. This calculation is explained here. Note that the formula uses the female coefficient from the outcome equation. You can use the coefl option to see how Stata names the coefficients.
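For reference, here is a sketch of where that formula comes from. The symbols are mine: $\beta_f$ is the female coefficient in the outcome equation, and I assume the distribution of the error $\varepsilon$ does not depend on female.

```latex
\ln y = x\beta + \beta_f \,\text{female} + \varepsilon
\;\Longrightarrow\;
\frac{E[y \mid \text{female}=1, x]}{E[y \mid \text{female}=0, x]} = e^{\beta_f},
\qquad
\%\Delta = 100\,\bigl(e^{\beta_f} - 1\bigr)
```

The reported $-30.65\%$ corresponds to $\beta_f \approx -0.366$, so reading the raw coefficient as a percent change would overstate the gap.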

Now we can calculate what that means in units of lead, rather than in percent. Here lead is measured in micrograms per deciliter. A microgram is one-millionth of a gram, and a deciliter is a unit of fluid volume equal to 1/10 of a liter. The model implies women have about 5 fewer micrograms of lead per deciliter of blood than men:

. margins, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)*_b[loglead:1.female])

Predictive margins

Number of strata   =        31                 Number of obs     =       4,940
Number of PSUs     =        62                 Population size   =  56,316,764
Model VCE    : Linearized                      Design df         =          31

Expression   : exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)*_b[loglead:1.female]

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -5.267688   .1543202   -34.13   0.000    -5.582426    -4.95295
------------------------------------------------------------------------------

Since Stata thinks in terms of $\ln(\sigma)$, I had to add an extra exponentiation step to get back $\sigma$. The intuition for the formula is at the top of the answer here.
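As a sketch of what the expression above computes, assuming the outcome-equation error is normal with constant variance $\sigma^2$ (the subscript $k$ and the notation are mine):

```latex
\sigma = \exp\bigl(\texttt{\_b[/:lnsigma]}\bigr),
\qquad
E[y \mid x] = e^{x\beta}\,E\bigl[e^{\varepsilon}\bigr] = e^{x\beta + \sigma^2/2}
\quad \text{if } \varepsilon \sim N(0, \sigma^2),
\qquad
\frac{\partial E[y \mid x]}{\partial x_k} = \beta_k \, E[y \mid x]
```

The margins expression averages the last quantity over the sample, with $x_k$ being female.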

How did I calculate this? Let's start by predicting lead as if everyone was a man and then as if everyone was a woman, keeping other covariates as they are:

. margins, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)) at(female = (0 1)) post

Predictive margins

Number of strata   =        31                 Number of obs     =       4,940
Number of PSUs     =        62                 Population size   =  56,316,764
Model VCE    : Linearized                      Design df         =          31

Expression   : exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)

1._at        : female          =           0

2._at        : female          =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   17.10827    .288379    59.33   0.000     16.52012    17.69642
          2  |   11.86519    .240167    49.40   0.000     11.37537    12.35501
------------------------------------------------------------------------------

This says that men are expected to have 17 micrograms per dL and women are expected to have ~12 micrograms per dL, on average. This is similar to what you did, but for lead (wages) rather than log of lead (log of wages).

We can now calculate the difference between these two counterfactuals using contrast:

. contrast r._at

Contrasts of predictive margins

Number of strata   =        31                 Number of obs     =       4,940
Number of PSUs     =        62                 Population size   =  56,316,764
Model VCE    : Linearized                      Design df         =          31

Expression   : exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)

1._at        : female          =           0

2._at        : female          =           1

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
         _at |          1     1195.29     0.0000
      Design |         31
------------------------------------------------
Note: F statistics are adjusted for the survey
      design.

--------------------------------------------------------------
             |            Delta-method
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         _at |
   (2 vs 1)  |  -5.243079   .1516523     -5.552376   -4.933782
--------------------------------------------------------------

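The difference between 17.1 and 11.9 is about -5.2, which is close to the -5.27 we calculated directly above. The two are not identical because the earlier expression multiplies the prediction by the coefficient, which treats female as if it were continuous, while the contrast is the discrete change from 0 to 1.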

If we put that in percentage terms, that is $100 \cdot (-5.243079)/17.10827=-30.646459$. Note that this matches the first calculation we made. It uses the male average since we are interested in the effect of female going from 0 to 1.
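You can probably get the same percentage, with a delta-method standard error, straight from the posted margins. This is a sketch and not something I ran for this answer: it assumes you run it right after the margins ..., post and contrast commands above (before refitting the model), and that the posted coefficients are named 1._at and 2._at, which the first line lets you check. The label pct_eff_alt is just a name I made up.

```stata
* run while the results from "margins ..., at(female = (0 1)) post" are still active
matrix list e(b)                                   // confirm the names; assumed to be 1._at and 2._at
nlcom pct_eff_alt: 100*(_b[2._at]/_b[1._at] - 1)   // percent change going from female = 0 to female = 1
```

The point estimate should be 100*(11.86519/17.10827 - 1), which is the same -30.65 as before.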

If you want to do that in one step, you can do this:

. qui svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female) // coefl

. margins r.female, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)) 

Contrasts of predictive margins

Number of strata   =        31                 Number of obs     =       4,940
Number of PSUs     =        62                 Population size   =  56,316,764
Model VCE    : Linearized                      Design df         =          31

Expression   : exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
      female |          1     1195.29     0.0000
      Design |         31
------------------------------------------------
Note: F statistics are adjusted for the survey
      design.

--------------------------------------------------------------
             |            Delta-method
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      female |
   (1 vs 0)  |  -5.243079   .1516523     -5.552376   -4.933782
--------------------------------------------------------------

Here I had to fit the model again because the post option replaced the estimation results with the margins, which is what allowed contrast to work on the predictions.
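If refitting is slow on your data, one alternative is to store the model results before posting the margins and restore them afterwards. This is a sketch under the same setup as above; the name etreg is arbitrary.

```stata
webuse nhanes2f, clear
quietly svyset psuid [pweight=finalwgt], strata(stratid)
quietly svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female)
estimates store etreg        // keep a copy of the etregress results

* post replaces the active results with the margins
margins, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)) at(female = (0 1)) post
contrast r._at

estimates restore etreg      // bring the etregress results back without refitting
margins r.female, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2))
```

This avoids a second estimation, which matters more for large survey data sets than for this toy example.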


Code:

webuse nhanes2f, clear
qui svyset psuid [pweight=finalwgt], strata(stratid)
qui svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female) // coefl
nlcom pct_eff:(100*(exp(_b[loglead:1.female])-1))
margins, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)*_b[loglead:1.female])
margins, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2)) at(female = (0 1)) post
contrast r._at
qui svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female) // coefl
margins r.female, expression(exp(predict(xb))*exp((exp(_b[/:lnsigma])^2)/2))
Nick Cox
dimitriy
  • Thank you very much for this explanation. It took a day for me to fully understand it. My question here is only, what is the sigma value we use in this equation? – iPlexipen Jul 30 '20 at 06:22
  • It’s the standard deviation of the error term in the outcome equation. – dimitriy Jul 30 '20 at 06:23