What is the difference between logistic regression and fractional response regression?

Question

As far as I know, the difference between the logistic model and the fractional response model (frm) is that the dependent variable Y of an frm takes values in the interval [0, 1], whereas in a logistic model it takes values in {0, 1}. Further, an frm uses a quasi-likelihood estimator to determine its parameters.
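
In formulas, the frm described here can be sketched as follows, assuming a logistic mean function G and treating x_i as a row vector of regressors. Only the conditional mean is modelled, and the Bernoulli log-likelihood is used as a quasi-log-likelihood even though y_i is not binary:

```latex
\begin{aligned}
G(z) &= \frac{1}{1+e^{-z}}, \qquad \operatorname{E}(y_i \mid x_i) = G(x_i\beta), \qquad 0 \le y_i \le 1, \\
\ell_i(\beta) &= y_i \log G(x_i\beta) + (1-y_i)\log\bigl(1-G(x_i\beta)\bigr).
\end{aligned}
```

For y_i in {0, 1} this is exactly the logit log-likelihood, which is why the same glm machinery can be reused.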

Normally, we fit a logistic model with glm(y ~ x1 + x2, data = dat, family = binomial(logit)).

For frm, we change family = binomial(logit) to family = quasibinomial(logit).

I noticed that we can also use family = binomial(logit) to obtain the frm parameters, since it gives the same estimates. See the following example:

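    library(foreign)                  # provides read.dta() for Stata files
    mydata <- read.dta("k401.dta")

    # prate is a fraction in [0, 1]: binomial() warns about
    # "non-integer #successes" but still returns the estimates
    glm.bin <- glm(prate ~ mrate + age + sole + totemp, 
                   data = mydata, family = binomial('logit'))
    summary(glm.bin)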

which returns:

    Call:
    glm(formula = prate ~ mrate + age + sole + totemp, 
        family = binomial("logit"), 
        data = mydata)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -3.1214  -0.1979   0.2059   0.4486   0.9146  
    
    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  1.074e+00  8.869e-02  12.110  < 2e-16 ***
    mrate        5.734e-01  9.011e-02   6.364 1.97e-10 ***
    age          3.089e-02  5.832e-03   5.297 1.17e-07 ***
    sole         3.636e-01  9.491e-02   3.831 0.000128 ***
    totemp      -5.780e-06  2.207e-06  -2.619 0.008814 ** 
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 1166.6  on 4733  degrees of freedom
    Residual deviance: 1023.7  on 4729  degrees of freedom
    AIC: 1997.6
    
    Number of Fisher Scoring iterations: 6

And for family = quasibinomial('logit'):

    glm.quasi <- glm(prate ~ mrate + age + sole + totemp, 
                     data = mydata, family = quasibinomial('logit'))
    summary(glm.quasi)

which returns:

    Call:
    glm(formula = prate ~ mrate + age + sole + totemp, 
        family = quasibinomial("logit"), 
        data = mydata)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -3.1214  -0.1979   0.2059   0.4486   0.9146  
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  1.074e+00  4.788e-02  22.435  < 2e-16 ***
    mrate        5.734e-01  4.864e-02  11.789  < 2e-16 ***
    age          3.089e-02  3.148e-03   9.814  < 2e-16 ***
    sole         3.636e-01  5.123e-02   7.097 1.46e-12 ***
    totemp      -5.780e-06  1.191e-06  -4.852 1.26e-06 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for quasibinomial family taken to be 0.2913876)
    
        Null deviance: 1166.6  on 4733  degrees of freedom
    Residual deviance: 1023.7  on 4729  degrees of freedom
    AIC: NA
    
    Number of Fisher Scoring iterations: 6

The estimated betas from both families are the same; the difference lies in the SE values. However, to obtain the correct SEs, we have to use library(sandwich), as in this post.
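
The three facts above (identical betas, SEs that differ, robust SEs as the fix) can be checked on a small sketch. It uses hypothetical simulated data rather than k401.dta; the Beta data-generating process is an assumption chosen only because its mean is logistic while its variance is not the Bernoulli one. The robust SEs are computed by hand so the block needs only base R.

```r
# Sketch on simulated (hypothetical) data; the k401 file is not needed.
# Assumption: y is a fraction drawn from a Beta distribution whose mean is
# logistic in x, so Var(y | x) = mu (1 - mu) / 6 rather than mu (1 - mu).
set.seed(1)
n   <- 2000
x   <- rnorm(n)
mu  <- plogis(-0.5 + 1.2 * x)               # true E(y | x)
y   <- rbeta(n, 5 * mu, 5 * (1 - mu))
dat <- data.frame(y = y, x = x)

# binomial() warns about non-integer #successes for fractional y; the
# warning does not affect the estimates, so it is suppressed here.
fit.bin   <- suppressWarnings(glm(y ~ x, data = dat, family = binomial("logit")))
fit.quasi <- glm(y ~ x, data = dat, family = quasibinomial("logit"))

# 1. Identical point estimates
print(all.equal(coef(fit.bin), coef(fit.quasi)))      # TRUE

# 2. quasibinomial SEs = binomial SEs * sqrt(estimated dispersion)
phi      <- summary(fit.quasi)$dispersion
se.bin   <- sqrt(diag(vcov(fit.bin)))
se.quasi <- sqrt(diag(vcov(fit.quasi)))
print(all.equal(se.quasi, se.bin * sqrt(phi)))        # TRUE

# 3. Robust (HC0 sandwich) SEs by hand: A^{-1} B A^{-1} with
#    A = sum mu_i (1 - mu_i) x_i' x_i and B = sum (y_i - mu_i)^2 x_i' x_i
X      <- model.matrix(fit.bin)
muhat  <- fitted(fit.bin)
A      <- crossprod(X * (muhat * (1 - muhat)), X)
B      <- crossprod(X * (dat$y - muhat))
V.rob  <- solve(A) %*% B %*% solve(A)
se.rob <- sqrt(diag(V.rob))

# If the sandwich package is installed, sandwich::vcovHC(fit.bin, type = "HC0")
# should give essentially the same matrix as V.rob.
print(cbind(binomial = se.bin, quasi = se.quasi, robust = se.rob))
```

Here the binomial SEs are too large by roughly a factor of sqrt(6), because the simulated outcome is less variable than a Bernoulli variable with the same mean; the robust SEs do not depend on that variance assumption.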

Now, my questions:

What is the difference between these two pieces of code?
Is frm only about obtaining robust SEs?

If my understanding is not correct, please give some suggestions.

coffeinjunky · Accepted Answer · 2016-06-06T14:54:48.297

If your question is: what is the difference between these two pieces of code?

A look at ?glm says "See family for details of family functions", and a look at ?family reveals the following description:

The quasibinomial and quasipoisson families differ from the binomial and poisson families only in that the dispersion parameter is not fixed at one, so they can model over-dispersion.
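
How that dispersion parameter is estimated can be seen in a short sketch on hypothetical simulated data (the Beta data-generating process is an assumption, chosen so that the true dispersion is 1/6): summary.glm divides the Pearson chi-square statistic by the residual degrees of freedom.

```r
set.seed(1)
n  <- 2000
x  <- rnorm(n)
mu <- plogis(-0.5 + 1.2 * x)
# Hypothetical fractional outcome: Beta with a + b = 5, so that
# Var(y | x) = mu (1 - mu) / 6, i.e. a true dispersion of 1/6
y  <- rbeta(n, 5 * mu, 5 * (1 - mu))

fit.quasi <- glm(y ~ x, family = quasibinomial("logit"))

# Pearson chi-square divided by the residual degrees of freedom
pearson  <- sum(residuals(fit.quasi, type = "pearson")^2)
phi.hand <- pearson / df.residual(fit.quasi)

# summary() reports the same estimate (up to the IRLS convergence
# tolerance); for family = binomial() it is instead fixed at 1
print(c(by_hand = phi.hand, summary = summary(fit.quasi)$dispersion))
```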

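This is also what you see in your output: the coefficients are identical, but the quasibinomial fit estimates the dispersion parameter (0.2913876) instead of fixing it at 1, and every standard error is scaled by its square root (for the intercept, 0.08869 × √0.2913876 ≈ 0.04788). That is the difference between the two models / pieces of code.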

If your question is: what is the difference between the logistic regression and the fractional response regression?

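As you correctly identify, the model is a logistic regression if your dependent variable is either 0 or 1. Papke and Wooldridge have shown that you can use a GLM of this form to estimate the parameters when the dependent variable is a fraction as well, but you need to compute robust standard errors: with a fractional outcome, the Bernoulli variance Var(y | x) = G(xβ)(1 − G(xβ)) generally does not hold, so the usual maximum likelihood standard errors are not valid even though the coefficient estimates are. This is not required for logistic regression, where that variance holds automatically once the mean is correctly specified. In fact, some people think you should not compute robust standard errors in probit/logit models at all, though that is a different debate.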
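
For the logistic mean function G, the robust (HC0 sandwich) variance can be written as below. This is the standard textbook form, given here as a sketch, with x_i a row vector of regressors:

```latex
\begin{aligned}
\hat\mu_i &= G(x_i\hat\beta), \\
\hat A &= \sum_{i=1}^{n} \hat\mu_i\,(1-\hat\mu_i)\, x_i^\top x_i, \\
\hat B &= \sum_{i=1}^{n} (y_i-\hat\mu_i)^2\, x_i^\top x_i, \\
\widehat{\operatorname{Var}}(\hat\beta) &= \hat A^{-1}\,\hat B\,\hat A^{-1}.
\end{aligned}
```

The usual logit standard errors use \hat A^{-1} alone and quasibinomial uses \hat\phi\,\hat A^{-1}; both rely on Var(y_i | x_i) being proportional to μ_i(1 − μ_i), whereas the sandwich form does not.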

The theoretical basis comes from a famous paper by Gourieroux, Monfort, and Trognon in Econometrica in 1984. They show that, under some regularity conditions, the estimator obtained by maximizing a likelihood from the linear exponential family is consistent for the parameters of the conditional mean, provided the conditional mean is correctly specified, even if the true distribution of the data is a different one. The Bernoulli distribution belongs to this family. So, in some sense, we are using the Bernoulli likelihood with a logistic mean here even though it is not the correct distribution for a fraction, but the estimates are still consistent for the parameters that we wish to obtain. So, if your question stems from the observation that we use the very same likelihood function to estimate both the logistic and the fractional response model, only with a different kind of dependent variable, then this is the intuition.
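
This intuition can be checked numerically with a sketch on hypothetical simulated data: the outcome below has a logistic conditional mean but a Beta distribution (an assumption chosen for illustration), yet maximizing the Bernoulli log-likelihood directly with optim() recovers the glm() coefficients and, approximately, the true mean parameters.

```r
set.seed(1)
n  <- 2000
x  <- rnorm(n)
mu <- plogis(-0.5 + 1.2 * x)            # true E(y | x) is logistic...
y  <- rbeta(n, 5 * mu, 5 * (1 - mu))    # ...but y is Beta, not Bernoulli
X  <- cbind(1, x)

# Negative Bernoulli (quasi-)log-likelihood, evaluated at fractional y
neg_qll <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * plogis(eta, log.p = TRUE) +
       (1 - y) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}
# Its gradient: -X'(y - G(X beta))
neg_grad <- function(beta) {
  -drop(crossprod(X, y - plogis(drop(X %*% beta))))
}

opt <- optim(c(0, 0), neg_qll, neg_grad, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 500))
fit <- glm(y ~ x, family = quasibinomial("logit"))

# The optim and glm columns should agree closely, and both should be
# near the true values used in the simulation
print(cbind(optim = opt$par, glm = coef(fit), true = c(-0.5, 1.2)))
```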

How can we measure frm performance? Can we use MSE as in linear regression? — newbie, Jun 07 '16 at 08:55
That is a very different question. Please post it as a new one. — coffeinjunky, Jun 07 '16 at 08:56
