Several posts (here and here) suggest that beta regression is more appropriate when the dependent variable is naturally bounded between 0 and 1. My question is: leaving appropriateness aside, is it technically incorrect to fit a logistic regression to a proportional response variable? R will throw a warning but still produce a result.

It seems to me that the likelihood function will not be a valid likelihood when the response variable is a proportion instead of binary, but, mathematically speaking, it can still be maximized (equivalently, its negative minimized) to give a solution. I wonder what violation or mistake, if any, is made when fitting a logistic regression to proportional data.
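
Written out, the objective in question is the Bernoulli log-likelihood of a logistic regression, with the responses $y_i$ simply allowed to take any value in $[0, 1]$ (the notation $x_i$, $\beta$, $\mu_i$ is added here for illustration):

$$
\ell(\beta) = \sum_{i=1}^{n} \Big[ y_i \log \mu_i + (1 - y_i) \log (1 - \mu_i) \Big],
\qquad \mu_i = \frac{1}{1 + \exp(-x_i^\top \beta)} .
$$

Its gradient

$$
\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} (y_i - \mu_i)\, x_i
$$

is well defined for any $y_i \in [0, 1]$, and the Hessian $-\sum_i \mu_i (1 - \mu_i)\, x_i x_i^\top$ does not involve $y_i$ at all, so $\ell$ is still concave and the equations $\sum_i (y_i - \mu_i)\, x_i = 0$ are solved exactly as in the binary case. For non-binary $y_i$, $\ell$ is not the likelihood of the data, which is why it is usually called a quasi-log-likelihood.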

  • In addition to the answers below: [Here](http://stats.stackexchange.com/questions/26762/how-to-do-logistic-regression-in-r-when-outcome-is-fractional) is another post dealing with this question. – COOLSerdash Jun 29 '13 at 09:35

2 Answers

What you propose is sometimes called a fractional logit. It certainly has its merits, as long as you remember to use robust standard errors. In 2010 I gave a talk at the German Stata Users' meeting comparing, among other things, beta regression and fractional logit. The slides can be found here: http://www.maartenbuis.nl/presentations/berlin10.pdf
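
For readers who want to see the mechanics, here is a minimal NumPy sketch of a fractional logit with robust standard errors. It is an illustration under stated assumptions, not Stata's or R's implementation: the helper name `fractional_logit` is hypothetical, the point estimates come from Newton-Raphson on the Bernoulli quasi-log-likelihood, and "robust" is taken to mean the basic HC0 sandwich estimator without small-sample correction. The simulated beta-distributed responses (with a hypothetical precision `phi = 10`) exist only to exercise the code.

```python
import numpy as np


def fractional_logit(X, y, tol=1e-10, max_iter=100):
    """Fractional logit: logit mean model fitted to responses in [0, 1].

    Hypothetical helper for illustration. Point estimates maximize the
    Bernoulli quasi-log-likelihood by Newton-Raphson; standard errors use
    the HC0 sandwich ("robust") estimator.
    X : (n, p) design matrix, including a column of ones for the intercept.
    y : (n,) responses in the closed interval [0, 1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any((y < 0) | (y > 1)):
        raise ValueError("y must lie in [0, 1]")

    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - mu)              # gradient of the quasi-log-likelihood
        w = mu * (1.0 - mu)                 # binomial variance function V(mu)
        info = X.T @ (X * w[:, None])       # negative Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    # Sandwich covariance: bread^{-1} @ meat @ bread^{-1}
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    bread_inv = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])
    cov = bread_inv @ meat @ bread_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated example (hypothetical parameters): beta-distributed proportions
# whose mean follows a logit model with intercept 0.5 and slope 1.0.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
mu_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
phi = 10.0  # hypothetical beta precision parameter
y = rng.beta(mu_true * phi, (1.0 - mu_true) * phi)

beta_hat, robust_se = fractional_logit(X, y)
print("estimates:", beta_hat)
print("robust SEs:", robust_se)
```

The point estimates are the same as those from any binomial-family logit GLM fitted to the same fractional `y`; what changes is the standard errors, which is why the robust ones matter here.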

Maarten Buis (edited by Scortchi - Reinstate Monica)
  • (+1) Maarten, another question: I read that a binomial GLM can be used for fraction/proportion responses if the total number of trials is provided for each fraction/proportion (in R this is done with the `weights` argument to `glm`); see, e.g., http://stats.stackexchange.com/a/26779/28666. How does "fractional logit" with "robust standard errors" relate to this approach? Is it the same thing or not? – amoeba Sep 05 '16 at 11:40
  • @amoeba It is different. Think of a fractional logit as a model for the mean proportion, whereas what you proposed is a way to recover a logit model. – Maarten Buis Sep 05 '16 at 17:30
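
To make the distinction in this exchange concrete, here is a small NumPy sketch; the helper `weighted_logit` and the grouped data are hypothetical. With the number of trials as weights, a binomial GLM on the proportions gives the same point estimates as an ordinary logistic regression on the expanded 0/1 trials, i.e. it recovers the logit model for the individual trials. The fractional logit gives every proportion the same weight, so it models the mean proportion across observations and generally yields different estimates. Only point estimates are compared here.

```python
import numpy as np


def weighted_logit(X, y, weights, tol=1e-10, max_iter=100):
    """Logit mean model with prior weights, fitted by Newton-Raphson.

    Hypothetical helper for illustration. Solves
        sum_i weights_i * (y_i - mu_i) * x_i = 0,
    which is the binomial-GLM score for proportions y_i = k_i / n_i when
    weights_i = n_i, and the fractional-logit score when all weights are 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (weights * (y - mu))
        info = X.T @ (X * (weights * mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical grouped data: k successes out of n trials at each value of x.
x = np.array([-1.0, 0.0, 1.0, 2.0])
n = np.array([10, 20, 15, 5])
k = np.array([2, 8, 10, 4])
X = np.column_stack([np.ones_like(x), x])

# (a) Binomial GLM on proportions, number of trials as weights.
beta_grouped = weighted_logit(X, k / n, n)

# (b) Ordinary logistic regression on the expanded 0/1 trials.
X_long = np.repeat(X, n, axis=0)
y_long = np.concatenate(
    [np.r_[np.ones(ki), np.zeros(ni - ki)] for ki, ni in zip(k, n)]
)
beta_long = weighted_logit(X_long, y_long, np.ones(len(y_long)))

# (c) Fractional logit: same proportions, each observation weighted equally.
beta_fractional = weighted_logit(X, k / n, np.ones(len(x)))

print("grouped binomial:", beta_grouped)
print("expanded trials: ", beta_long)
print("fractional logit:", beta_fractional)
```

Cases (a) and (b) solve identical score equations, since $\sum_i n_i (k_i/n_i - \mu_i)\, x_i = \sum_i (k_i - n_i \mu_i)\, x_i$, while (c) drops the $n_i$.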

Models of this kind are often defined and used as one kind of generalized linear model. For one concise review, see http://www.stata-journal.com/article.html?article=st0147. The argument is that the binomial is a reasonable family even for continuous proportions because, just as for binomial data, the variance approaches 0 as the mean approaches either 0 or 1.
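
The variance argument can be written out. The binomial family has variance function $V(\mu) = \mu(1 - \mu)$; allowing a dispersion parameter $\phi$ (a symbol added here, as in quasi-binomial models) gives the first line below, which vanishes at both ends of the unit interval. The second line shows that any random variable $Y$ on $[0, 1]$ with mean $\mu$ obeys the same bound, with equality only when $Y$ takes just the values 0 and 1. The third line, for a beta distribution with mean $\mu$ and precision $\kappa$, has the same shape:

$$
\operatorname{Var}(y_i \mid x_i) = \phi\, \mu_i (1 - \mu_i) \;\longrightarrow\; 0 \quad \text{as } \mu_i \to 0 \text{ or } \mu_i \to 1,
$$

$$
\operatorname{Var}(Y) = E[Y^2] - \mu^2 \le E[Y] - \mu^2 = \mu(1 - \mu) \quad \text{since } Y^2 \le Y \text{ on } [0, 1],
$$

$$
\operatorname{Var}(Y) = \frac{\mu(1 - \mu)}{1 + \kappa} \quad \text{for a beta-distributed } Y .
$$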

Whether particular programs or functions in particular software accommodate them is a different matter. To say that "R will throw a warning but still produce a result" conveys little information. Which package are you referring to? Is it really the only relevant package? In any case, as the article just referenced indicates, this model is well supported in Stata, for example.

That still leaves scope for detailed discussion of the relative merits of a logit model for continuous proportions and beta regression.

Nick Cox
  • +1 on this old answer after today's discussion elsewhere. I would still encourage you to post an answer about this approach in http://stats.stackexchange.com/questions/29038. – amoeba Feb 15 '17 at 12:10
  • Some comments about how this works in R can be found, e.g., under this answer in a related thread: http://stats.stackexchange.com/a/43369 – amoeba Feb 15 '17 at 12:11